The PHP websites I manage email me all notices, warnings and errors, and I occasionally get warnings caused by bots requesting bad URLs. This post looks at a PHP parse_url warning triggered by requests from a published piece of code that contained a bug. The code has since been corrected, but there are clearly still scripts out there using the old version and making bad requests.
The code used
The code used by the bots or scripts that are causing the error is from a post titled "How to get Certificate Information Using WinInet APIs" over at the MSDN blogs.
The code originally had an error in it which resulted in a HEAD request like this, where the https:// scheme and domain are preceded by a stray leading /, resulting in an invalid request:
HEAD /https://www.example.com/
The request in the Apache log file will look like this:
"HEAD /https://www.example.com/ HTTP/1.1" 200 - "-" "Test Certificate Info"
The code has since been fixed and now results in valid requests.
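If you want to spot these requests in your own application, a check along these lines might help. This is only a sketch: the function name looksLikeSlashPrefixedUrl is made up for this example, and it assumes the bad requests always start with /http:// or /https://.

```php
<?php
// Hypothetical helper (not from the original post): returns true when a
// request URI looks like a full URL with a stray leading slash,
// e.g. "/https://www.example.com/".
function looksLikeSlashPrefixedUrl($requestUri)
{
    // preg_match() returns 1 on a match, 0 on no match.
    return preg_match('#^/https?://#i', $requestUri) === 1;
}

$isBadRequest = looksLikeSlashPrefixedUrl('/https://www.example.com/'); // true
$isNormalPage = looksLikeSlashPrefixedUrl('/about/contact');            // false
```

In a real script you would pass in $_SERVER['REQUEST_URI'] and decide what to do with a match, for example skipping the call to parse_url altogether.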
parse_url results in a warning
The above request will result in $_SERVER['REQUEST_URI'] being '/https://www.example.com/'. When attempting to run PHP’s parse_url on it like so:
parse_url('/https://www.example.com');
a warning will be issued like so:
Warning: parse_url(/https://www.example.com): Unable to parse URL in [file] on line [line]
PHP versions affected
According to the manual page for the parse_url function, as of PHP 5.3.3 parse_url no longer issues a warning when URL parsing fails. If you are using 5.3.3 or higher then this isn’t an issue for you.
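Whichever version you are on, parse_url signals a seriously malformed URL by returning false, so it is worth checking the return value before using it. Here is a minimal sketch of that check. The function name getRequestPath and the '/' fallback are my own choices for illustration, and on versions before 5.3.3 this does nothing to stop the warning itself.

```php
<?php
// Hypothetical helper: returns the path component of a request URI,
// or $default when parse_url() fails or finds no path.
function getRequestPath($requestUri, $default = '/')
{
    // With PHP_URL_PATH, parse_url() returns the path as a string,
    // null if the URL has no path, or false if it could not be parsed.
    $path = parse_url($requestUri, PHP_URL_PATH);

    if ($path === false || $path === null) {
        return $default;
    }

    return $path;
}

$path = getRequestPath('/about?page=2'); // '/about'
```

The strict comparisons matter here, since a loose check would also treat a legitimate path such as '0' as a failure.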
What can be done?
As mentioned above, you’ll only see warnings in PHP prior to 5.3.3. There’s not much that can be done about the bots or scripts using the invalid code, and it just shows how we need to code defensively when relying on outside input which could be invalid.
In a production environment you shouldn’t be displaying errors on a website anyway, and since this is just a bot doing a HEAD request, the warning isn’t going to cause any page rendering issues for it (nor would it for a regular request). So you can pretty much ignore the error.
To avoid the warning being logged or emailed (if, like me, that’s what you do) then some possible solutions are:
- change the level at which you log or email errors
- temporarily change the reporting level before and restore it after the call to parse_url (I’ve shown how to do this in my "get and modify the error reporting level in PHP" post)
- replace error reporting with exception handlers and put the call to parse_url in a try...catch block (again see a previous post titled "Replace error reporting with exception handlers with PHP")
- or simply prefix the call to parse_url with @ i.e. @parse_url to suppress the warning. This isn’t ideal but it’s a lot easier to code than the second or third option and means you don’t have to suppress other warnings being logged/emailed
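The last three options can be sketched roughly as follows. This is an illustration rather than a drop-in solution: the variable names and the throwErrorException function are made up for this example, and it assumes warnings are controlled by the standard error_reporting level.

```php
<?php
$uri = '/https://www.example.com/';

// Temporarily drop E_WARNING from the reporting level, then restore it.
// error_reporting() returns the previous level when given a new one.
$previousLevel = error_reporting(error_reporting() & ~E_WARNING);
$partsA = parse_url($uri);
error_reporting($previousLevel);

// Convert PHP errors into exceptions and catch them around the call.
// throwErrorException is a hypothetical handler name.
function throwErrorException($severity, $message, $file, $line)
{
    throw new ErrorException($message, 0, $severity, $file, $line);
}

set_error_handler('throwErrorException');
try {
    $partsB = parse_url($uri);
} catch (ErrorException $e) {
    // Treat a warning from parse_url() the same as a failed parse.
    $partsB = false;
}
restore_error_handler();

// Suppress any warning from this one call with the @ operator.
$partsC = @parse_url($uri);
```

In each case the result is either false or an array of URL components, so the code that follows still needs to check which one it received.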