The Electric Toolbox Blog

Linux, Apache, Nginx, MySQL, Javascript and PHP articles

Resolving relative URLs to absolute in PHP

When scraping content using the PHP Simple HTML DOM Parser, it is useful to resolve relative URLs in a page to absolute URLs so that additional web pages or images can be downloaded. I do this using the url_to_absolute library by Nadeau Software Consulting; this post shows how, along with a minor fix that needs to be made to their code.

Download the library

The url_to_absolute library can be downloaded from the Nadeau Software Consulting blog post titled "How to convert a relative URL to an absolute URL"; it’s in the downloads section of that page as a zip file.

The zip file contains url_to_absolute.php, split_url.php and join_url.php. url_to_absolute.php contains the url_to_absolute() function and require()s the other two files, which contain helper functions. Those two files need to be somewhere in your include path, or you can modify url_to_absolute.php so that

require "split_url.php";
require "join_url.php";

reads like this:

$dir = dirname(__FILE__);
require $dir . "/split_url.php";
require $dir . "/join_url.php";
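
If you would rather not hard-code each require line, the loading can be wrapped in a small function. The sketch below is an assumption, not part of the Nadeau download: require_helpers() is a hypothetical name, it defaults to __DIR__ (available from PHP 5.3 and equal to dirname(__FILE__)), and it throws a RuntimeException naming the missing path instead of letting require stop with a bare fatal error. The nullable parameter type needs PHP 7.1 or later.

```php
<?php
// Hypothetical helper (not part of the Nadeau download): require each helper
// file from $dir, defaulting to the directory containing this script.
// __DIR__ (PHP 5.3+) gives the same value as dirname(__FILE__).
function require_helpers(array $files, ?string $dir = null): array
{
    $dir = $dir ?? __DIR__;
    $loaded = [];
    foreach ($files as $file) {
        $path = $dir . '/' . $file;
        if (!is_file($path)) {
            // Report the full path instead of a bare require() fatal error.
            throw new RuntimeException("Helper file not found: $path");
        }
        // Functions defined in $path become global; top-level variables
        // in $path would be local to this function.
        require_once $path;
        $loaded[] = $path;
    }
    return $loaded;
}

// Usage at the top of url_to_absolute.php, in place of the two require lines:
// require_helpers(['split_url.php', 'join_url.php']);
```

Because require_once runs inside a function here, this suits helper files that only define functions; files that set up top-level variables are better loaded with plain require statements as shown above.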

Resolving URLs

To work out the absolute URL of aboutus.html relative to the page http://www.example.com/sitemap.html, do this:

url_to_absolute('http://www.example.com/sitemap.html', 'aboutus.html');

which would return:

http://www.example.com/aboutus.html
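
Resolving a plain relative path like aboutus.html comes down to what RFC 3986 (section 5.2.3) calls merging paths: drop the last segment of the base URL's path and append the relative path. The following is a minimal sketch of that one step using PHP's built-in parse_url(); merge_paths() is a hypothetical helper, not part of url_to_absolute, and the URL rebuilding at the end ignores ports, user info and fragments.

```php
<?php
// Sketch of the RFC 3986 section 5.2.3 "merge" step. merge_paths() is a
// hypothetical helper, not part of the url_to_absolute library.
function merge_paths(string $baseUrl, string $relativePath): string
{
    $parts = parse_url($baseUrl);
    $basePath = $parts['path'] ?? '';

    // A base such as "http://www.example.com" has a host but no path.
    if (isset($parts['host']) && $basePath === '') {
        return '/' . $relativePath;
    }

    // Keep the base path up to and including its last "/", which drops the
    // base page's file name (sitemap.html), then append the relative path.
    $lastSlash = strrpos($basePath, '/');
    $dir = ($lastSlash === false) ? '' : substr($basePath, 0, $lastSlash + 1);
    return $dir . $relativePath;
}

$base = 'http://www.example.com/sitemap.html';
$p = parse_url($base);
// Rebuild the absolute URL (ports, user info and fragments are ignored here).
echo $p['scheme'] . '://' . $p['host'] . merge_paths($base, 'aboutus.html'), "\n";
// http://www.example.com/aboutus.html
```

A reference such as ../images/somephoto.jpg comes out of this step as /content/../images/somephoto.jpg when the base is /content/sitemap.html; RFC 3986 removes the dot segments in a separate step (section 5.2.4).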

To work out the absolute URL of ../images/somephoto.jpg relative to http://www.example.com/content/sitemap.html, do this:

url_to_absolute('http://www.example.com/content/sitemap.html', '../images/somephoto.jpg');

which would return:

http://www.example.com/images/somephoto.jpg
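
The ../ in this example is handled by what RFC 3986 (section 5.2.4) calls removing dot segments. The sketch below transcribes that algorithm into PHP; remove_dot_segments() is a hypothetical function written for illustration, not code from the url_to_absolute library, and it expects a path that has already been merged with the base page's directory (here /content/../images/somephoto.jpg).

```php
<?php
// Transcription of the RFC 3986 section 5.2.4 remove_dot_segments algorithm.
// Hypothetical helper for illustration, not code from url_to_absolute.
function remove_dot_segments(string $path): string
{
    $input = $path;
    $output = '';
    while ($input !== '') {
        if (strpos($input, '../') === 0) {
            // A: drop a leading "../"
            $input = substr($input, 3);
        } elseif (strpos($input, './') === 0) {
            // A: drop a leading "./"
            $input = substr($input, 2);
        } elseif (strpos($input, '/./') === 0) {
            // B: "/./x" becomes "/x"
            $input = substr($input, 2);
        } elseif ($input === '/.') {
            $input = '/';
        } elseif (strpos($input, '/../') === 0 || $input === '/..') {
            // C: "/../x" becomes "/x", and the last output segment is removed
            $input = ($input === '/..') ? '/' : substr($input, 3);
            $slash = strrpos($output, '/');
            $output = ($slash === false) ? '' : substr($output, 0, $slash);
        } elseif ($input === '.' || $input === '..') {
            // D: a lone "." or ".." is discarded
            $input = '';
        } else {
            // E: move the first segment (with its leading "/", if any) to output
            $next = strpos($input, '/', $input[0] === '/' ? 1 : 0);
            if ($next === false) {
                $next = strlen($input);
            }
            $output .= substr($input, 0, $next);
            $input = (string) substr($input, $next);
        }
    }
    return $output;
}

echo remove_dot_segments('/content/../images/somephoto.jpg'), "\n";
// /images/somephoto.jpg
```

The buffer-based form follows the RFC step by step rather than splitting the path into an array, so the comments line up with rules A to E in the specification.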

URLs are encoded/decoded by default

By default, the split_url and join_url helper functions decode and encode URL parts using rawurldecode and rawurlencode. This isn’t ideal if the resulting URL is going to be used to download another web page, image file, etc., because characters with a special meaning in a URL, such as = and & in a query string, end up encoded.

For example, suppose we want to convert /somepage.php?foo=bar&baz=bat to an absolute URL like so:

url_to_absolute('http://www.example.com/content/sitemap.html', '/somepage.php?foo=bar&baz=bat');

The value returned would be:

http://www.example.com/somepage.php?foo%3Dbar%26baz%3Dbat

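This is not what we want: the = and & separators in the query string have been encoded, so the server would no longer see separate foo and baz parameters.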
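
The encoded result is what you get when the whole query string is passed to rawurlencode() as a single part; that is an assumption about how the library treats the query, but it matches the output above. The check below uses only built-in functions, and parse_str() shows what a PHP script receiving each query string would get as its parameters.

```php
<?php
// Assumption: the query string is encoded as one part, which reproduces the
// foo%3Dbar%26baz%3Dbat output. Only built-in functions are used here.
$query = 'foo=bar&baz=bat';
$encoded = rawurlencode($query);
echo $encoded, "\n"; // foo%3Dbar%26baz%3Dbat

// What a PHP script receiving each query string would see in its parameters.
parse_str($query, $intended);   // ['foo' => 'bar', 'baz' => 'bat']
parse_str($encoded, $received); // ['foo=bar&baz=bat' => '']
var_dump($intended, $received);
```

With the encoded version, the receiving script sees a single parameter named foo=bar&baz=bat with an empty value instead of the two parameters the page expects.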

To solve this problem, a minor modification is needed either to the url_to_absolute function or to the split_url and join_url functions.

If changing url_to_absolute, locate every call to split_url and join_url and pass false as the second argument. For example:

$r = split_url( $relativeUrl );

would become

$r = split_url( $relativeUrl, false );

Alternatively, change the default value of the second parameter in the split_url and join_url function definitions from:

function split_url( $url, $decode=TRUE ) ...
function join_url( $parts, $encode=TRUE ) ...

to:

function split_url( $url, $decode=FALSE ) ...
function join_url( $parts, $encode=FALSE ) ...

Once either change has been made, the URLs will no longer be encoded and will be converted to absolute URLs correctly.
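
To see why defaulting the flag to FALSE fixes the problem, here is a small join function with the same kind of $encode flag. join_parts_sketch() is hypothetical and is not the library's join_url(): it only handles scheme, host, path and query, and it encodes the query string as one part, an assumption chosen to reproduce the output described earlier in this post.

```php
<?php
// Hypothetical stand-in for a join function with an $encode flag; it is NOT
// the library's join_url(). Only scheme, host, path and query are handled.
function join_parts_sketch(array $parts, bool $encode = false): string
{
    $url = $parts['scheme'] . '://' . $parts['host'];
    $path = $parts['path'] ?? '/';
    $query = $parts['query'] ?? null;
    if ($encode) {
        // Encode each path segment but keep the "/" separators.
        $path = implode('/', array_map('rawurlencode', explode('/', $path)));
        if ($query !== null) {
            // Encoding the whole query turns "=" and "&" into %3D and %26.
            $query = rawurlencode($query);
        }
    }
    $url .= $path;
    if ($query !== null) {
        $url .= '?' . $query;
    }
    return $url;
}

$parts = parse_url('http://www.example.com/somepage.php?foo=bar&baz=bat');
echo join_parts_sketch($parts, true), "\n";  // http://www.example.com/somepage.php?foo%3Dbar%26baz%3Dbat
echo join_parts_sketch($parts, false), "\n"; // http://www.example.com/somepage.php?foo=bar&baz=bat
```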

Copyright © 2021. ElectricToolBox. All Rights Reserved.