In previous posts I have looked at how to extract images from a web page with PHP and the Simple HTML DOM Parser and generate thumbnails with PHP using a class I created. This page combines the two by downloading all the images from a specified web page and creating thumbnails for them.
The code
Read the two posts linked above for full details of how the HTML DOM Parser works and how my thumbnail generation class works. Then have a look at the few lines of code below, which combine the two to generate the thumbnails.
Note that the code downloads the images directly using file_get_contents and therefore needs the URL-aware fopen wrappers (the allow_url_fopen setting) enabled to work. Read my last post, which shows how to check whether these are enabled.
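As a quick reminder, the check can be sketched in a few lines. This is only a minimal illustration: urlWrappersEnabled() is a name I have made up for this example, and it looks at nothing but the allow_url_fopen setting.

```php
<?php
// Hypothetical helper (not part of any library): reports whether PHP is
// allowed to open http:// URLs with functions such as file_get_contents().
function urlWrappersEnabled(): bool
{
    // ini_get() returns the setting as a string such as "1", "0", "On" or "Off".
    $value = ini_get('allow_url_fopen');

    // FILTER_VALIDATE_BOOLEAN treats "1", "true", "on" and "yes" as true.
    return filter_var($value, FILTER_VALIDATE_BOOLEAN);
}

if (!urlWrappersEnabled()) {
    echo "allow_url_fopen is disabled, so remote images cannot be downloaded.\n";
}
```

If the check fails, the setting has to be changed in php.ini or the server configuration, because allow_url_fopen cannot be switched on from within a script.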
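require_once('/path/to/simple_html_dom.php');
// The thumbnailGenerator class from the earlier post must also be loaded before this point.

// Download and parse the page
$html = file_get_html('http://www.cnn.com/');

// Store each image URL as an array key so that duplicates are only kept once
$images = array();
foreach($html->find('img') as $element) {
    // Skip <img> tags that have no src attribute
    if($element->src) {
        $images[$element->src] = true;
    }
}

// Generate a thumbnail of at most 100x100 pixels for each unique image
$tg = new thumbnailGenerator;
foreach($images as $url => $void) {
    $tg->generate($url, 100, 100, '/path/to/thumbnails/' . md5($url) . '.jpg');
}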
Note: $tg in the example above is an instance of my thumbnailGenerator class.
The example downloads the www.cnn.com homepage and extracts all the images using the HTML DOM Parser, whose selector syntax is similar to jQuery's.
The images that are found are put into an array indexed by the full URL; this effectively eliminates duplicates (which, in the case of the CNN homepage at the current time, include a 1 pixel spacer image used many times).
This array is then looped through and the thumbnail images are generated. In the example I've named them using an md5 hash of the full URL, with a .jpg extension. This avoids problems with slashes, query strings and other characters in the URL that would not work in a file name.
The above example will create thumbnails that are a maximum of 100×100 pixels.
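To show the de-duplication and naming on their own, here is a small sketch that needs no network access. The thumbnailPaths() function and the example.com URLs are made up for illustration; only md5() and the array-key trick come from the script above.

```php
<?php
// Hypothetical helper: maps each distinct image URL to a thumbnail path.
function thumbnailPaths(array $urls, string $dir): array
{
    $paths = array();
    foreach ($urls as $url) {
        // Assigning to an existing key just overwrites it, so a URL that
        // appears several times still ends up with a single entry.
        $paths[$url] = rtrim($dir, '/') . '/' . md5($url) . '.jpg';
    }
    return $paths;
}

$paths = thumbnailPaths(
    array(
        'http://example.com/img/spacer.gif',
        'http://example.com/img/photo.jpg?size=large',
        'http://example.com/img/spacer.gif', // duplicate of the first entry
    ),
    '/path/to/thumbnails/'
);

foreach ($paths as $url => $path) {
    echo $url . ' => ' . $path . "\n";
}
```

Because md5() always returns 32 hexadecimal characters, every file name is safe to use on disk whatever the original URL contained.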
CSS images won’t be included
Note that the above example only gets images that are defined with an <img> tag; any images defined as CSS backgrounds, whether inline or in a style sheet, will not be downloaded.
Refinements to the script
The script could be refined to exclude images that are below a certain size (e.g. images which are less than 100 pixels wide or 100 pixels high could be ignored) or of a particular format. You could do the latter with my thumbnail generation class by setting the allowable types. I’ll have a look at these (and any suggestions made in the comments below) and post an update in a few days.
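As a rough idea of how the size check might look, here is a sketch that filters a list of images by their dimensions. The filterBySize() function and the sample data are hypothetical; in a real script the dimensions could come from PHP's getimagesize(), which returns the width and height as the first two elements of an array (or false if the file is not a readable image), at the cost of fetching each image an extra time.

```php
<?php
// Hypothetical helper: keeps only the images that meet a minimum width and height.
// $sizes maps each image URL to array(width, height).
function filterBySize(array $sizes, int $minWidth, int $minHeight): array
{
    $keep = array();
    foreach ($sizes as $url => $size) {
        list($width, $height) = $size;
        // Ignore the image if it is too narrow or too short.
        if ($width >= $minWidth && $height >= $minHeight) {
            $keep[] = $url;
        }
    }
    return $keep;
}

// Sample dimensions made up for this example.
$sizes = array(
    'http://example.com/spacer.gif' => array(1, 1),
    'http://example.com/banner.jpg' => array(468, 60),
    'http://example.com/photo.jpg'  => array(640, 480),
);

// Only photo.jpg is at least 100 pixels in both directions.
print_r(filterBySize($sizes, 100, 100));
```

The URLs that are returned could then be passed to the thumbnail loop in place of the full list.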