Extract Google Analytics report attachments with PHP

Ten days ago I looked at how to send Google Analytics reports by email, which started a series about processing Google Analytics reports with PHP. I followed this up with posts about using the PHP IMAP functions to download email and then extract attachments. Before moving on to parsing the data from the CSV, TSV or XML file attachments, this post works out which part of the email contains the Google Analytics attachment, using the same functions from the extract attachments post.

Using the code from the last post I end up with a $parts array that looks like this:

 [0] => Array
     [is_attachment] =>
     [filename] =>
     [name] =>
     [attachment] =>
 [1] => Array
     [is_attachment] => 1
     [filename] => Analytics_www.electrictoolbox.com_20090111-20090210_(Top_Content).csv
     [name] => Analytics_www.electrictoolbox.com_20090111-20090210_(Top_Content).csv
     [attachment] => ...

Note that the filename in the above example is slightly different from the last post because I’ve added “Top Content” to the email’s subject, which means it also appears in the attachment’s filename. This helps to tell the various emails and attachments apart. You can do this by entering something into the “Subject” field when setting up the email.

The following code loops through all the parts of the email and uses a regular expression to extract the domain, from date, to date and subject from the filename:

foreach($parts as $part) {
    if($part['is_attachment']) {
        // the literal brackets and the dot in the filename are escaped with backslashes
        preg_match('/Analytics_(.*)_([0-9]{8})-([0-9]{8})_\((.*)\)\.csv/', $part['filename'], $matches);
        if($matches) {
            $domain = $matches[1];
            $from = strtotime($matches[2]);
            $to = strtotime($matches[3]);
            $subject = $matches[4];
            if($subject == 'Top_Content') {
                // ... do something ...
            }
        }
    }
}

The foreach loop goes through each part of the email and, if the part is an attachment, runs a regular expression against its filename.

The regular expression first matches the domain –> Analytics_(.*)_ <– with the (.*) part capturing the domain. This will go into element 1 of the $matches array.

It then matches the from and to dates –> ([0-9]{8})-([0-9]{8}) <– which will go into elements 2 and 3 of the $matches array.

Finally, the subject part of the filename is matched. Because the subject has round brackets around it in the filename, they have to be escaped with backslashes, so the regexp looks a little messy at that point –> _\((.*)\)\.csv <– and the subject will go into element 4.
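To see what ends up in each element of $matches, here is a minimal sketch that runs the regular expression against the sample filename from the $parts array above. The variable names $filename, $pattern and $matched are only used for this illustration.

```php
<?php
// Sample filename taken from the $parts array shown earlier in this post.
$filename = 'Analytics_www.electrictoolbox.com_20090111-20090210_(Top_Content).csv';

// The literal round brackets and the dot before "csv" are escaped with backslashes.
$pattern = '/Analytics_(.*)_([0-9]{8})-([0-9]{8})_\((.*)\)\.csv/';

// preg_match() returns 1 when the pattern matches and fills $matches.
$matched = preg_match($pattern, $filename, $matches);

echo $matches[1], "\n"; // www.electrictoolbox.com
echo $matches[2], "\n"; // 20090111
echo $matches[3], "\n"; // 20090210
echo $matches[4], "\n"; // Top_Content
```

Element 0 of $matches holds the whole matched string, which is why the captured values start at element 1.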

If the regular expression matches then it’s an attachment we want to process. The next section of code puts the matches into nicely named variables, converting the from and to dates into UNIX timestamps which can be formatted as you wish with the date() function.
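As a rough sketch of the date handling, the two date strings from the sample filename can be converted with strtotime() and then formatted with date(). The format strings are just examples, and setting the timezone to UTC is an assumption made here so the output does not depend on the server configuration.

```php
<?php
// Assumption for this sketch: use UTC so the result is the same on any server.
date_default_timezone_set('UTC');

// The eight digit YYYYMMDD strings as captured from the filename.
$from = strtotime('20090111');
$to = strtotime('20090210');

// Format the UNIX timestamps however you like with date().
echo date('Y-m-d', $from), "\n"; // 2009-01-11
echo date('j F Y', $to), "\n";   // 10 February 2009
```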

The final test checks whether the subject is ‘Top_Content’ (note that Google changes spaces in the subject to underscores in the filename) and does something if so. Depending on what the subject is, you could call different functions.
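One possible way to call different functions depending on the subject is a lookup array that maps each subject to a function name. This is only a sketch: the functions process_top_content(), process_keywords() and dispatch_report() are hypothetical, and ‘Keywords’ is an assumed subject that does not come from the example email.

```php
<?php
// Hypothetical handlers: the names and bodies are placeholders for illustration only.
function process_top_content($attachment) {
    return 'top content: ' . strlen($attachment) . ' bytes';
}

function process_keywords($attachment) {
    return 'keywords: ' . strlen($attachment) . ' bytes';
}

// Map the subject from the filename (spaces already changed to underscores) to a handler.
$handlers = array(
    'Top_Content' => 'process_top_content',
    'Keywords'    => 'process_keywords', // assumed subject, not from the example email
);

// Call the handler registered for the subject, or return null if there is none.
function dispatch_report($subject, $attachment, $handlers) {
    if (!isset($handlers[$subject])) {
        return null; // unknown subject: skip this attachment
    }
    return call_user_func($handlers[$subject], $attachment);
}

$result = dispatch_report('Top_Content', "a,b\n1,2\n", $handlers);
echo $result, "\n"; // top content: 8 bytes
```

Inside the loop you would pass $subject and $part['attachment'] instead of the sample values used here.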

Future posts

As well as some other PHP posts about other useful things, this particular series will continue looking at dealing with Google Analytics emails with PHP. There will be at least 4 more posts in this series: looping through a mailbox to find emails from Google Analytics; using PHP’s IMAP functions to connect to Gmail; reading the data from a CSV or TSV attachment; and reading the data from an XML file.

The next PHP post (on Sunday) will look at copying a MySQL table and data using a PHP script to automate the process, and this series will resume again on Monday. Make sure you subscribe to my RSS feed so you don’t miss out on this ongoing series.