Saving time and bandwidth by creating a DVD image from CD ISO files
A little while back I did something stupid and deleted a lot of files from a hard drive array, including a few hundred Linux and BSD CD and DVD ISO images. I had a large number of these archived to disc and have been able to re-download most of the other stuff I need easily enough, but the challenge is always to download the replacements as quickly as possible while using as little bandwidth as possible (bandwidth is still pretty expensive here in New Zealand).
In the case of a distribution like CentOS, each release comes on a number of CDs or alternatively a single DVD. The contents of each are fairly similar, and there are a number of mirrors around the world which support downloading using rsync, as well as the more standard ftp and http methods.
Update November 5th 2008: Thanks to this forum thread which mentions this post, I have found out that you can download the mkdvdiso.sh script at http://isoredirect.centos.org/centos/build/ to create a DVD from the CD images. I have not tried this myself but presumably the DVD ISO image that is generated matches the md5 checksum. I will try this out in the next few days and update again.
Update ends; original post resumes…
It suddenly struck me one day that I should be able to simply concatenate the contents of the CDs into one great big file, and then rsync it against an rsync server. The file will contain a fair amount of data from the start and end of each CD image that is not on the DVD, but assuming the contents are in a relatively similar order on the discs, the amount that has to be downloaded to make the DVD image correct should be less than the whole thing.
I already had the CD ISO images but not the DVD one. If it didn’t work, the worst case would be downloading the entire thing. I expected to save maybe a few hundred megabytes, and was impressed to discover that, at least with the x86_64 version of CentOS 5.0, I only had to download 131MB to make the image valid.
This is how I did it…
First of all, here’s the list of CD ISO images:
$ ls -l
-rw-r--r-- 1 1000 users 655493120 Jan 18 12:27 CentOS-5.0-x86_64-bin-1of7.iso
-rw-r--r-- 1 root root 665100288 Apr 11 2007 CentOS-5.0-x86_64-bin-2of7.iso
-rw-r--r-- 1 root root 666744832 Apr 11 2007 CentOS-5.0-x86_64-bin-3of7.iso
-rw-r--r-- 1 root root 617988096 Apr 11 2007 CentOS-5.0-x86_64-bin-4of7.iso
-rw-r--r-- 1 root root 645744640 Apr 11 2007 CentOS-5.0-x86_64-bin-5of7.iso
-rw-r--r-- 1 root root 664485888 Apr 11 2007 CentOS-5.0-x86_64-bin-6of7.iso
-rw-r--r-- 1 root root 374009856 Apr 11 2007 CentOS-5.0-x86_64-bin-7of7.iso
Now I concatenate them together into a file named after the DVD version:
$ cat CentOS-5.0-x86_64-bin-?of7.iso > CentOS-5.0-x86_64-bin-DVD.iso
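As a quick sanity check before running rsync, the concatenated file should be exactly as large as the seven CD images combined. Here is a minimal sketch, assuming GNU stat (for its `-c %s` format); the helper name `total_size` is made up for illustration:

```shell
#!/bin/sh
# total_size: hypothetical helper that prints the combined size in bytes
# of the files given as arguments. Assumes GNU stat (-c %s prints the size).
total_size() {
    total=0
    for f in "$@"; do
        size=$(stat -c %s "$f") || return 1
        total=$((total + size))
    done
    echo "$total"
}

# Usage: both commands should print the same number.
# total_size CentOS-5.0-x86_64-bin-?of7.iso
# total_size CentOS-5.0-x86_64-bin-DVD.iso
```

For the seven images in the listing above the total is 4289566720 bytes.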
Then rsync against a CentOS rsync mirror which also allows direct DVD downloads:
$ rsync -az --progress --stats rsync://ftp.jaist.ac.jp/pub/Linux/CentOS/5.0/isos-dvd/CentOS-5.0-x86_64-bin-DVD.iso .
And here’s the output from the above command:
receiving file list ...
1 file to consider
CentOS-5.0-x86_64-bin-DVD.iso
4287268864 100% 3.40MB/s 0:20:01 (1, 100.0% of 1)
Number of files: 1
Number of files transferred: 1
Total file size: 4287268864 bytes
Total transferred file size: 4287268864 bytes
Literal data: 137032352 bytes
Matched data: 4150236512 bytes
File list size: 94
Total bytes sent: 524156
Total bytes received: 137091839
sent 524156 bytes received 137091839 bytes 91348.15 bytes/sec
total size is 4287268864 speedup is 31.15
So it took only 20 minutes and required just 131MB of actual data to turn the concatenated file into the valid DVD image. The “Literal data” line in the output above shows the actual data transferred.
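The figures in the statistics are easy to check by hand. The sketch below feeds the byte counts from the rsync output above into awk. The function name `rsync_savings` is only illustrative, and it assumes that rsync’s “speedup” is the total file size divided by all bytes sent and received, which matches the number reported here.

```shell
#!/bin/sh
# rsync_savings: hypothetical helper. Arguments are byte counts copied
# from rsync's --stats output: total size, literal data, sent, received.
rsync_savings() {
    awk -v total="$1" -v literal="$2" -v sent="$3" -v recv="$4" 'BEGIN {
        # Literal data is the part that could not be matched locally.
        printf "literal: %.1f MiB (%.1f%% of the image)\n",
            literal / 1048576, 100 * literal / total
        # Assumed definition: total size over all bytes exchanged.
        printf "speedup: %.2f\n", total / (sent + recv)
    }'
}

rsync_savings 4287268864 137032352 524156 137091839
```

With the numbers from this run it reports 130.7 MiB of literal data, about 3.2% of the image, and a speedup of 31.15.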
I then checked the image against the published md5 checksum, just to be sure, first by creating the md5 checksum file:
$ echo "246f5740f70abd020048d87becf8af24  CentOS-5.0-x86_64-bin-DVD.iso" > CentOS-5.0-x86_64-bin-DVD.iso.md5
and then running the checksum:
$ md5sum -c CentOS-5.0-x86_64-bin-DVD.iso.md5
CentOS-5.0-x86_64-bin-DVD.iso: OK
Excellent, it’s perfectly valid. The great thing about this method is that when a new release comes out, I only need to get the CDs. I can then make a DVD image from them and rsync it against an rsync server to fix the image.