I recently posted about how to move *nix application with minimal downtime, inspired by having to do this with a Postgres install I support. The last post looked at this in fairly general terms, and this post more specifically at the exact problem I experienced.
Running out of disk space
The partition the Postgres database files are stored on this particular sever was running fairly low on space and we needed to free up space by purging some old data. We ran the necessary DELETE queries and then attempted to do a VACUUM FULL to free up the actual physical space.
The problem was there wasn't enough free disk space to run the VACUUM FULL so it failed and we were left with the same small amount of disk space afterwards. It was also a regular partition and not a logical volume manager disk so we couldn't simply add more disk to the volume.
Solution – temporarily move pg_xlog
The pg_xlog describes every change made to the database's data files and was growing as the VACUUM FULL process ran. The simplest solution to me appeared to be to shut down Postgres, copy the pg_xlog onto another partition and symlink to the new temporary copy. Then Postgres could be started up again and we could run VACUUM FULL again.
The process
The Postgres init script on this particular server is at /etc/init.d/msyspg and the data files at /var/db/msyspg. I copied the xlog to /var/log/postgres/pg_xlog because that volume had loads of free space. Adjust the paths in the commands below to your install:
mkdir /var/log/postgres cp -pr /var/db/msyspg/pg_xlog /var/log/postgres /etc/init.d/msyspg stop rsync -a --delete /var/db/msyspg/pg_xlog /var/log/postgres mv /var/db/msyspg/pg_xlog /var/db/msyspg/pg_xlog~ ln -s /var/log/postgres/pg_xlog /var/db/msyspg /etc/init.d/msyspg start
So what's going on here?
Line 1 creates the parent directory of where I was going to copy the pg_xlog to
Line 2 copies the pg_xlog while Postgres is still running; this may take a while
Line 3 stops Postgres
Line 4 does an rsync to copy anything that's changed inbetween making the original copy and stopping Postgres. By doing a copy first and an rsync afterwards you can keep the downtime to a minimum. If this is on an offline server or time is not an issue, then you could simply omit line 2 and do the copy here instead.
Line 5 renames the original pg_xlog directory to pg_xlog~; doing this rather than simply deleting it means that a) if anything goes wrong we can simply rename it back to the old name and b) when copying the data back again we can use that same directory.
Line 6 creates the symbolic link to the new location of the pg_xlog
Line 7 starts Postgres up again. At this point everything should be working and Postgres using the pg_xlog at the symlinked location.
Moving it back again
Once the VACUUM FULL process had run the pg_xlog was much bigger but a lot of space had been freed up. We left it for a couple of days before moving it back again when the pg_xlog was smaller by reversing the process as follows:
rsync -a /var/log/postgres/pg_xlog/ /var/db/msyspg/pg_xlog~/ /etc/init.d/msyspg stop rsync -a --delete /var/log/postgres/pg_xlog/ /var/db/msyspg/pg_xlog~/ rm /var/db/msyspg/pg_xlog mv /var/db/msyspg/pg_xlog~ /var/db/msyspg/pg_xlog /etc/init.d/msyspg start
To summarize the above, we rsync the data from the temporary location back to the old copy in the renamed directory while Postgres is running and then again after stopping Postgres. Again this cuts down on the amount of downtime.
Then the symlink is removed, the original pg_xlog renamed back to the correct name and Postgres started up again. Now Postgres should be working and using the pg_xlog from the original location.