Oct. 13, 2012, 1:19 p.m.
IT

PostgreSQL vacuum full takes extremely long

If you have a large database with some large tables and you perform a mass update or delete statement, chances are you will lose a lot of disk space due to the MVCC architecture of the database engine. Updates are essentially inserts plus deletes, which bloats the database. Say a database is 100GB and one of its tables is 20GB: if you perform a mass update on all rows of that table, the database will grow to about 120GB. The old rows are left behind and will be re-used over time, but therein lies the problem. Sometimes you do not have the luxury of lots of free disk space and need to reclaim it immediately.
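
If you want to see where the space is going before taking anything offline, the standard statistics views will show you table sizes and the dead tuple counts MVCC leaves behind. A quick sketch using the built-in catalog views (nothing here is specific to my setup):

select relname,
       pg_size_pretty(pg_total_relation_size(relid)) as total_size,
       n_live_tup,
       n_dead_tup
from pg_stat_user_tables
order by pg_total_relation_size(relid) desc;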

A first attempt at reclaiming the lost disk space is to run a

vacuum full freeze verbose analyse <tablename>

That will work, however for me it ran for about 4 days (vacuum full takes out an exclusive table lock, so this is all downtime): CPU sat at 75%, disk IO was negligible, and it just did not get anywhere. Considering that I had two tables to update and only 48 hours of downtime allowed, this was unacceptable.

There are two neat tricks to get around this, and for me it took about an hour to complete. Either CLUSTER your table (see the sketch further down), or do this:

alter table <tablename> alter column <column> type <existing type>

You alter a column in the table to the same type it currently has. This seemingly redundant operation rewrites the table: it will not physically reorder anything, but it gets rid of all the dead tuples. While it runs it will consume up to 20GB of extra disk space as it duplicates the table, so you will hit a peak of about 140GB; once the operation completes, the database will be back to 100GB. I do not think vacuum full consumes this much disk space, since it tries to re-organise the rows within the existing table space, but I am not 100% sure about that. Either way, comparing 1 hour to 4 days, it is a no-brainer.
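
For completeness, the CLUSTER route mentioned above does the same kind of full table rewrite, just physically ordered by an index, and it drops the dead tuples along the way. A sketch, assuming the table has a suitable index to cluster on (the index name is a placeholder):

cluster <tablename> using <indexname>;

Like the alter trick, CLUSTER takes an exclusive lock and needs enough free disk to hold the fresh copy of the table while it runs.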

Update: For PostgreSQL 9.x, make sure to change the type to something else and then change it back, and also make sure you do not lose data by choosing an incompatible or narrower type. For instance, for a numeric(12,2) column you would alter to numeric(12,3) and then back to numeric(12,2). PostgreSQL 9.x seems to be clever enough not to perform the truly redundant same-type operation.
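
As a concrete sketch of that two-step dance, using the numeric example above:

alter table <tablename> alter column <column> type numeric(12,3);
alter table <tablename> alter column <column> type numeric(12,2);

Each statement rewrites the table in full (which is what reclaims the dead space); the second one restores the original numeric(12,2) type.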