Thread: VACUUM FULL pg_largeobject without (much) downtime?
Hi list,

We run a website. We once stored all sorts of files in pg_largeobject,
which grew to 266GB. This is on an m1.large on Amazon EC2, on a single
magnetic, non-provisioned-IO volume. In that context, 266GB is a lot.

We've since moved all but 60GB of that data to S3, and we plan to reduce
that to 1GB by deleting old, unused data. Of course, pg_largeobject will
still take up 266GB on disk, because autovacuum doesn't return space to
the operating system.

We want to move our entire database to an SSD volume, with as little
downtime as possible. My tentative plan:

1. Use CREATE TABLESPACE and pg_repack to move user tables to a
   temporary volume
2. Take down Postgres, copy the system-table files to the new volume,
   and start Postgres up from the new volume
3. Use pg_repack to move everything to the new volume

This plan won't work: step 2 will be too slow, because pg_largeobject
still takes up 266GB. We tested `VACUUM FULL pg_largeobject` on our
staging database: it took two hours, during which pg_largeobject was
locked, and when pg_largeobject is locked, much of our website doesn't
work.

How can we move our database without much downtime? Is there a way to
`VACUUM FULL` pg_largeobject without locking it for very long? Aside
from that problem, is the rest of my upgrade plan sane?

For what it's worth, here's some info from VACUUM VERBOSE:

overview=# VACUUM (VERBOSE, ANALYZE) pg_largeobject;
INFO:  vacuuming "pg_catalog.pg_largeobject"
INFO:  scanned index "pg_largeobject_loid_pn_index" to remove 1112630 row versions
DETAIL:  CPU 3.38s/9.89u sec elapsed 69.02 sec.
INFO:  "pg_largeobject": removed 1112630 row versions in 374889 pages
DETAIL:  CPU 7.48s/2.22u sec elapsed 150.44 sec.
INFO:  index "pg_largeobject_loid_pn_index" now contains 29373858 row versions in 370470 pages
DETAIL:  1112630 index row versions were removed.
279489 index pages have been deleted, 276070 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.06 sec.
INFO:  "pg_largeobject": found 1112622 removable, 231974 nonremovable row versions in 3189820 out of 34522175 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 10318306 unused item pointers.
0 pages are entirely empty.
CPU 69.81s/28.83u sec elapsed 1402.53 sec.
INFO:  analyzing "pg_catalog.pg_largeobject"
INFO:  "pg_largeobject": scanned 30000 of 34522175 pages, containing 25085 live rows and 0 dead rows; 25085 rows in sample, 24203398 estimated total rows
VACUUM

Enjoy life,
Adam

--
Adam Hooper
+1-613-986-3339
http://adamhooper.com
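For readers following along, a minimal sketch of step 1 of the plan
above. The mount point /mnt/tmpvol, the tablespace name tmp_space, the
table public.uploaded_files and the database overview are placeholder
names, not anything from this thread:

    -- In psql, as a superuser, with the temporary volume mounted at /mnt/tmpvol:
    CREATE TABLESPACE tmp_space LOCATION '/mnt/tmpvol/pgdata';

    -- The blocking operation tested on staging: it rewrites the table
    -- compactly, but holds an ACCESS EXCLUSIVE lock for the whole run.
    VACUUM FULL pg_largeobject;

Moving a user table (and its indexes) into the new tablespace with
pg_repack happens from the shell while the table stays online, assuming
a pg_repack build with tablespace support (1.2 or later):

    pg_repack --table=public.uploaded_files --tablespace=tmp_space --moveidx overview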
On Tue, 3 Feb 2015 10:53:11 -0500
Adam Hooper <adam@adamhooper.com> wrote:

> We want to move our entire database to an SSD volume, with as little
> downtime as possible. My tentative plan:
>
> 1. Use CREATE TABLESPACE and pg_repack to move user tables to a
>    temporary volume
> 2. Take down Postgres, copy the system-table files to the new volume,
>    and start Postgres up from the new volume
> 3. Use pg_repack to move everything to the new volume
>
> This plan won't work: step 2 will be too slow, because pg_largeobject
> still takes up 266GB. We tested `VACUUM FULL pg_largeobject` on our
> staging database: it took two hours, during which pg_largeobject was
> locked, and when pg_largeobject is locked, much of our website doesn't
> work.

Sometimes CLUSTER is faster than VACUUM FULL ... have you tested
CLUSTERing pg_largeobject on your test system to see if it's fast
enough?

How big is the non-lo data?

--
Bill Moran
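Concretely, the CLUSTER variant Bill suggests would look something like
the line below, using the index name that appears in the VACUUM VERBOSE
output above. Like VACUUM FULL, CLUSTER holds an ACCESS EXCLUSIVE lock
on the table while it rewrites it, so it only helps if it finishes much
faster:

    -- Rewrites pg_largeobject in index order, reclaiming dead space;
    -- the table is locked for the duration.
    CLUSTER pg_largeobject USING pg_largeobject_loid_pn_index;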
On Tue, Feb 3, 2015 at 12:58 PM, Bill Moran <wmoran@potentialtech.com> wrote:
> On Tue, 3 Feb 2015 10:53:11 -0500
> Adam Hooper <adam@adamhooper.com> wrote:
>
>> This plan won't work: step 2 will be too slow, because pg_largeobject
>> still takes up 266GB. We tested `VACUUM FULL pg_largeobject` on our
>> staging database: it took two hours, during which pg_largeobject was
>> locked, and when pg_largeobject is locked, much of our website doesn't
>> work.
>
> Sometimes CLUSTER is faster than VACUUM FULL ... have you tested
> CLUSTERing pg_largeobject on your test system to see if it's fast
> enough?

On the 30GB that's left on staging, it takes 50 minutes. Unfortunately,
our staging database is only 30GB because we already ran the VACUUM
FULL there, and that isn't an operation we can easily undo. In any
case, I need an orders-of-magnitude improvement, and this clearly
isn't it.

> How big is the non-lo data?

It's 65GB, but I've already used pg_repack to move it to a separate
tablespace, so it won't add to the downtime.

Enjoy life,
Adam

--
Adam Hooper
+1-613-986-3339
http://adamhooper.com
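For anyone measuring the same thing on their own database, queries
along these lines (standard system functions, run in psql) report the
large-object footprint versus everything else:

    -- pg_largeobject plus its index:
    SELECT pg_size_pretty(pg_total_relation_size('pg_largeobject'));

    -- the rest of the current database:
    SELECT pg_size_pretty(pg_database_size(current_database())
                          - pg_total_relation_size('pg_largeobject'));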
On Tue, 3 Feb 2015 14:17:03 -0500
Adam Hooper <adam@adamhooper.com> wrote:

> > Sometimes CLUSTER is faster than VACUUM FULL ... have you tested
> > CLUSTERing pg_largeobject on your test system to see if it's fast
> > enough?
>
> On the 30GB that's left on staging, it takes 50 minutes. Unfortunately,
> our staging database is only 30GB because we already ran the VACUUM
> FULL there, and that isn't an operation we can easily undo. In any
> case, I need an orders-of-magnitude improvement, and this clearly
> isn't it.

It was worth a shot.

> > How big is the non-lo data?
>
> It's 65GB, but I've already used pg_repack to move it to a separate
> tablespace, so it won't add to the downtime.

My recommendation here would be to use Slony to replicate the data to a
new server, then switch over once the data has synchronized. Since
Slony replicates transactions rather than the on-disk data structures,
the new database will come out pretty compact.

Because Slony lets you set up and synchronize the replica without
interrupting normal operation, you can do all of this with essentially
no downtime (although I highly recommend doing it during a slow
period). The final switchover should take less than a minute.

And because Slony replicates at the table level, you can make changes
as part of the process, and possibly not replicate the large objects at
all.

Slony can be a bit intimidating to set up if you've never used it
before, but it's a very powerful tool. I've used it successfully for
many years to do this sort of thing.

--
Bill Moran
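A very rough slonik outline of the kind of setup Bill describes,
assuming a two-node cluster and a single user table
public.uploaded_files; all node names, table names, and conninfo
strings are invented for illustration, and the exact syntax should be
checked against the Slony documentation for your version:

    cluster name = move_cluster;
    node 1 admin conninfo = 'dbname=overview host=old-server user=slony';
    node 2 admin conninfo = 'dbname=overview host=new-server user=slony';

    init cluster (id = 1, comment = 'origin: old magnetic-volume server');
    store node (id = 2, comment = 'replica: new SSD server', event node = 1);
    store path (server = 1, client = 2, conninfo = 'dbname=overview host=old-server user=slony');
    store path (server = 2, client = 1, conninfo = 'dbname=overview host=new-server user=slony');

    create set (id = 1, origin = 1, comment = 'user tables');
    set add table (set id = 1, origin = 1, id = 1, fully qualified name = 'public.uploaded_files');

    subscribe set (id = 1, provider = 1, receiver = 2, forward = no);

Once the subscription has caught up, the switchover itself is a short
LOCK SET / MOVE SET step plus repointing the application at the new
server.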
On Tue, Feb 3, 2015 at 2:29 PM, Bill Moran <wmoran@potentialtech.com> wrote:
>
> My recommendation here would be to use Slony to replicate the data to a
> new server, then switch over once the data has synchronized.

Looks exciting. But then I notice: "Slony-I does not automatically
replicate changes to large objects (BLOBS)." [1]

Does that still apply?

It's doable for us to VACUUM FULL and add a notice to our website
saying, "you can't upload files for the next two hours." Maybe that's a
better idea?

Enjoy life,
Adam

[1] http://slony.info/documentation/2.2/limitations.html

--
Adam Hooper
+1-613-986-3339
http://adamhooper.com
On Tue, 3 Feb 2015 14:48:17 -0500
Adam Hooper <adam@adamhooper.com> wrote:

> Looks exciting. But then I notice: "Slony-I does not automatically
> replicate changes to large objects (BLOBS)." [1]
>
> Does that still apply?

I'm not 100% sure, so if you decide to go this route you'll need to
verify what I'm about to say. I'm pretty sure the key word there is
"automatically": to replicate changes to large objects, you have to
replicate the pg_largeobject table itself, which means futzing about
with PG's internals (pg_largeobject is essentially a system table), so
it isn't done automatically. That said, I believe you can replicate
large objects by explicitly telling Slony to replicate pg_largeobject.

> It's doable for us to VACUUM FULL and add a notice to our website
> saying, "you can't upload files for the next two hours." Maybe that's
> a better idea?

That really depends on what options you have available. Keep in mind
that users won't be able to read large objects during the VACUUM FULL
either, so you'll also need to disable whatever features of the site
view those files. Whether that's easier or harder depends on how much
work it takes to switch those features off.

--
Bill Moran
On Tue, Feb 3, 2015 at 3:12 PM, Bill Moran <wmoran@potentialtech.com> wrote:
>
>> It's doable for us to VACUUM FULL and add a notice to our website
>> saying, "you can't upload files for the next two hours." Maybe that's
>> a better idea?
>
> That really depends on what options you have available. Keep in mind
> that users won't be able to read large objects during the VACUUM FULL
> either, so you'll also need to disable whatever features of the site
> view those files.

For the record, this is what we went with.

To those looking to use large objects (who, um, are already searching
for how to VACUUM FULL pg_largeobject), my advice: if we were to do it
all over again, we'd have used a separate database per "bucket". For
instance, imagine you store uploaded files and processed data in
pg_largeobject, and some day you migrate the processed data elsewhere.
If the uploaded files had gone into one database and the processed data
into a second, the uploaded-files database's pg_largeobject table would
have stayed slim, and you could simply DROP DATABASE on the other once
all clients stopped using it. There wouldn't be any downtime at all.

My take-away, though, is to avoid the pg_largeobject table whenever
possible. You can move BYTEA data with zero downtime using pg_repack,
but the same can't be said for large objects.

Enjoy life,
Adam

--
Adam Hooper
+1-613-986-3339
http://adamhooper.com
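As a closing illustration of the per-bucket approach Adam describes,
with made-up database names:

    -- One database per "bucket" of large objects:
    CREATE DATABASE uploaded_files;
    CREATE DATABASE processed_data;

    -- Later, once the processed data has been migrated elsewhere and no
    -- clients connect to that database any more, all of its
    -- pg_largeobject space comes back in one cheap step, without
    -- touching the uploaded_files database:
    DROP DATABASE processed_data;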