Thread: VACUUM FULL pg_largeobject without (much) downtime?
Hi list,

We run a website. We once stored all sorts of files in pg_largeobject,
which grew to 266GB. This is on an m1.large on Amazon EC2, on a single
magnetic, non-provisioned-IO volume. In that context, 266GB is a lot.

We've since moved all but 60GB of that data to S3, and we plan to reduce
that to 1GB by deleting old, unused data. Of course, pg_largeobject will
still take up 266GB on disk, because autovacuum doesn't return space to
the operating system.

We want to move our entire database to an SSD volume, with as little
downtime as possible. My tentative plan:

1. Use CREATE TABLESPACE and pg_repack to move user tables to a
   temporary volume
2. Take down Postgres, copy the system-table files to the new volume,
   and start Postgres up from the new volume
3. Use pg_repack to move everything to the new volume

This plan won't work: step 2 will be too slow, because pg_largeobject
still takes up 266GB. We tested `VACUUM FULL pg_largeobject` on our
staging database: it took two hours, during which pg_largeobject was
locked, and when pg_largeobject is locked, much of our website doesn't
work.

How can we move our database without much downtime? Is there a way to
`VACUUM FULL` pg_largeobject without locking it for very long? Aside
from that problem, is the rest of my upgrade plan sane?

For what it's worth, here's some info from VACUUM VERBOSE:

overview=# VACUUM (VERBOSE, ANALYZE) pg_largeobject;
INFO:  vacuuming "pg_catalog.pg_largeobject"
INFO:  scanned index "pg_largeobject_loid_pn_index" to remove 1112630 row versions
DETAIL:  CPU 3.38s/9.89u sec elapsed 69.02 sec.
INFO:  "pg_largeobject": removed 1112630 row versions in 374889 pages
DETAIL:  CPU 7.48s/2.22u sec elapsed 150.44 sec.
INFO:  index "pg_largeobject_loid_pn_index" now contains 29373858 row versions in 370470 pages
DETAIL:  1112630 index row versions were removed.
279489 index pages have been deleted, 276070 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.06 sec.
INFO:  "pg_largeobject": found 1112622 removable, 231974 nonremovable row versions in 3189820 out of 34522175 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 10318306 unused item pointers.
0 pages are entirely empty.
CPU 69.81s/28.83u sec elapsed 1402.53 sec.
INFO:  analyzing "pg_catalog.pg_largeobject"
INFO:  "pg_largeobject": scanned 30000 of 34522175 pages, containing 25085 live rows and 0 dead rows; 25085 rows in sample, 24203398 estimated total rows
VACUUM

Enjoy life,
Adam

--
Adam Hooper
+1-613-986-3339
http://adamhooper.com
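For readers following along, a minimal sketch of step 1 of the plan
above. The mount point /mnt/tmpvol, the tablespace name tmp_space, the
table public.uploaded_files and the database overview are placeholder
names, not anything from this thread:

    -- In psql, as a superuser, with the temporary volume mounted at /mnt/tmpvol:
    CREATE TABLESPACE tmp_space LOCATION '/mnt/tmpvol/pgdata';

    -- The blocking operation tested on staging: it rewrites the table
    -- compactly, but holds an ACCESS EXCLUSIVE lock for the whole run.
    VACUUM FULL pg_largeobject;

Moving a user table (and its indexes) into the new tablespace with
pg_repack happens from the shell while the table stays online, assuming
a pg_repack build with tablespace support (1.2 or later):

    pg_repack --table=public.uploaded_files --tablespace=tmp_space --moveidx overview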
On Tue, 3 Feb 2015 10:53:11 -0500
Adam Hooper <adam@adamhooper.com> wrote:

> We want to move our entire database to an SSD volume, with as little
> downtime as possible. My tentative plan:
>
> 1. Use CREATE TABLESPACE and pg_repack to move user tables to a
>    temporary volume
> 2. Take down Postgres, copy the system-table files to the new volume,
>    and start Postgres up from the new volume
> 3. Use pg_repack to move everything to the new volume
>
> This plan won't work: step 2 will be too slow, because pg_largeobject
> still takes up 266GB. We tested `VACUUM FULL pg_largeobject` on our
> staging database: it took two hours, during which pg_largeobject was
> locked, and when pg_largeobject is locked, much of our website doesn't
> work.

Sometimes CLUSTER is faster than VACUUM FULL ... have you tested
CLUSTERing pg_largeobject on your test system to see if it's fast
enough?

How big is the non-lo data?

--
Bill Moran
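Concretely, the CLUSTER variant Bill suggests would look something like
the line below, using the index name that appears in the VACUUM VERBOSE
output above. Like VACUUM FULL, CLUSTER holds an ACCESS EXCLUSIVE lock
on the table while it rewrites it, so it only helps if it finishes much
faster:

    -- Rewrites pg_largeobject in index order, reclaiming dead space;
    -- the table is locked for the duration.
    CLUSTER pg_largeobject USING pg_largeobject_loid_pn_index;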
On Tue, Feb 3, 2015 at 12:58 PM, Bill Moran <wmoran@potentialtech.com> wrote:
> On Tue, 3 Feb 2015 10:53:11 -0500
> Adam Hooper <adam@adamhooper.com> wrote:
>
>> This plan won't work: step 2 will be too slow, because pg_largeobject
>> still takes up 266GB. We tested `VACUUM FULL pg_largeobject` on our
>> staging database: it took two hours, during which pg_largeobject was
>> locked, and when pg_largeobject is locked, much of our website doesn't
>> work.
>
> Sometimes CLUSTER is faster than VACUUM FULL ... have you tested
> CLUSTERing pg_largeobject on your test system to see if it's fast
> enough?

On the 30GB that's left on staging, it takes 50 minutes. Unfortunately,
our staging database is only 30GB because we already ran the VACUUM
FULL there, and that isn't an operation we can easily undo. In any
case, I need an orders-of-magnitude improvement, and this clearly
isn't it.

> How big is the non-lo data?

It's 65GB, but I've already used pg_repack to move it to a separate
tablespace, so it won't add to the downtime.

Enjoy life,
Adam

--
Adam Hooper
+1-613-986-3339
http://adamhooper.com
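For anyone measuring the same thing on their own database, queries
along these lines (standard system functions, run in psql) report the
large-object footprint versus everything else:

    -- pg_largeobject plus its index:
    SELECT pg_size_pretty(pg_total_relation_size('pg_largeobject'));

    -- the rest of the current database:
    SELECT pg_size_pretty(pg_database_size(current_database())
                          - pg_total_relation_size('pg_largeobject'));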
On Tue, 3 Feb 2015 14:17:03 -0500
Adam Hooper <adam@adamhooper.com> wrote:

> > Sometimes CLUSTER is faster than VACUUM FULL ... have you tested
> > CLUSTERing pg_largeobject on your test system to see if it's fast
> > enough?
>
> On the 30GB that's left on staging, it takes 50 minutes. Unfortunately,
> our staging database is only 30GB because we already ran the VACUUM
> FULL there, and that isn't an operation we can easily undo. In any
> case, I need an orders-of-magnitude improvement, and this clearly
> isn't it.

It was worth a shot.

> > How big is the non-lo data?
>
> It's 65GB, but I've already used pg_repack to move it to a separate
> tablespace, so it won't add to the downtime.

My recommendation here would be to use Slony to replicate the data to a
new server, then switch over once the data has synchronized. Since
Slony replicates transactions rather than the on-disk data structures,
the new database will come out pretty compact.

Because Slony lets you set up and synchronize the replica without
interrupting normal operation, you can do all of this with essentially
no downtime (although I highly recommend doing it during a slow
period). The final switchover should take less than a minute.

And because Slony replicates at the table level, you can make changes
as part of the process, and possibly not replicate the large objects at
all.

Slony can be a bit intimidating to set up if you've never used it
before, but it's a very powerful tool. I've used it successfully for
many years to do this sort of thing.

--
Bill Moran
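A very rough slonik outline of the kind of setup Bill describes,
assuming a two-node cluster and a single user table
public.uploaded_files; all node names, table names, and conninfo
strings are invented for illustration, and the exact syntax should be
checked against the Slony documentation for your version:

    cluster name = move_cluster;
    node 1 admin conninfo = 'dbname=overview host=old-server user=slony';
    node 2 admin conninfo = 'dbname=overview host=new-server user=slony';

    init cluster (id = 1, comment = 'origin: old magnetic-volume server');
    store node (id = 2, comment = 'replica: new SSD server', event node = 1);
    store path (server = 1, client = 2, conninfo = 'dbname=overview host=old-server user=slony');
    store path (server = 2, client = 1, conninfo = 'dbname=overview host=new-server user=slony');

    create set (id = 1, origin = 1, comment = 'user tables');
    set add table (set id = 1, origin = 1, id = 1, fully qualified name = 'public.uploaded_files');

    subscribe set (id = 1, provider = 1, receiver = 2, forward = no);

Once the subscription has caught up, the switchover itself is a short
LOCK SET / MOVE SET step plus repointing the application at the new
server.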
On Tue, Feb 3, 2015 at 2:29 PM, Bill Moran <wmoran@potentialtech.com> wrote:
>
> My recommendation here would be to use Slony to replicate the data to a
> new server, then switch over once the data has synchronized.

Looks exciting. But then I notice: "Slony-I does not automatically
replicate changes to large objects (BLOBS)." [1]

Does that still apply?

It's doable for us to VACUUM FULL and add a notice to our website
saying, "you can't upload files for the next two hours." Maybe that's a
better idea?

Enjoy life,
Adam

[1] http://slony.info/documentation/2.2/limitations.html

--
Adam Hooper
+1-613-986-3339
http://adamhooper.com
On Tue, 3 Feb 2015 14:48:17 -0500
Adam Hooper <adam@adamhooper.com> wrote:

> Looks exciting. But then I notice: "Slony-I does not automatically
> replicate changes to large objects (BLOBS)." [1]
>
> Does that still apply?

I'm not 100% sure, so if you decide to go this route you'll need to
verify what I'm about to say. I'm pretty sure the key word there is
"automatically": to replicate changes to large objects, you have to
replicate the pg_largeobject table itself, which means futzing about
with PG's internals (pg_largeobject is essentially a system table), so
it isn't done automatically. That said, I believe you can replicate
large objects by explicitly telling Slony to replicate pg_largeobject.

> It's doable for us to VACUUM FULL and add a notice to our website
> saying, "you can't upload files for the next two hours." Maybe that's
> a better idea?

That really depends on what options you have available. Keep in mind
that users won't be able to read large objects during the VACUUM FULL
either, so you'll also need to disable whatever features of the site
view those files. Whether that's easier or harder depends on how much
work it takes to switch those features off.

--
Bill Moran
On Tue, Feb 3, 2015 at 3:12 PM, Bill Moran <wmoran@potentialtech.com> wrote:
>
>> It's doable for us to VACUUM FULL and add a notice to our website
>> saying, "you can't upload files for the next two hours." Maybe that's
>> a better idea?
>
> That really depends on what options you have available. Keep in mind
> that users won't be able to read large objects during the VACUUM FULL
> either, so you'll also need to disable whatever features of the site
> view those files.

For the record, this is what we went with.

To those looking to use large objects (who, um, are already searching
for how to VACUUM FULL pg_largeobject), my advice: if we were to do it
all over again, we'd have used a separate database per "bucket". For
instance, imagine you store uploaded files and processed data in
pg_largeobject, and some day you migrate the processed data elsewhere.
If the uploaded files had gone into one database and the processed data
into a second, the uploaded-files database's pg_largeobject table would
have stayed slim, and you could simply DROP DATABASE on the other once
all clients stopped using it. There wouldn't be any downtime at all.

My take-away, though, is to avoid the pg_largeobject table whenever
possible. You can move BYTEA data with zero downtime using pg_repack,
but the same can't be said for large objects.

Enjoy life,
Adam

--
Adam Hooper
+1-613-986-3339
http://adamhooper.com
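As a closing illustration of the per-bucket approach Adam describes,
with made-up database names:

    -- One database per "bucket" of large objects:
    CREATE DATABASE uploaded_files;
    CREATE DATABASE processed_data;

    -- Later, once the processed data has been migrated elsewhere and no
    -- clients connect to that database any more, all of its
    -- pg_largeobject space comes back in one cheap step, without
    -- touching the uploaded_files database:
    DROP DATABASE processed_data;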