Thread: Catalog bloat (again)
Hi,
I've done my Googling, and it looks like this is a fairly common problem. In my case, there's a collection of hundreds of databases (10 GB+) with apps which were pretty much designed (a long time ago) around heavy use of temp tables - so a non-trivial system.
The databases are vacuumed (not-full) daily, from cron (autovacuum was turned off some time ago for performance reasons), and still their size increases unexpectedly. Using some of the queries floating around on the wiki and stackoverflow[*], I've discovered that the bloat is not, as was assumed, in the user tables, but in the system tables, mostly in pg_attribute and pg_class.
This is becoming a serious problem, as I've seen instances of these tables grow to 6 GB+ (on a 15 GB total database), while still effectively containing on the order of 10,000 records. This is quite abnormal.
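For reference, the check involved is roughly along these lines (a minimal sketch, not the exact wiki query): on-disk size vs. estimated row count for the largest catalog tables in one database.

-- largest pg_catalog tables, with total on-disk size and estimated live rows
SELECT c.relname,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size,
       c.reltuples::bigint AS estimated_live_rows
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'pg_catalog'
  AND c.relkind = 'r'
ORDER BY pg_total_relation_size(c.oid) DESC
LIMIT 10;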
For blocking reasons, we'd like to avoid vacuum fulls on these tables (as it seems like touching them will lock up everything else).
So, question #1: WTF? How could this happen, on a regularly vacuumed system? Shouldn't the space be reused, at least after a VACUUM? The issue here is not the absolute existence of the bloat space, it's that it's constantly growing for system tables.
Question #2: What can be done about it?
This is PostgreSQL 9.3, migrating soon to 9.4.
[*] https://wiki.postgresql.org/wiki/Show_database_bloat , http://stackoverflow.com/questions/13931989/postgresql-automating-vacuum-full-for-bloated-tables
On Wed, Jan 27, 2016 at 11:54:37PM +0100, Ivan Voras wrote:
> So, question #1: WTF? How could this happen, on a regularly vacuumed
> system? Shouldn't the space be reused, at least after a VACUUM? The issue
> here is not the absolute existence of the bloat space, it's that it's
> constantly growing for *system* tables.

Some system tables (particularly pg_attribute) are heavily changed by a lot
of temp table use. You need to amp up the vacuum frequency on them, and have
a lot of workers, or you don't get to them until it's too late.

> Question #2: What can be done about it?

You may end up taking an outage in effect, because you need to compact them
at least once. If you can flip to a replica, that is the easiest way to fix it.

A

--
Andrew Sullivan
ajs@crankycanuck.ca
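For illustration, Andrew's suggestion translates into postgresql.conf settings along these lines (the values here are only placeholders, not tuned recommendations):

autovacuum = on
autovacuum_max_workers = 6               # default is 3
autovacuum_naptime = 15s                 # default is 1min
autovacuum_vacuum_scale_factor = 0.05    # default is 0.2 (20% of a table)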
On 1/27/2016 2:54 PM, Ivan Voras wrote:
> For blocking reasons, we'd like to avoid vacuum fulls on these tables
> (as it seems like touching them will lock up everything else).

vacuum full isn't nearly as nasty in 9.x as it was in much older releases, so
a lot of the caveats no longer apply. With 10000 actual records, I'd expect a
vacuum full of those system catalogs to finish in a matter of seconds, and
AFAIK the only thing that would be blocked is metadata changes (e.g. creating
or altering tables).

Disabling autovacuum is a very bad idea for exactly these reasons. If you have
specific large tables on which autovacuum is causing performance problems,
tune the autovacuum settings on those specific tables.

--
john r pierce, recycling bits in santa cruz
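For illustration, per-table tuning of the kind John describes might look like this (hypothetical table name and values):

-- tune autovacuum for one busy table instead of disabling it cluster-wide
ALTER TABLE big_busy_table SET (
    autovacuum_vacuum_scale_factor = 0.05,  -- vacuum this table sooner than the 20% default
    autovacuum_vacuum_cost_delay   = 50     -- throttle its autovacuum I/O harder
);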
On Wed, 27 Jan 2016 23:54:37 +0100
Ivan Voras <ivoras@gmail.com> wrote:

> I've done my Googling, and it looks like this is a fairly common problem.
> In my case, there's a collection of hundreds of databases (10 GB+) with
> apps which were pretty much designed (a long time ago) around heavy use of
> temp tables - so a non-trivial system.
>
> The databases are vacuumed (not-full) daily, from cron

Vacuuming once a day is seldom often enough, except on very quiet databases.

> (autovacuum was turned off some time ago for performance reasons), and
> still their size increases unexpectedly. Using some of the queries floating
> around on the wiki and stackoverflow[*], I've discovered that the bloat is
> not, as was assumed, in the user tables, but in the system tables, mostly
> in pg_attribute and pg_class.

The size increase isn't really unexpected. If you're only vacuuming once per
day, it's very easy for activity to cause active tables to bloat quite a bit.

> This is becoming a serious problem, as I've seen instances of these tables
> grow to 6 GB+ (on a 15 GB total database), while still effectively
> containing on the order of 10,000 records. This is quite abnormal.
>
> For blocking reasons, we'd like to avoid vacuum fulls on these tables (as
> it seems like touching them will lock up everything else).

It will. But to get them back down to a reasonable size, you're going to
have to do a VACUUM FULL at least _once_. If you retune things correctly,
you shouldn't need any more FULLs after that one time.

> So, question #1: WTF? How could this happen, on a regularly vacuumed
> system? Shouldn't the space be reused, at least after a VACUUM? The issue
> here is not the absolute existence of the bloat space, it's that it's
> constantly growing for *system* tables.

With a lot of activity, once a day probably isn't regular enough.

> Question #2: What can be done about it?

I highly recommend turning autovacuum back on, then tweaking the autovacuum
parameters to prevent any performance issues.

However, if you're dead set against autovacuum, find out (using the queries
that are available all over the internet) which tables are bloating the
worst, and schedule additional vacuums via cron that vacuum _only_ the
problem tables. How often is something that will require some guesswork
and/or experimenting, but I would recommend at least once per hour. Since
you're only vacuuming selected tables, the performance impact should be
minimal.

You'll have to do a VACUUM FULL on the bloated tables _once_ to get the size
back down, but if you pick a good schedule or use autovacuum with appropriate
settings, they shouldn't need a VACUUM FULL again after that.

--
Bill Moran
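For illustration, a targeted cron job of the kind Bill describes might run statements like these against each affected database, e.g. via vacuumdb -t (the table list and schedule are assumptions; the job must run as a superuser, since ordinary roles cannot vacuum the catalogs):

-- run hourly (for example) in each affected database, touching only the bloating catalogs
VACUUM (VERBOSE, ANALYZE) pg_catalog.pg_attribute;
VACUUM (VERBOSE, ANALYZE) pg_catalog.pg_class;
VACUUM (VERBOSE, ANALYZE) pg_catalog.pg_type;    -- heavy temp-table churn also hits these
VACUUM (VERBOSE, ANALYZE) pg_catalog.pg_depend;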
On 28 January 2016 at 00:13, Bill Moran <wmoran@potentialtech.com> wrote:

> On Wed, 27 Jan 2016 23:54:37 +0100
> Ivan Voras <ivoras@gmail.com> wrote:
>
> > So, question #1: WTF? How could this happen, on a regularly vacuumed
> > system? Shouldn't the space be reused, at least after a VACUUM? The issue
> > here is not the absolute existence of the bloat space, it's that it's
> > constantly growing for *system* tables.
>
> With a lot of activity, once a day probably isn't regular enough.
I sort of see what you are saying. I'm curious, though, what goes wrong with the following list of expectations:
1. Day-to-day load is approximately the same
2. So, at the end of the first day there will be some amount of bloat
3. Vacuum will mark that space re-usable
4. Within the next day, this space will actually be re-used
5. ... so the bloat won't grow.
Basically, I'm wondering why it is growing after vacuums, not why it exists in the first place?
Ivan Voras <ivoras@gmail.com> writes:

> On 28 January 2016 at 00:13, Bill Moran <wmoran@potentialtech.com> wrote:
>
> > On Wed, 27 Jan 2016 23:54:37 +0100
> > Ivan Voras <ivoras@gmail.com> wrote:
> >
> > > So, question #1: WTF? How could this happen, on a regularly vacuumed
> > > system? Shouldn't the space be reused, at least after a VACUUM? The issue
> > > here is not the absolute existence of the bloat space, it's that it's
> > > constantly growing for *system* tables.
> >
> > With a lot of activity, once a day probably isn't regular enough.
>
> I sort of see what you are saying. I'm curious, though, what goes wrong
> with the following list of expectations:
>
> 1. Day-to-day load is approximately the same
> 2. So, at the end of the first day there will be some amount of bloat
> 3. Vacuum will mark that space re-usable
> 4. Within the next day, this space will actually be re-used
> 5. ... so the bloat won't grow.
>
> Basically, I'm wondering why it is growing after vacuums, not why it exists
> in the first place?

Probably just a classic case of long-open transactions. And/or vacuum running
as an unprivileged user, and thus unable to vacuum the catalogs... perhaps
with a naive batch job launcher that sends stderr to /dev/null.

--
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consulting@comcast.net
p: 312.241.7800
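A minimal sketch for checking Jerry's two suspects, long-open transactions and forgotten prepared transactions:

-- oldest open transactions; anything very old here keeps VACUUM from reclaiming rows
SELECT pid, usename, state, xact_start, now() - xact_start AS xact_age
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start
LIMIT 10;

-- forgotten two-phase-commit transactions have the same effect
SELECT gid, prepared, owner, database FROM pg_prepared_xacts;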
On 01/27/2016 03:37 PM, Ivan Voras wrote:

> On 28 January 2016 at 00:13, Bill Moran <wmoran@potentialtech.com> wrote:
>
> > On Wed, 27 Jan 2016 23:54:37 +0100
> > Ivan Voras <ivoras@gmail.com> wrote:
> >
> > > So, question #1: WTF? How could this happen, on a regularly vacuumed
> > > system? Shouldn't the space be reused, at least after a VACUUM? The issue
> > > here is not the absolute existence of the bloat space, it's that it's
> > > constantly growing for *system* tables.
> >
> > With a lot of activity, once a day probably isn't regular enough.
>
> I sort of see what you are saying. I'm curious, though, what goes wrong
> with the following list of expectations:
>
> 1. Day-to-day load is approximately the same
> 2. So, at the end of the first day there will be some amount of bloat
> 3. Vacuum will mark that space re-usable
> 4. Within the next day, this space will actually be re-used
> 5. ... so the bloat won't grow.
>
> Basically, I'm wondering why it is growing after vacuums, not why it
> exists in the first place?

If something is causing the autovacuum to be aborted you can have this
problem.

JD

--
Command Prompt, Inc. http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
On Thu, 28 Jan 2016 00:37:54 +0100
Ivan Voras <ivoras@gmail.com> wrote:

> On 28 January 2016 at 00:13, Bill Moran <wmoran@potentialtech.com> wrote:
>
> > On Wed, 27 Jan 2016 23:54:37 +0100
> > Ivan Voras <ivoras@gmail.com> wrote:
> >
> > > So, question #1: WTF? How could this happen, on a regularly vacuumed
> > > system? Shouldn't the space be reused, at least after a VACUUM? The issue
> > > here is not the absolute existence of the bloat space, it's that it's
> > > constantly growing for *system* tables.
> >
> > With a lot of activity, once a day probably isn't regular enough.
>
> I sort of see what you are saying. I'm curious, though, what goes wrong
> with the following list of expectations:
>
> 1. Day-to-day load is approximately the same
> 2. So, at the end of the first day there will be some amount of bloat
> 3. Vacuum will mark that space re-usable
> 4. Within the next day, this space will actually be re-used
> 5. ... so the bloat won't grow.
>
> Basically, I'm wondering why it is growing after vacuums, not why it exists
> in the first place?

To add to what others have said: are you 100% sure that vacuum is completing
successfully each time it runs? I.e. does your cron job trap and report
failures of vacuum to complete? If it fails occasionally for whatever reason,
it's liable to bloat a lot over 48 hours (i.e. assuming it succeeds the next
time). Additionally, there's the problem of active transactions causing it to
not clean up quite everything.

Not to belabour the point, but these hiccups are best handled by enabling
autovacuum and allowing it to monitor tables and take care of them for you.
I'm curious about the claims of autovacuum causing performance issues, as
I've never seen it have much impact. Generally, if you can't run autovacuum
due to performance issues, your hardware is undersized for your workload and
anything else you do is just going to have problems in a different way.

--
Bill Moran
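A minimal sketch for Bill's sanity check, i.e. confirming when the suspect catalogs were last vacuumed and how many dead rows they currently carry (pg_stat_sys_tables covers the system catalogs):

SELECT relname, last_vacuum, last_autovacuum, n_live_tup, n_dead_tup
FROM pg_stat_sys_tables
WHERE relname IN ('pg_attribute', 'pg_class');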
--
Scott Mead
via mobile
iPhone : +1-607-765-1395
Skype : scottm.openscg
Gtalk : scottm@openscg.com

> On Jan 27, 2016, at 22:11, Joshua D. Drake <jd@commandprompt.com> wrote:
>
>> On 01/27/2016 03:37 PM, Ivan Voras wrote:
>>
>> On 28 January 2016 at 00:13, Bill Moran <wmoran@potentialtech.com> wrote:
>>
>> > On Wed, 27 Jan 2016 23:54:37 +0100
>> > Ivan Voras <ivoras@gmail.com> wrote:
>> >
>> > > So, question #1: WTF? How could this happen, on a regularly vacuumed
>> > > system? Shouldn't the space be reused, at least after a VACUUM? The issue
>> > > here is not the absolute existence of the bloat space, it's that it's
>> > > constantly growing for *system* tables.
>> >
>> > With a lot of activity, once a day probably isn't regular enough.
>>
>> I sort of see what you are saying. I'm curious, though, what goes wrong
>> with the following list of expectations:
>>
>> 1. Day-to-day load is approximately the same
>> 2. So, at the end of the first day there will be some amount of bloat
>> 3. Vacuum will mark that space re-usable
>> 4. Within the next day, this space will actually be re-used
>> 5. ... so the bloat won't grow.
>>
>> Basically, I'm wondering why it is growing after vacuums, not why it
>> exists in the first place?
>
> If something is causing the autovacuum to be aborted you can have this
> problem.

It's long-running transactions / idle in transaction / prepared xacts.

Have you considered slowing down on temp tables? Typically, when bleeding,
it's good to find the wound and stitch it up instead of just getting more
towels...

> JD
>
> --
> Command Prompt, Inc. http://the.postgres.company/
> +1-503-667-4564
> PostgreSQL Centered full stack support, consulting and development.
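Regarding the "idle in transaction" sessions Scott mentions, a minimal sketch for spotting them and seeing how long they have been sitting there:

SELECT pid, usename, now() - state_change AS idle_for, left(query, 60) AS last_query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
ORDER BY state_change;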
As a follow-up, here's a portion of the nightly vacuum's logs; I just want to confirm that my conclusions are right:
INFO: vacuuming "pg_catalog.pg_attribute"
INFO: scanned index "pg_attribute_relid_attnam_index" to remove 3014172 row versions
DETAIL: CPU 0.20s/1.08u sec elapsed 3.72 sec.
INFO: scanned index "pg_attribute_relid_attnum_index" to remove 3014172 row versions
DETAIL: CPU 0.14s/0.89u sec elapsed 1.70 sec.
INFO: "pg_attribute": removed 3014172 row versions in 52768 pages
DETAIL: CPU 0.31s/0.30u sec elapsed 1.15 sec.
INFO: index "pg_attribute_relid_attnam_index" now contains 19578 row versions in 45817 pages
DETAIL: 3013689 index row versions were removed.
45668 index pages have been deleted, 34116 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: index "pg_attribute_relid_attnum_index" now contains 19578 row versions in 32554 pages
DETAIL: 3010630 index row versions were removed.
32462 index pages have been deleted, 24239 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: "pg_attribute": found 2278389 removable, 17319 nonremovable row versions in 52856 out of 57409 pages
DETAIL: 298 dead row versions cannot be removed yet.
There were 641330 unused item pointers.
0 pages are entirely empty.
CPU 1.44s/2.88u sec elapsed 10.55 sec.
INFO: "pg_attribute": stopping truncate due to conflicting lock request
INFO: analyzing "pg_catalog.pg_attribute"
INFO: "pg_attribute": scanned 30000 of 57409 pages, containing 10301 live rows and 0 dead rows; 10301 rows in sample, 24472 estimated total rows
So,
- About 3 million rows churned in the table and its two indexes (i.e. dead rows which vacuum found and marked) - that's per day since this is a nightly operation.
- After the vacuum, the first index is left with 19578 rows in 45817 pages. That's a lot of mostly-empty pages, which should be reused the next day, together with free space in partially filled tables, right?
- Since pages are 8 KiB, 46,000 pages is about 360 MiB of space - that's how large just one of the pg_attribute table's indexes is after the vacuum. Altogether, it's more than 1 GB.
- The "stopping truncate..." message is not really significant, it would have shortened the data files from the end if there are empty pages at the end, which in this case there isn't a significant number of. The truncation can probably never happen on system tables like these since they are always used...?
The real question is: why is the total size (i.e. the number of pages) growing at an alarming rate? On one of the db's, we're seeing almost doubling in size each week. Is the internal fragmentation of the data files so significant?
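One way to see where that space actually goes, assuming the pgstattuple contrib module can be installed, would be something like:

CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT * FROM pgstattuple('pg_catalog.pg_attribute');
-- tuple_percent / dead_tuple_percent / free_percent show how much of the file
-- is live rows, still-dead rows, and already-reusable free space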
Ok, a couple more questions:
- How come "0 pages are entirely empty" if there are 17319 rows spread around in 52856 pages?
- What are "unused item pointers"?
(I agree with your previous suggestions, will see if they can be implemented).