Thread: Catalog bloat (again)
Hi,
I've done my Googling, and it looks like this is a fairly common problem. In my case, there's a collection of hundreds of databases (10 GB+) with apps which were pretty much designed (a long time ago) around heavy use of temp tables - so a non-trivial system.
The databases are vacuumed (not-full) daily, from cron (autovacuum was turned off some time ago for performance reasons), and still their size increases unexpectedly. Using some of the queries floating around on the wiki and stackoverflow[*], I've discovered that the bloat is not, as was assumed, in the user tables, but in the system tables, mostly in pg_attribute and pg_class.
This is becoming a serious problem, as I've seen instances of these tables grow to 6 GB+ (on a 15 GB total database), while still effectively containing on the order of 10,000 records. This is quite abnormal.
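For reference, the check involved is roughly along these lines (a minimal sketch, not the exact wiki query): on-disk size vs. estimated row count for the largest catalog tables in one database.

-- largest pg_catalog tables, with total on-disk size and estimated live rows
SELECT c.relname,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size,
       c.reltuples::bigint AS estimated_live_rows
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'pg_catalog'
  AND c.relkind = 'r'
ORDER BY pg_total_relation_size(c.oid) DESC
LIMIT 10;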
For blocking reasons, we'd like to avoid vacuum fulls on these tables (as it seems like touching them will lock up everything else).
So, question #1: WTF? How could this happen, on a regularly vacuumed system? Shouldn't the space be reused, at least after a VACUUM? The issue here is not the absolute existence of the bloat space, it's that it's constantly growing for system tables.
Question #2: What can be done about it?
This is PostgreSQL 9.3, migrating soon to 9.4.
[*] https://wiki.postgresql.org/wiki/Show_database_bloat , http://stackoverflow.com/questions/13931989/postgresql-automating-vacuum-full-for-bloated-tables
On Wed, Jan 27, 2016 at 11:54:37PM +0100, Ivan Voras wrote:
> So, question #1: WTF? How could this happen, on a regularly vacuumed
> system? Shouldn't the space be reused, at least after a VACUUM? The issue
> here is not the absolute existence of the bloat space, it's that it's
> constantly growing for *system* tables.

Some system tables (particularly pg_attribute) are heavily changed by a lot
of temp table use. You need to amp up the vacuum frequency on them, and have
a lot of workers, or you don't get to them until it's too late.

> Question #2: What can be done about it?

You may end up taking an outage in effect, because you need to compact them
at least once. If you can flip to a replica, that is the easiest way to fix it.

A

--
Andrew Sullivan
ajs@crankycanuck.ca
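For illustration, Andrew's suggestion translates into postgresql.conf settings along these lines (the values here are only placeholders, not tuned recommendations):

autovacuum = on
autovacuum_max_workers = 6               # default is 3
autovacuum_naptime = 15s                 # default is 1min
autovacuum_vacuum_scale_factor = 0.05    # default is 0.2 (20% of a table)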
On 1/27/2016 2:54 PM, Ivan Voras wrote:
> For blocking reasons, we'd like to avoid vacuum fulls on these tables
> (as it seems like touching them will lock up everything else).

vacuum full isn't nearly as nasty in 9.x as it was in much older releases, so
a lot of the caveats no longer apply. With 10000 actual records, I'd expect a
vacuum full of those system catalogs to finish in a matter of seconds, and
AFAIK the only thing that would be blocked is metadata changes (e.g. creating
or altering tables).

Disabling autovacuum is a very bad idea for exactly these reasons. If you have
specific large tables on which autovacuum is causing performance problems,
tune the autovacuum settings on those specific tables.

--
john r pierce, recycling bits in santa cruz
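For illustration, per-table tuning of the kind John describes might look like this (hypothetical table name and values):

-- tune autovacuum for one busy table instead of disabling it cluster-wide
ALTER TABLE big_busy_table SET (
    autovacuum_vacuum_scale_factor = 0.05,  -- vacuum this table sooner than the 20% default
    autovacuum_vacuum_cost_delay   = 50     -- throttle its autovacuum I/O harder
);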
On Wed, 27 Jan 2016 23:54:37 +0100
Ivan Voras <ivoras@gmail.com> wrote:

> I've done my Googling, and it looks like this is a fairly common problem.
> In my case, there's a collection of hundreds of databases (10 GB+) with
> apps which were pretty much designed (a long time ago) around heavy use of
> temp tables - so a non-trivial system.
>
> The databases are vacuumed (not-full) daily, from cron

Vacuuming once a day is seldom often enough, except on very quiet databases.

> (autovacuum was turned off some time ago for performance reasons), and
> still their size increases unexpectedly. Using some of the queries floating
> around on the wiki and stackoverflow[*], I've discovered that the bloat is
> not, as was assumed, in the user tables, but in the system tables, mostly
> in pg_attribute and pg_class.

The size increase isn't really unexpected. If you're only vacuuming once per
day, it's very easy for activity to cause active tables to bloat quite a bit.

> This is becoming a serious problem, as I've seen instances of these tables
> grow to 6 GB+ (on a 15 GB total database), while still effectively
> containing on the order of 10,000 records. This is quite abnormal.
>
> For blocking reasons, we'd like to avoid vacuum fulls on these tables (as
> it seems like touching them will lock up everything else).

It will. But to get them back down to a reasonable size, you're going to
have to do a VACUUM FULL at least _once_. If you retune things correctly,
you shouldn't need any more FULLs after that one time.

> So, question #1: WTF? How could this happen, on a regularly vacuumed
> system? Shouldn't the space be reused, at least after a VACUUM? The issue
> here is not the absolute existence of the bloat space, it's that it's
> constantly growing for *system* tables.

With a lot of activity, once a day probably isn't regular enough.

> Question #2: What can be done about it?

I highly recommend turning autovacuum back on, then tweaking the autovacuum
parameters to prevent any performance issues.

However, if you're dead set against autovacuum, find out (using the queries
that are available all over the internet) which tables are bloating the
worst, and schedule additional vacuums via cron that vacuum _only_ the
problem tables. How often is something that will require some guesswork
and/or experimenting, but I would recommend at least once per hour. Since
you're only vacuuming selected tables, the performance impact should be
minimal.

You'll have to do a VACUUM FULL on the bloated tables _once_ to get the size
back down, but if you pick a good schedule or use autovacuum with appropriate
settings, they shouldn't need a VACUUM FULL again after that.

--
Bill Moran
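For illustration, a targeted cron job of the kind Bill describes might run statements like these against each affected database, e.g. via vacuumdb -t (the table list and schedule are assumptions; the job must run as a superuser, since ordinary roles cannot vacuum the catalogs):

-- run hourly (for example) in each affected database, touching only the bloating catalogs
VACUUM (VERBOSE, ANALYZE) pg_catalog.pg_attribute;
VACUUM (VERBOSE, ANALYZE) pg_catalog.pg_class;
VACUUM (VERBOSE, ANALYZE) pg_catalog.pg_type;    -- heavy temp-table churn also hits these
VACUUM (VERBOSE, ANALYZE) pg_catalog.pg_depend;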
On 28 January 2016 at 00:13, Bill Moran <wmoran@potentialtech.com> wrote:

> On Wed, 27 Jan 2016 23:54:37 +0100
> Ivan Voras <ivoras@gmail.com> wrote:
>
> > So, question #1: WTF? How could this happen, on a regularly vacuumed
> > system? Shouldn't the space be reused, at least after a VACUUM? The issue
> > here is not the absolute existence of the bloat space, it's that it's
> > constantly growing for *system* tables.
>
> With a lot of activity, once a day probably isn't regular enough.
I sort of see what you are saying. I'm curious, though, what goes wrong with the following list of expectations:
1. Day-to-day load is approximately the same
2. So, at the end of the first day there will be some amount of bloat
3. Vacuum will mark that space re-usable
4. Within the next day, this space will actually be re-used
5. ... so the bloat won't grow.
Basically, I'm wondering why it is growing after vacuums, not why it exists in the first place?
Ivan Voras <ivoras@gmail.com> writes:

> On 28 January 2016 at 00:13, Bill Moran <wmoran@potentialtech.com> wrote:
>
> > On Wed, 27 Jan 2016 23:54:37 +0100
> > Ivan Voras <ivoras@gmail.com> wrote:
> >
> > > So, question #1: WTF? How could this happen, on a regularly vacuumed
> > > system? Shouldn't the space be reused, at least after a VACUUM? The issue
> > > here is not the absolute existence of the bloat space, it's that it's
> > > constantly growing for *system* tables.
> >
> > With a lot of activity, once a day probably isn't regular enough.
>
> I sort of see what you are saying. I'm curious, though, what goes wrong
> with the following list of expectations:
>
> 1. Day-to-day load is approximately the same
> 2. So, at the end of the first day there will be some amount of bloat
> 3. Vacuum will mark that space re-usable
> 4. Within the next day, this space will actually be re-used
> 5. ... so the bloat won't grow.
>
> Basically, I'm wondering why it is growing after vacuums, not why it exists
> in the first place?

Probably just a classic case of long-open transactions. And/or vacuum running
as an unprivileged user, and thus unable to vacuum the catalogs... perhaps
with a naive batch job launcher that sends stderr to /dev/null.

--
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consulting@comcast.net
p: 312.241.7800
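A minimal sketch for checking Jerry's two suspects, long-open transactions and forgotten prepared transactions:

-- oldest open transactions; anything very old here keeps VACUUM from reclaiming rows
SELECT pid, usename, state, xact_start, now() - xact_start AS xact_age
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start
LIMIT 10;

-- forgotten two-phase-commit transactions have the same effect
SELECT gid, prepared, owner, database FROM pg_prepared_xacts;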
On 01/27/2016 03:37 PM, Ivan Voras wrote:

> On 28 January 2016 at 00:13, Bill Moran <wmoran@potentialtech.com> wrote:
>
> > On Wed, 27 Jan 2016 23:54:37 +0100
> > Ivan Voras <ivoras@gmail.com> wrote:
> >
> > > So, question #1: WTF? How could this happen, on a regularly vacuumed
> > > system? Shouldn't the space be reused, at least after a VACUUM? The issue
> > > here is not the absolute existence of the bloat space, it's that it's
> > > constantly growing for *system* tables.
> >
> > With a lot of activity, once a day probably isn't regular enough.
>
> I sort of see what you are saying. I'm curious, though, what goes wrong
> with the following list of expectations:
>
> 1. Day-to-day load is approximately the same
> 2. So, at the end of the first day there will be some amount of bloat
> 3. Vacuum will mark that space re-usable
> 4. Within the next day, this space will actually be re-used
> 5. ... so the bloat won't grow.
>
> Basically, I'm wondering why it is growing after vacuums, not why it
> exists in the first place?

If something is causing the autovacuum to be aborted you can have this
problem.

JD

--
Command Prompt, Inc. http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
On Thu, 28 Jan 2016 00:37:54 +0100
Ivan Voras <ivoras@gmail.com> wrote:

> On 28 January 2016 at 00:13, Bill Moran <wmoran@potentialtech.com> wrote:
>
> > On Wed, 27 Jan 2016 23:54:37 +0100
> > Ivan Voras <ivoras@gmail.com> wrote:
> >
> > > So, question #1: WTF? How could this happen, on a regularly vacuumed
> > > system? Shouldn't the space be reused, at least after a VACUUM? The issue
> > > here is not the absolute existence of the bloat space, it's that it's
> > > constantly growing for *system* tables.
> >
> > With a lot of activity, once a day probably isn't regular enough.
>
> I sort of see what you are saying. I'm curious, though, what goes wrong
> with the following list of expectations:
>
> 1. Day-to-day load is approximately the same
> 2. So, at the end of the first day there will be some amount of bloat
> 3. Vacuum will mark that space re-usable
> 4. Within the next day, this space will actually be re-used
> 5. ... so the bloat won't grow.
>
> Basically, I'm wondering why it is growing after vacuums, not why it exists
> in the first place?

To add to what others have said: are you 100% sure that vacuum is completing
successfully each time it runs? I.e. does your cron job trap and report
failures of vacuum to complete? If it fails occasionally for whatever reason,
it's liable to bloat a lot over 48 hours (i.e. assuming it succeeds the next
time). Additionally, there's the problem of active transactions causing it to
not clean up quite everything.

Not to belabour the point, but these hiccups are best handled by enabling
autovacuum and allowing it to monitor tables and take care of them for you.
I'm curious about the claims of autovacuum causing performance issues, as
I've never seen it have much impact. Generally, if you can't run autovacuum
due to performance issues, your hardware is undersized for your workload and
anything else you do is just going to have problems in a different way.

--
Bill Moran
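A minimal sketch for Bill's sanity check, i.e. confirming when the suspect catalogs were last vacuumed and how many dead rows they currently carry (pg_stat_sys_tables covers the system catalogs):

SELECT relname, last_vacuum, last_autovacuum, n_live_tup, n_dead_tup
FROM pg_stat_sys_tables
WHERE relname IN ('pg_attribute', 'pg_class');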
--
Scott Mead
via mobile
iPhone : +1-607-765-1395
Skype : scottm.openscg
Gtalk : scottm@openscg.com

> On Jan 27, 2016, at 22:11, Joshua D. Drake <jd@commandprompt.com> wrote:
>
>> On 01/27/2016 03:37 PM, Ivan Voras wrote:
>>
>> On 28 January 2016 at 00:13, Bill Moran <wmoran@potentialtech.com> wrote:
>>
>> > On Wed, 27 Jan 2016 23:54:37 +0100
>> > Ivan Voras <ivoras@gmail.com> wrote:
>> >
>> > > So, question #1: WTF? How could this happen, on a regularly vacuumed
>> > > system? Shouldn't the space be reused, at least after a VACUUM? The issue
>> > > here is not the absolute existence of the bloat space, it's that it's
>> > > constantly growing for *system* tables.
>> >
>> > With a lot of activity, once a day probably isn't regular enough.
>>
>> I sort of see what you are saying. I'm curious, though, what goes wrong
>> with the following list of expectations:
>>
>> 1. Day-to-day load is approximately the same
>> 2. So, at the end of the first day there will be some amount of bloat
>> 3. Vacuum will mark that space re-usable
>> 4. Within the next day, this space will actually be re-used
>> 5. ... so the bloat won't grow.
>>
>> Basically, I'm wondering why it is growing after vacuums, not why it
>> exists in the first place?
>
> If something is causing the autovacuum to be aborted you can have this
> problem.

It's long-running transactions / idle in transaction / prepared xacts.

Have you considered slowing down on temp tables? Typically, when bleeding,
it's good to find the wound and stitch it up instead of just getting more
towels...

> JD
>
> --
> Command Prompt, Inc. http://the.postgres.company/
> +1-503-667-4564
> PostgreSQL Centered full stack support, consulting and development.
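Regarding the "idle in transaction" sessions Scott mentions, a minimal sketch for spotting them and seeing how long they have been sitting there:

SELECT pid, usename, now() - state_change AS idle_for, left(query, 60) AS last_query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
ORDER BY state_change;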
As a follow-up, here's a portion of the nightly vacuum's logs; I just want to confirm that my conclusions are right:
INFO: vacuuming "pg_catalog.pg_attribute"
INFO: scanned index "pg_attribute_relid_attnam_index" to remove 3014172 row versions
DETAIL: CPU 0.20s/1.08u sec elapsed 3.72 sec.
INFO: scanned index "pg_attribute_relid_attnum_index" to remove 3014172 row versions
DETAIL: CPU 0.14s/0.89u sec elapsed 1.70 sec.
INFO: "pg_attribute": removed 3014172 row versions in 52768 pages
DETAIL: CPU 0.31s/0.30u sec elapsed 1.15 sec.
INFO: index "pg_attribute_relid_attnam_index" now contains 19578 row versions in 45817 pages
DETAIL: 3013689 index row versions were removed.
45668 index pages have been deleted, 34116 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: index "pg_attribute_relid_attnum_index" now contains 19578 row versions in 32554 pages
DETAIL: 3010630 index row versions were removed.
32462 index pages have been deleted, 24239 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: "pg_attribute": found 2278389 removable, 17319 nonremovable row versions in 52856 out of 57409 pages
DETAIL: 298 dead row versions cannot be removed yet.
There were 641330 unused item pointers.
0 pages are entirely empty.
CPU 1.44s/2.88u sec elapsed 10.55 sec.
INFO: "pg_attribute": stopping truncate due to conflicting lock request
INFO: analyzing "pg_catalog.pg_attribute"
INFO: "pg_attribute": scanned 30000 of 57409 pages, containing 10301 live rows and 0 dead rows; 10301 rows in sample, 24472 estimated total rows
So,
- About 3 million rows churned in the table and its two indexes (i.e. dead rows which vacuum found and marked) - that's per day since this is a nightly operation.
- After the vacuum, the first index is left with 19578 rows in 45817 pages. That's a lot of mostly-empty pages, which should be reused the next day, together with free space in partially filled tables, right?
- Since pages are 8 KiB, 46,000 pages is about 360 MiB of space - that's how large just one of the pg_attribute table's indexes is after the vacuum. Altogether, it's more than 1 GB.
- The "stopping truncate..." message is not really significant, it would have shortened the data files from the end if there are empty pages at the end, which in this case there isn't a significant number of. The truncation can probably never happen on system tables like these since they are always used...?
The real question is: why is the total size (i.e. the number of pages) growing at an alarming rate? On one of the db's, we're seeing almost doubling in size each week. Is the internal fragmentation of the data files so significant?
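One way to see where that space actually goes, assuming the pgstattuple contrib module can be installed, would be something like:

CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT * FROM pgstattuple('pg_catalog.pg_attribute');
-- tuple_percent / dead_tuple_percent / free_percent show how much of the file
-- is live rows, still-dead rows, and already-reusable free space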
Ok, a couple more questions:
- How come "0 pages are entirely empty" if there are 17319 rows spread around in 52856 pages?
- What are "unused item pointers"?
(I agree with your previous suggestions, will see if they can be implemented).