Re: stats collector suddenly causing lots of IO - Mailing list pgsql-performance
From: Scott Carey
Subject: Re: stats collector suddenly causing lots of IO
Date:
Msg-id: C449E205-8B0B-4024-B647-4CC6F2C46CA4@richrelevance.com
In response to: Re: stats collector suddenly causing lots of IO (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: stats collector suddenly causing lots of IO
List: pgsql-performance
On Apr 16, 2010, at 9:48 AM, Tom Lane wrote:

> Josh Kupershmidt <schmiddy@gmail.com> writes:
>> On Fri, Apr 16, 2010 at 11:41 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Wow. Well, we have a smoking gun here: for some reason, autovacuum
>>> isn't running, or isn't doing its job if it is. If it's not running
>>> at all, that would explain failure to prune the stats collector's file
>>> too.
>
>> Hrm, well autovacuum is at least trying to do work: it's currently
>> stuck on those bloated pg_catalog tables, of course. Another developer
>> killed an autovacuum of pg_attribute (or maybe it was pg_attrdef)
>> after it had been running for two weeks. See current pg_stat_activity
>> output attached, which shows the three autovacuum workers running plus
>> two manual VACUUM ANALYZEs I started yesterday.
>
> Two weeks? What have you got the autovacuum cost delays set to?
>
> Once you're up to three AV workers, no new ones can get launched until
> one of those finishes or is killed. So that would explain failure to
> prune the stats collector's tables (the tabpurge code is only run during
> AV worker launch). So what we need to figure out is why it's taking so
> obscenely long to vacuum these tables ...
>

On any large system with good I/O I have had to significantly increase the
aggressiveness of autovacuum. Even with the settings below, it doesn't
interfere with other activity (the I/O subsystem is capable of roughly
2200 random IOPS and 900 MB/sec sequential).

My relevant autovacuum parameters are (from 'show all'):

 autovacuum                      | on        | Starts the autovacuum subprocess.
 autovacuum_analyze_scale_factor | 0.1       | Number of tuple inserts, updates or deletes prior to analyze as a fraction of reltuples.
 autovacuum_analyze_threshold    | 50        | Minimum number of tuple inserts, updates or deletes prior to analyze.
 autovacuum_freeze_max_age       | 200000000 | Age at which to autovacuum a table to prevent transaction ID wraparound.
 autovacuum_max_workers          | 3         | Sets the maximum number of simultaneously running autovacuum worker processes.
 autovacuum_naptime              | 1min      | Time to sleep between autovacuum runs.
 autovacuum_vacuum_cost_delay    | 20ms      | Vacuum cost delay in milliseconds, for autovacuum.
 autovacuum_vacuum_cost_limit    | 2000      | Vacuum cost amount available before napping, for autovacuum.
 autovacuum_vacuum_scale_factor  | 0.2       | Number of tuple updates or deletes prior to vacuum as a fraction of reltuples.
 autovacuum_vacuum_threshold     | 50        |

For what it is worth, I just went onto one of my systems -- one with lots of
partition tables and temp table creation/destruction -- and looked at the
system tables in question there. Postgres 8.4, using \dt+ (result trimmed
below to the interesting tables):

   Schema   |     Name      | Type  |  Owner   |  Size   | Description
 -----------+---------------+-------+----------+---------+-------------
 pg_catalog | pg_attrdef    | table | postgres | 195 MB  |
 pg_catalog | pg_attribute  | table | postgres | 1447 MB |
 pg_catalog | pg_class      | table | postgres | 1694 MB |
 pg_catalog | pg_constraint | table | postgres | 118 MB  |
 pg_catalog | pg_depend     | table | postgres | 195 MB  |
 pg_catalog | pg_statistic  | table | postgres | 2300 MB |
 pg_catalog | pg_type       | table | postgres | 181 MB  |

So, I did a vacuum full; reindex table; analyze sequence on each of these.
I wish I could just CLUSTER them, but the above works.
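Spelled out for one of those tables, the sequence is roughly the following
(a sketch; run as a superuser, and the table name is just an example -- on
8.4, VACUUM FULL compacts the table in place and tends to bloat its indexes,
which is why the REINDEX comes after it):

  -- Compact one bloated catalog table, rebuild its indexes, refresh stats.
  -- VACUUM FULL takes an exclusive lock on the table while it runs, so
  -- schedule this for a quiet window.
  VACUUM FULL pg_catalog.pg_attribute;
  REINDEX TABLE pg_catalog.pg_attribute;
  ANALYZE pg_catalog.pg_attribute;

Repeat for each of the bloated catalogs (pg_attrdef, pg_statistic, pg_type, etc.).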
Now the tables are:

   Schema   |     Name      | Type  |  Owner   |  Size   | Description
 -----------+---------------+-------+----------+---------+-------------
 pg_catalog | pg_attrdef    | table | postgres | 44 MB   |
 pg_catalog | pg_attribute  | table | postgres | 364 MB  |
 pg_catalog | pg_class      | table | postgres | 1694 MB |
 pg_catalog | pg_constraint | table | postgres | 118 MB  |
 pg_catalog | pg_depend     | table | postgres | 195 MB  |
 pg_catalog | pg_statistic  | table | postgres | 656 MB  |
 pg_catalog | pg_type       | table | postgres | 45 MB   |

I've learned to accept about 50% bloat (2x the compacted size) in postgres as
just the way it usually is on a busy table, but the 3x and 4x bloat of
statistic, attrdef, and attribute have me wondering.

I have had some 'idle in transaction' connections hanging out from time to
time that have caused issues on this machine, and that could explain the
above perma-bloat. That is one thing that could affect the case reported here
as well. The worst thing about those is that you can't even force-kill those
connections from within postgres (pg_cancel_backend doesn't work on them, and
killing them via the OS bounces postgres ...), so you have to hunt down the
offending client -- see the pg_stat_activity sketch below the quoted footer.

> regards, tom lane
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
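On the 'idle in transaction' point above: a quick way to spot such sessions
is to query pg_stat_activity. A minimal sketch, assuming 8.4's column names
(9.2 and later rename procpid/current_query to pid/query and add a separate
state column):

  -- List backends sitting 'idle in transaction', oldest transaction first,
  -- so the offending client can be hunted down.
  SELECT procpid, usename, client_addr,
         now() - xact_start  AS xact_age,
         now() - query_start AS idle_since_last_query
  FROM pg_stat_activity
  WHERE current_query = '<IDLE> in transaction'
  ORDER BY xact_start;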