Thread: BUG #12772: Unexpected autovacuum behavior

BUG #12772: Unexpected autovacuum behavior

From

jd@ods.org

Date:

14 February 2015, 18:15:17

The following bug has been logged on the website:

Bug reference:      12772
Logged by:          Jason DiCioccio
Email address:      jd@ods.org
PostgreSQL version: 9.3.6
Operating system:   Ubuntu 14.04 (Linux 3.13.0-40-generic)
Description:

Here's the situation I ran into:

autovacuum was vacuuming a large table in database 'db1' (to prevent txid
wraparound).  It turns out that one of the indexes of this table was
corrupted from the 9.3.4 WAL issue.  So VACUUM failed repeatedly.
Nonetheless, autovacuum kept trying.

Meanwhile in database 'db2', there were a number of tables in serious need
of vacuuming.  However, autovacuum would never touch these, even though
there were far less than autovacuum_max_workers running.  Other tables in
database 'db1' WERE being vacuumed, however.

So it appears that the logic is that autovacuum operates solely on one
database at a time.  Even if there is only one table that needs vacuuming in
that first database, it will not spawn any workers to vacuum tables that do
need vacuuming in a second database until it has completed vacuuming the
necessary tables from the first database.

This, to me, is unexpected behavior.  I'd expect autovacuum to not act as if
there were a large barrier between databases and to vacuum any table that
need it and that the configuration permits.

Fwd: BUG #12772: Unexpected autovacuum behavior

From

Jeff Janes

Date:

17 February 2015, 17:16:39

On Sat, Feb 14, 2015 at 10:14 AM, <jd@ods.org> wrote:

> The following bug has been logged on the website:
>
> Bug reference:      12772
> Logged by:          Jason DiCioccio
> Email address:      jd@ods.org
> PostgreSQL version: 9.3.6
> Operating system:   Ubuntu 14.04 (Linux 3.13.0-40-generic)
> Description:
>
> Here's the situation I ran into:
>
> autovacuum was vacuuming a large table in database 'db1' (to prevent txid
> wraparound).  It turns out that one of the indexes of this table was
> corrupted from the 9.3.4 WAL issue.  So VACUUM failed repeatedly.
> Nonetheless, autovacuum kept trying.
>
> Meanwhile in database 'db2', there were a number of tables in serious need
> of vacuuming.  However, autovacuum would never touch these, even though
> there were far less than autovacuum_max_workers running.  Other tables in
> database 'db1' WERE being vacuumed, however.
>
> So it appears that the logic is that autovacuum operates solely on one
> database at a time.  Even if there is only one table that needs vacuuming
> in
> that first database, it will not spawn any workers to vacuum tables that do
> need vacuuming in a second database until it has completed vacuuming the
> necessary tables from the first database.
>
> This, to me, is unexpected behavior.  I'd expect autovacuum to not act as
> if
> there were a large barrier between databases and to vacuum any table that
> need it and that the configuration permits.
>

(Sorry, accidentally did not include the list on my original reply, here is
a copy to the list)



This is a known issue, but so far there hasn't be a lot of urgency to fix
it, probably because of a lack of complaints from the field until now, and
because there are a few ways to address it and no one has forcefully
advocated for one particular solution.

Basically it considers the database being in need of wrap around vacuum as
an emergency (even thought it probably isn't) and funnels all future
workers to it.  But when only one table is in need of that wrap around
vacuum, all the other workers can't accomplish anything in that database,
and other databases get starved.

http://www.postgresql.org/message-id/CAMkU=1yE4YyCC00W_GcNoOZ4X2qxF7x5DUAR_kMt-Ta=YPyFPQ@mail.gmail.com

Cheers,

Jeff

Re: Fwd: BUG #12772: Unexpected autovacuum behavior

From

Tom Lane

Date:

17 February 2015, 18:28:49

Jeff Janes <jeff.janes@gmail.com> writes:
> On Sat, Feb 14, 2015 at 10:14 AM, <jd@ods.org> wrote:
>> So it appears that the logic is that autovacuum operates solely on one
>> database at a time.  Even if there is only one table that needs vacuuming in
>> that first database, it will not spawn any workers to vacuum tables that do
>> need vacuuming in a second database until it has completed vacuuming the
>> necessary tables from the first database.

> This is a known issue, but so far there hasn't be a lot of urgency to fix
> it, probably because of a lack of complaints from the field until now, and
> because there are a few ways to address it and no one has forcefully
> advocated for one particular solution.

> Basically it considers the database being in need of wrap around vacuum as
> an emergency (even thought it probably isn't) and funnels all future
> workers to it.  But when only one table is in need of that wrap around
> vacuum, all the other workers can't accomplish anything in that database,
> and other databases get starved.

Yeah.  The thing that makes it difficult to behave in an ideal way is that
the autovacuum launcher has only database-level granularity of knowledge.
It knows that there's at least one table in that database that is at risk
of wraparound, but it doesn't know if there's more than one (and, btw,
it usually is a reasonable bet that there *is* more than one, because
probably most of the database last got an anti-wraparound vacuum at about
the same time).  So it's hard to tell whether "send all the troops that
way" is good strategy or just a waste of resources.

The previous thread you mentioned failed to come up with any ideas that
materially improved things, and I suspect that it's not possible to do
much without giving the launcher more information than it gets now.

One simple mechanism that perhaps could do that is having autovacuum
workers report back on whether they found anything useful to do or not.
If we know we have worker(s) active in a given database, but another
worker exits reporting there was nothing for it to do, we could conclude
that that database is saturated for the moment, and stop dispatching new
workers to it till the situation changes.  I'm not entirely sure about
details though; in particular you'd have to be very sure that you didn't
miss any "situation changing" conditions or a database could become
permanently ignored.

There might be other feedback mechanisms that would work better, but
I think we need to consider adding something.

            regards, tom lane