Thread: BUG #12772: Unexpected autovacuum behavior
The following bug has been logged on the website: Bug reference: 12772 Logged by: Jason DiCioccio Email address: jd@ods.org PostgreSQL version: 9.3.6 Operating system: Ubuntu 14.04 (Linux 3.13.0-40-generic) Description: Here's the situation I ran into: autovacuum was vacuuming a large table in database 'db1' (to prevent txid wraparound). It turns out that one of the indexes of this table was corrupted from the 9.3.4 WAL issue. So VACUUM failed repeatedly. Nonetheless, autovacuum kept trying. Meanwhile in database 'db2', there were a number of tables in serious need of vacuuming. However, autovacuum would never touch these, even though there were far less than autovacuum_max_workers running. Other tables in database 'db1' WERE being vacuumed, however. So it appears that the logic is that autovacuum operates solely on one database at a time. Even if there is only one table that needs vacuuming in that first database, it will not spawn any workers to vacuum tables that do need vacuuming in a second database until it has completed vacuuming the necessary tables from the first database. This, to me, is unexpected behavior. I'd expect autovacuum to not act as if there were a large barrier between databases and to vacuum any table that need it and that the configuration permits.
On Sat, Feb 14, 2015 at 10:14 AM, <jd@ods.org> wrote: > The following bug has been logged on the website: > > Bug reference: 12772 > Logged by: Jason DiCioccio > Email address: jd@ods.org > PostgreSQL version: 9.3.6 > Operating system: Ubuntu 14.04 (Linux 3.13.0-40-generic) > Description: > > Here's the situation I ran into: > > autovacuum was vacuuming a large table in database 'db1' (to prevent txid > wraparound). It turns out that one of the indexes of this table was > corrupted from the 9.3.4 WAL issue. So VACUUM failed repeatedly. > Nonetheless, autovacuum kept trying. > > Meanwhile in database 'db2', there were a number of tables in serious need > of vacuuming. However, autovacuum would never touch these, even though > there were far less than autovacuum_max_workers running. Other tables in > database 'db1' WERE being vacuumed, however. > > So it appears that the logic is that autovacuum operates solely on one > database at a time. Even if there is only one table that needs vacuuming > in > that first database, it will not spawn any workers to vacuum tables that do > need vacuuming in a second database until it has completed vacuuming the > necessary tables from the first database. > > This, to me, is unexpected behavior. I'd expect autovacuum to not act as > if > there were a large barrier between databases and to vacuum any table that > need it and that the configuration permits. > (Sorry, accidentally did not include the list on my original reply, here is a copy to the list) This is a known issue, but so far there hasn't be a lot of urgency to fix it, probably because of a lack of complaints from the field until now, and because there are a few ways to address it and no one has forcefully advocated for one particular solution. Basically it considers the database being in need of wrap around vacuum as an emergency (even thought it probably isn't) and funnels all future workers to it. But when only one table is in need of that wrap around vacuum, all the other workers can't accomplish anything in that database, and other databases get starved. http://www.postgresql.org/message-id/CAMkU=1yE4YyCC00W_GcNoOZ4X2qxF7x5DUAR_kMt-Ta=YPyFPQ@mail.gmail.com Cheers, Jeff
Jeff Janes <jeff.janes@gmail.com> writes: > On Sat, Feb 14, 2015 at 10:14 AM, <jd@ods.org> wrote: >> So it appears that the logic is that autovacuum operates solely on one >> database at a time. Even if there is only one table that needs vacuuming in >> that first database, it will not spawn any workers to vacuum tables that do >> need vacuuming in a second database until it has completed vacuuming the >> necessary tables from the first database. > This is a known issue, but so far there hasn't be a lot of urgency to fix > it, probably because of a lack of complaints from the field until now, and > because there are a few ways to address it and no one has forcefully > advocated for one particular solution. > Basically it considers the database being in need of wrap around vacuum as > an emergency (even thought it probably isn't) and funnels all future > workers to it. But when only one table is in need of that wrap around > vacuum, all the other workers can't accomplish anything in that database, > and other databases get starved. Yeah. The thing that makes it difficult to behave in an ideal way is that the autovacuum launcher has only database-level granularity of knowledge. It knows that there's at least one table in that database that is at risk of wraparound, but it doesn't know if there's more than one (and, btw, it usually is a reasonable bet that there *is* more than one, because probably most of the database last got an anti-wraparound vacuum at about the same time). So it's hard to tell whether "send all the troops that way" is good strategy or just a waste of resources. The previous thread you mentioned failed to come up with any ideas that materially improved things, and I suspect that it's not possible to do much without giving the launcher more information than it gets now. One simple mechanism that perhaps could do that is having autovacuum workers report back on whether they found anything useful to do or not. If we know we have worker(s) active in a given database, but another worker exits reporting there was nothing for it to do, we could conclude that that database is saturated for the moment, and stop dispatching new workers to it till the situation changes. I'm not entirely sure about details though; in particular you'd have to be very sure that you didn't miss any "situation changing" conditions or a database could become permanently ignored. There might be other feedback mechanisms that would work better, but I think we need to consider adding something. regards, tom lane