Re: VACUUM's ancillary tasks - Mailing list pgsql-hackers
From | Thomas Munro |
---|---|
Subject | Re: VACUUM's ancillary tasks |
Date | |
Msg-id | CAEepm=2rWqt1mhT9XwEPUdvcZN_f8qb7+CaejU2D34R-VG8b5w@mail.gmail.com Whole thread Raw |
In response to | VACUUM's ancillary tasks (Vik Fearing <vik@2ndquadrant.fr>) |
Responses |
Re: VACUUM's ancillary tasks
Re: [HACKERS] VACUUM's ancillary tasks |
List | pgsql-hackers |
On Mon, Aug 29, 2016 at 1:26 PM, Vik Fearing <vik@2ndquadrant.fr> wrote: > The attached two patches scratch two itches I've been having for a > while. I'm attaching them together because the second depends on the first. > > Both deal with the fact that [auto]vacuum has taken on more roles than > its original purpose. > > > Patch One: autovacuum insert-heavy tables > > If you have a table that mostly receives INSERTs, it will never get > vacuumed because there are no (or few) dead rows. I have added an > "inserts_since_vacuum" field to PgStat_StatTabEntry which works exactly > the same way as "changes_since_analyze" does. > > The reason such a table needs to be vacuumed is currently twofold: the > visibility map is not updated, slowing down index-only scans; and BRIN > indexes are not maintained, rendering them basically useless. I'm aware of those two problems, but not very familiar with the details. I don't feel qualified to say whether insert counting is the best approach to the problem at this point. I looked into it a little bit however, and had the following thoughts: About BRIN indexes: I couldn't find an explanation of why BRIN indexes don't automatically create new summary tuples when you insert a new tuple in an unsummarised page range. Is it deferred until VACUUM time in order to untangle some otherwise unresolvable interlocking or crash safety problem, or could that one day be done? Assuming that it must be deferred for some technical reason and there is no way around it, then I wonder if there is a more direct and accurate way to figure out when it's necessary than counting inserts. Counting inserts seems slightly bogus because you can't tell whether those were inserts into an existing summarised block which is self-maintaining or not. At first glance it looks a bit like unsummarised ranges can only appear at the end of the table, is that right? If so, couldn't you detect the number of unsummarised BRIN blocks just by comparing the highest summarised BRIN block and the current heap size? About visibility maps: How crazy would it be to estimate the number of not-all-visible pages instead? It would be less work to count that since it would only increase when the *first* tuple is inserted into a page that is currently all visible (ie when the bit is cleared), not for every tuple inserted into any page like your inserts_since_vacuum counter. Another difference is that inserts_since_vacuum is reset even if vacuum finds that it *can't* set the all-visible bit for a given page yet because of some concurrent transaction. In that case the bit is still not set but autovacuum has no reason to be triggered again. -- Thomas Munro http://www.enterprisedb.com
pgsql-hackers by date: