Re: [HACKERS] GUC for cleanup indexes threshold. - Mailing list pgsql-hackers
From: Peter Geoghegan
Subject: Re: [HACKERS] GUC for cleanup indexes threshold.
Msg-id: CAH2-Wz=XYDXDzyFfH1aBOGBSD7k6T-AyGaFRFqc027df3dQbXQ@mail.gmail.com
In response to: Re: [HACKERS] GUC for cleanup indexes threshold. (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses: Re: [HACKERS] GUC for cleanup indexes threshold.
           Re: [HACKERS] GUC for cleanup indexes threshold.
List: pgsql-hackers
On Fri, Mar 3, 2017 at 8:13 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> Thank you for clarification. Let me check my understanding. IIUC,
> skipping the second index vacuum path (lazy_cleanup_index) cannot be
> a cause of leaving pages in a half-dead state, but could leave
> recyclable pages that are not marked as recyclable. The second one
> can be reclaimed by the next index vacuum, because btvacuumpage calls
> RecordFreeIndexPage for recyclable pages. Am I missing something?

I think that this is a good summary. You have not missed anything.

> My first motivation of this patch is to skip the second index vacuum
> path when vacuum skipped the whole table by visibility map.

Makes sense.

> But as Robert suggested on another thread, I changed it to have a
> threshold. If my understanding is correct, we can have a threshold
> that specifies the fraction of the pages scanned by vacuum. If it's
> set to 0.1, lazy_scan_heap can do the second index vacuum only when
> 10% of the table is scanned. IOW, if 90% of table pages are skipped,
> which means almost all of the table has not changed since the
> previous vacuum, we can skip the second index vacuum.

It sounds like a setting of 0.1 would leave us in a position where it's
very unlikely that a VACUUM of indexes could fail to occur when
autovacuum has been triggered in the common way, by the "vacuum
threshold" having been exceeded. Does this feeling that I have about it
seem accurate to you? Is that actually your intent? It's hard to really
be sure, because there are so many confounding factors (e.g. HOT), but
if that was 100% true, then I suppose there would theoretically be zero
new risk (except, perhaps, of the "other type of bloat" that I
described, which I am not very worried about).

Please verify my understanding of your thought process: we don't have
to freeze indexes at all, ever, so if we see index bloat as a separate
problem, we also see that there is no need to *link* index needs to the
need for freezing. XID burn rate is a very bad proxy for how bloated an
index may be. Besides, we already have a separate trigger for the thing
that *actually* matters to indexes (the vacuum threshold stuff).

Maybe 0.0 is too low as a default for
vacuum_cleanup_index_scale_factor, even though initially it seemed
attractive to me for theoretical reasons. Something like 0.01 is still
"practically zero", but removes the extreme sensitivity that we would
have with 0.0. So, 0.01 might make sense as a default for roughly the
same reason that autovacuum_vacuum_threshold exists. (Maybe it should
be more like autovacuum_vacuum_threshold, in that it's an absolute
minimum number of heap blocks to trigger index clean-up.)

At some point in the future, it may be possible to actually go ahead
with index vacuuming, but do only a small amount of B-Tree vacuuming,
by a process that is similar to a conventional index scan: a process
that collects index values from those rows found in the heap, and
performs subsequent look-ups to kill tuples in the index. I imagine
that in cases where the freeze map stuff really helps, the only index
is often on a bigserial column or similar, which naturally has good
locality. When you VACUUM heap pages, you attempt to build a range of
values for each index, a little like a BRIN index build. This range is
what you go on to use to do a cheap index-scan-based B-Tree VACUUM.
This could have far, far less I/O, though it has obvious risks that we
need to worry about. That's work for another release, of course.

--
Peter Geoghegan
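
To make the threshold mechanics discussed above concrete, here is a
minimal standalone C sketch. It is not from any actual patch: the GUC
name vacuum_cleanup_index_scale_factor is taken from the discussion,
and the function and counter names are assumptions standing in for
what lazy_scan_heap would track.

#include <stdbool.h>
#include <stdio.h>

/* Hypothetical GUC; 0.01 is the "practically zero" default discussed. */
static double vacuum_cleanup_index_scale_factor = 0.01;

/*
 * Sketch of the proposed check: run the second index pass only when
 * the fraction of heap pages actually scanned (i.e. not skipped via
 * the visibility map) reaches the threshold.
 */
static bool
should_do_index_cleanup(unsigned scanned_pages, unsigned rel_pages)
{
    if (rel_pages == 0)
        return false;           /* empty relation: nothing to clean up */

    return (double) scanned_pages / rel_pages
           >= vacuum_cleanup_index_scale_factor;
}

int
main(void)
{
    printf("%d\n", should_do_index_cleanup(5, 1000));   /* 0: 0.5% scanned */
    printf("%d\n", should_do_index_cleanup(100, 1000)); /* 1: 10% scanned */
    return 0;
}

With a setting of 0.1 instead of 0.01, only the second call would pass,
matching the "skip the cleanup pass when 90% of the table was skipped"
behavior described above.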
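
The range-based B-Tree vacuum idea can be illustrated with a toy
program; every name here is invented for illustration, and nothing in
it is proposed code. A sorted array stands in for the B-Tree leaf
level, and a real implementation would descend the tree to min_key
rather than scanning from the start.

#include <stdio.h>

/*
 * While vacuuming heap pages, track the min/max index key among dead
 * rows (a single range, a little like a BRIN summary), then visit
 * only index entries inside that range.
 */
typedef struct DeadKeyRange
{
    long    min_key;
    long    max_key;
    int     valid;
} DeadKeyRange;

static void
range_add(DeadKeyRange *r, long key)
{
    if (!r->valid)
    {
        r->min_key = r->max_key = key;
        r->valid = 1;
        return;
    }
    if (key < r->min_key)
        r->min_key = key;
    if (key > r->max_key)
        r->max_key = key;
}

int
main(void)
{
    long    dead_keys[] = {9001, 9002, 9007};   /* keys of dead heap rows */
    long    index_keys[] = {1, 500, 9000, 9001, 9002, 9007, 9100};
    DeadKeyRange r = {0, 0, 0};
    int     i;
    int     visited = 0;

    for (i = 0; i < 3; i++)
        range_add(&r, dead_keys[i]);

    for (i = 0; i < 7; i++)
    {
        if (index_keys[i] < r.min_key || index_keys[i] > r.max_key)
            continue;           /* outside the range: never touched */
        visited++;              /* here we would recheck TIDs and kill */
    }

    /*
     * With a bigserial-style index the dead keys cluster, so few
     * entries (and few leaf pages) are visited: prints "3 of 7".
     */
    printf("visited %d of %d index entries\n", visited, 7);
    return 0;
}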