Re: Proposal: Another attempt at vacuum improvements - Mailing list pgsql-hackers
From: Robert Haas
Subject: Re: Proposal: Another attempt at vacuum improvements
Date:
Msg-id: BANLkTi=hX8i3Ma4XpCCrOp-mEsi7zCpfhw@mail.gmail.com
In response to: Re: Proposal: Another attempt at vacuum improvements (Simon Riggs <simon@2ndQuadrant.com>)
Responses: Re: Proposal: Another attempt at vacuum improvements
           Re: Proposal: Another attempt at vacuum improvements
List: pgsql-hackers
On Wed, May 25, 2011 at 7:31 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Tue, May 24, 2011 at 7:58 AM, Pavan Deolasee
> <pavan.deolasee@gmail.com> wrote:
>
>> The biggest gripe today is that vacuum needs two heap scans and each scan
>> dirties the buffer.
>
> That's not that clear to me. The debate usually stalls because we
> don't have sufficient info from real world analysis of where the time
> goes.

It probably wouldn't be too hard to write a simple patch to measure time spent during the first heap pass, time spent scanning indexes, and time spent on the second heap pass. But it's not just about where the time goes: as Pavan says, the second heap pass feels like a waste. Actually, the time spent scanning the indexes kinda feels like a waste too, if the number of tuples being removed is small. Which brings me to your second point:

>> So the idea is to separate the index vacuum (removing index pointers to dead
>> tuples) from the heap vacuum. When we do heap vacuum (either by HOT-pruning
>> or using regular vacuum), we can spool the dead line pointers somewhere.
>
> ISTM it will be complex to attempt to store the exact list of TIDs
> between VACUUMs.
>
> At the moment we scan indexes if we have > 0 rows to remove, which is
> probably wasteful. Perhaps it would be better to keep a running total
> of rows to remove, by updating pg_stats, then when we hit a certain
> threshold in total we can do the index scan. So we don't need to
> remember the TIDs, just remember how many there were and use that to
> avoid cleaning too vigorously.

That occurred to me, too. If we're being launched by autovacuum, then we know that a number of updates and deletes equal to ~20% (or whatever autovacuum_vacuum_scale_factor is set to) of the table size have occurred since the last autovacuum. But it's possible that many of those were HOT updates, in which case the number of index entries to be cleaned up might be much less than 20% of the table size.
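[Editor's note: for readers unfamiliar with the trigger condition being referenced, here is a simplified sketch of autovacuum's threshold test, using the default values of autovacuum_vacuum_threshold (50) and autovacuum_vacuum_scale_factor (0.2); the actual check lives in the autovacuum launcher's C code.]

```python
def autovacuum_needed(n_dead_tup, reltuples,
                      vacuum_threshold=50, scale_factor=0.2):
    """Simplified form of autovacuum's trigger test: a table is vacuumed
    once its dead-tuple count exceeds a fixed floor plus a fraction
    (autovacuum_vacuum_scale_factor) of the table's row count."""
    return n_dead_tup > vacuum_threshold + scale_factor * reltuples

# A 1,000,000-row table is not vacuumed until ~200,050 dead tuples
# accumulate -- roughly the 20% of table size mentioned above:
autovacuum_needed(200_000, 1_000_000)  # False
autovacuum_needed(200_051, 1_000_000)  # True
```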
Alternatively, it's possible that we'd be better off vacuuming the table more often (say, autovacuum_vacuum_scale_factor = 0.10 or 0.08 or something) but only doing the index scans every once in a while, when enough dead line pointers have accumulated. After all, it's the first heap pass that frees up most of the space; cleaning dead line pointers seems a bit less urgent. But not having done any real analysis of how this would work out in practice, I'm not sure whether it's a good idea or not.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
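[Editor's note: the "running total" scheme Simon proposes and Robert endorses above can be sketched as follows. This is a hypothetical illustration only; the class, method names, and the 5% threshold are all invented here and do not correspond to any PostgreSQL code.]

```python
class DeadLinePointerTracker:
    """Hypothetical sketch: remember only HOW MANY dead line pointers have
    accumulated (not their TIDs), and trigger the expensive index-vacuum
    pass only once the count crosses a threshold, instead of scanning
    indexes whenever the count is greater than zero."""

    def __init__(self, reltuples, index_scan_fraction=0.05):
        self.reltuples = reltuples              # table row count
        self.dead_line_pointers = 0             # running total, no TIDs kept
        self.index_scan_fraction = index_scan_fraction

    def record_prune(self, n_dead):
        # Heap pruning (HOT pruning or a regular vacuum's first heap
        # pass) adds dead line pointers to the running total.
        self.dead_line_pointers += n_dead

    def index_vacuum_needed(self):
        # Scan indexes only after enough dead pointers accumulate.
        return self.dead_line_pointers > self.index_scan_fraction * self.reltuples

    def after_index_vacuum(self):
        # Once indexes are cleaned, the pointers can be reclaimed
        # and the running total reset.
        self.dead_line_pointers = 0
```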