Avoiding second heap scan in VACUUM - Mailing list pgsql-hackers
From | Pavan Deolasee |
---|---|
Subject | Avoiding second heap scan in VACUUM |
Date | |
Msg-id | 2e78013d0805280426o638ab05cq74d8e719f804b6bb@mail.gmail.com Whole thread Raw |
Responses |
Re: Avoiding second heap scan in VACUUM
Re: Avoiding second heap scan in VACUUM |
List | pgsql-hackers |
Tom brought this up during the PGCon developer meet. After thinking a bit about it, I think it's actually possible to avoid the second heap scan, especially now that we've HOT. If we can remove the second pass, not only would that speed up vacuum, but also reduce lots of redundant read and write IO. Currently second heap scan is required to remove the dead tuples from the heap. We can not do this in the first scan because we haven't yet removed the index pointers pointing to them. HOT now prunes and defrags the pages in the first phase itself and what is left behind is just a bunch of DEAD line pointers. The line pointers are marked "UNUSED" in the second heap scan. Since we don't repair any line pointer bloat, no additional free space is created in the second pass. So frankly there is not much left to be done in the second phase. Of course we also update the FSM information at the end of second pass. If we want to remove the second pass, what we need is a mechanism to reclaim the DEAD line pointers. But to this correctly, we must ensure that the DEAD line pointers are reclaimed only and only after the index entries pointing to them are removed. Tom's idea was to store the vacuum-xid in the tuple header and check that xid to see if the vacuum successfully removed the index pointers or not. Heikki had some brilliant idea to store the xid in the line pointer itself. These ideas are good, but would require xid wraparound handling. I am thinking of a solution on the following lines to handle DEAD line pointers. Other ideas are welcome too. 1. Before VACUUM starts, it updates the pg_class row of the target table, noting that VACUUM_IN_PROGRESS for the target table. 2. It then waits for all the existing transactions to finish to make sure that everyone can see the change in the pg_class row, 3. It then scans the heap, prunes and defrags the pages. The normal pruning would reclaim all the dead tuples and mark their line pointers as DEAD. Since VACUUM is going to remove the index pointers pointing to these DEAD line pointers, it now marks these DEAD line pointers with additional flag, say DEAD_RECLAIMED. 4. At the end of first scan, VACUUM updates FSM information for heap pages. 5. It then proceeds with the index scan and removes index pointers pointing to the DEAD line pointers collected in the heap scan. 6. Finally, it again updates the pg_class row and clears the VACUUM_IN_PROGRESS flag. Any other backend, when invokes page pruning, would check if the VACUUM is in progress by looking at the VACUUM_IN_PROGRESS flag. Note that if the previous vacuum had failed or database crashed before vacuum completed, the VACUUM_IN_PROGRESS flag would remain set until the next vacuum successfully completes on the table and resets the flag (VACUUM_NOT_IN_PROGRESS state). Since vacuum waits for the existing transactions to finish before marking any DEAD line pointers DEAD_RECLAIMED, for a backend which sees VACUUM_NOT_IN_PROGRESS, any DEAD_RECLAIMED line pointer it finds must be left over from the previously successfully completed vacuum. Since the previous vacuum must have removed the index pointers pointing to it, the backend can now safely reclaim the line pointer itself. The backend can potentially do this any time it sees a DEAD_RECLAIMED line pointer, but we may restrict this only during the pruning activity to keep things simple. This operation need not be WAL logged if we appropriately handle DEAD_RECLAIMED line pointer during redo recovery (if it's reused for some other insert/update activity). I think this scheme guarantees that a backend would always see VACUUM_IN_PROGRESS if vacuum is currently in progress on the table or the last vacuum has failed. There might be situations when a backend sees VACUUM_IN_PROGRESS when if fact there is no vacuum is progress and the last vacuum finished successfully, but that won't have any correctness implication, but would only delay reclaiming DEAD_RECLAIMED line pointers. Comments ? Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com
pgsql-hackers by date: