Re: Proposal: Another attempt at vacuum improvements - Mailing list pgsql-hackers
From | Robert Haas
---|---
Subject | Re: Proposal: Another attempt at vacuum improvements
Date |
Msg-id | BANLkTiknzuwNaRwoHxFVgp1M2T=4Rbm6hA@mail.gmail.com
In response to | Proposal: Another attempt at vacuum improvements (Pavan Deolasee <pavan.deolasee@gmail.com>)
Responses | Re: Proposal: Another attempt at vacuum improvements
List | pgsql-hackers
So, first of all, thanks for putting some effort and thought into this. Despite the large number of improvements in this area in 8.3 and 8.4, this is still a pain point, and it would be really nice to find a way to make some further improvements.

On Tue, May 24, 2011 at 2:58 AM, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
> So the idea is to separate the index vacuum (removing index pointers to dead
> tuples) from the heap vacuum. When we do heap vacuum (either by HOT-pruning
> or using regular vacuum), we can spool the dead line pointers somewhere. To
> avoid any hot-spots during normal processing, the spooling can be done
> periodically like the stats collection.

What happens if the system crashes after a line pointer becomes dead but before the record of its death is safely on disk? The fact that a previous index vacuum has committed is only sufficient justification for reclaiming the dead line pointers if you're positive that the index vacuum killed the index pointers for *every* dead line pointer. I'm not sure we want to go there; any operation that wants to make a line pointer dead would then need to be XLOG'd.

Instead, I think we should stick with your original idea and just try to avoid the second heap pass. To do that, as you say, we can have every operation that creates a dead line pointer note the LSN of the operation in the page. But instead of allocating permanent space in the page header, which would both reduce (admittedly only by 8 bytes) the amount of space available for tuples and, more significantly, break on-disk compatibility, I'm wondering if we could get by with making space for that extra LSN only when it's actually present. In other words, when it's present, we set a bit PD_HAS_DEAD_LINE_PTR_LSN or somesuch, increment pd_lower, and use the extra space to store the LSN. There is an alignment problem to worry about there, but that shouldn't be a huge issue.

When we vacuum, we remember the LSN before we start. When we finish, if we scanned the indexes and everything completed without error, then we bump the heap's notion (wherever we store it) of the last successful index vacuum. When we vacuum or do HOT cleanup on a page, if the page has a most-recent-dead-line-pointer LSN and it precedes the start-of-last-successful-index-vacuum LSN, then we mark all the LP_DEAD line pointers as LP_UNUSED and throw away the most-recent-dead-line-pointer LSN.

One downside of this approach is that it becomes slightly more complicated to figure out where the item pointer array ends. Another issue is that we might find ourselves wanting to extend the item pointer array to add a new item, and be unable to do so easily because the most-recent-dead-line-pointer LSN is in the way. If the LSN stored in the page precedes the start-of-last-successful-index-vacuum LSN, and if, further, we can get a buffer cleanup lock on the page, then we can do a HOT cleanup and life is good. Otherwise, we can either (1) just forget about the most-recent-dead-line-pointer LSN - not ideal, but not catastrophic either - or (2) if the start-of-last-successful-index-vacuum LSN is old enough, overwrite an LP_DEAD line pointer in place.

Another issue is that this causes problems for temporary and unlogged tables, because no WAL records are generated and, therefore, the LSN does not advance. This is also a problem for GIST indexes; Heikki fixed temporary GIST indexes by generating fake LSNs off of a backend-local counter.
Unlogged GIST indexes are currently not supported. I think what we need to do is create an API to which you can pass a relation and get back an LSN. If it's a permanent relation, you get a regular LSN. If it's a temporary relation, you get a fake LSN based on a backend-local counter. If it's an unlogged relation, you get a fake LSN based on a shared-memory counter that is reset on restart. If we can encapsulate that properly, it should provide both what we need to make this idea work and a somewhat graceful fix for the GIST-vs-unlogged problem.

Thoughts?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
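To make the page-level bookkeeping concrete, here is a deliberately simplified, self-contained C sketch of the dead-line-pointer LSN scheme described in the message above. Only the flag name PD_HAS_DEAD_LINE_PTR_LSN comes from the message; the FakePage layout, the FakeLSN type, and the function names are hypothetical stand-ins, not PostgreSQL's actual page format (in particular, the real proposal carves the LSN out of the space past pd_lower rather than using a fixed struct field).

```c
/*
 * Simplified sketch, not PostgreSQL source: all types, fields, and
 * function names below are illustrative stand-ins.
 */
#include <stdint.h>

typedef uint64_t FakeLSN;

/* Stand-ins for the real line pointer states. */
#define LP_UNUSED 0
#define LP_DEAD   3

#define PD_HAS_DEAD_LINE_PTR_LSN 0x0008 /* hypothetical pd_flags bit */

typedef struct
{
    uint16_t flags;          /* stand-in for pd_flags */
    uint16_t nitems;         /* number of line pointers in use */
    uint8_t  lp_state[64];   /* stand-in for the item pointer array */
    FakeLSN  dead_lp_lsn;    /* most-recent-dead-line-pointer LSN; only
                              * meaningful while the flag bit is set */
} FakePage;

/*
 * Called by any operation that creates a dead line pointer: remember
 * the LSN of the most recent such operation on this page.
 */
static void
page_note_dead_line_ptr(FakePage *page, FakeLSN op_lsn)
{
    if (!(page->flags & PD_HAS_DEAD_LINE_PTR_LSN) ||
        op_lsn > page->dead_lp_lsn)
    {
        page->flags |= PD_HAS_DEAD_LINE_PTR_LSN;
        page->dead_lp_lsn = op_lsn;
    }
}

/*
 * Called during vacuum or HOT cleanup: if even the newest dead line
 * pointer predates the start of the last successful index vacuum, the
 * index pointers referencing this page's dead items are already gone,
 * so the LP_DEAD slots can become LP_UNUSED and the LSN is discarded.
 */
static void
page_maybe_reclaim_dead(FakePage *page, FakeLSN last_index_vacuum_start)
{
    if (!(page->flags & PD_HAS_DEAD_LINE_PTR_LSN))
        return;
    if (page->dead_lp_lsn >= last_index_vacuum_start)
        return;     /* some index pointers may still exist */

    for (int i = 0; i < page->nitems; i++)
    {
        if (page->lp_state[i] == LP_DEAD)
            page->lp_state[i] = LP_UNUSED;
    }
    page->flags &= ~PD_HAS_DEAD_LINE_PTR_LSN;
}
```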
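The relation-to-LSN API suggested in the last paragraph might look roughly like the following self-contained sketch. The names (relation_get_lsn, FakeRelation, the RELPERSISTENCE_* codes) and the counter handling are assumptions for illustration, not an existing PostgreSQL interface.

```c
/*
 * Sketch of the proposed "pass a relation, get an LSN" API. All names
 * and types here are illustrative stand-ins.
 */
#include <stdint.h>

typedef uint64_t FakeLSN;

typedef enum
{
    RELPERSISTENCE_PERMANENT,   /* ordinary WAL-logged relation */
    RELPERSISTENCE_UNLOGGED,    /* no WAL, but shared across backends */
    RELPERSISTENCE_TEMP         /* no WAL, backend-private */
} FakeRelPersistence;

typedef struct
{
    FakeRelPersistence persistence;
} FakeRelation;

/* Stand-in for reading the WAL position of the current operation. */
static FakeLSN
current_wal_lsn(void)
{
    static FakeLSN wal_position = 1000;
    return ++wal_position;
}

/* Backend-local counter, as Heikki used for temporary GIST indexes. */
static FakeLSN backend_local_counter = 0;

/*
 * Would live in shared memory (with appropriate locking) and be reset
 * on every server restart.
 */
static FakeLSN shared_unlogged_counter = 0;

static FakeLSN
relation_get_lsn(const FakeRelation *rel)
{
    switch (rel->persistence)
    {
        case RELPERSISTENCE_PERMANENT:
            return current_wal_lsn();           /* a regular LSN */
        case RELPERSISTENCE_TEMP:
            return ++backend_local_counter;     /* fake, per-backend */
        case RELPERSISTENCE_UNLOGGED:
            return ++shared_unlogged_counter;   /* fake, shared, reset
                                                 * on restart */
    }
    return 0;   /* not reached */
}
```

One design point worth noting: the unlogged-relation counter has to be shared rather than backend-local because multiple backends can touch the same unlogged relation, unlike a temporary one.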