Re: vacuum, performance, and MVCC - Mailing list pgsql-hackers
From | Bruce Momjian |
---|---|
Subject | Re: vacuum, performance, and MVCC |
Date | |
Msg-id | 200606241742.k5OHg0b07414@momjian.us Whole thread Raw |
In response to | vacuum, performance, and MVCC ("Mark Woodward" <pgsql@mohawksoft.com>) |
Responses |
Re: vacuum, performance, and MVCC
|
List | pgsql-hackers |
bruce wrote: > Tom Lane wrote: > > Bruce Momjian <bruce@momjian.us> writes: > > > I think at some point we have to admit that _polling_ the tables, which > > > is what autovacuum does, just isn't going to work well, no matter how > > > much it is tweeked, and another approach should be considered for > > > certain workload cases. > > > > Autovacuum polls in its current, first-generation implementation; > > what I said upthread was it needs to be smarter than that. I am not > > sure how you get from that to the conclusion that the very next step > > is to abandon the vacuuming approach altogether. > > I am not ready to abandon autovacuum, but as I stated later the UPDATE > with no key change case is common enought that it could be handled > better without involving autovacuum and its limitations. > > As I remember, most databases have problem with DELETE/INSERT cycles, > but we seem to be hit by UPDATE performance more than most, and more > than is wise. In an attempt to get some concrete on these ideas... ;-) I think the major complexity in doing an in-place UPDATE when no key columns change is allowing rollback on transaction abort (or backend crash), and properly handling visibility for transactions in progress. If the old and new rows are on the same heap page (perhaps a necessary limitation), you can easily update the heap item id to point to the new heap row. All indexes will then point to the new row, and sequential scans will still see both rows (which is good). That leave the rollback issue (undoing the item id change), and having index scans for current backends still see the old row. OK, I have an idea. Right now, an UPDATE where the old and new rows are on the same page have two index entries. What if we made only one index entry for both? We already have UPDATE chaining, where the old row points to the new one. If no key columns change, we set a bit in the heap that the chaining points to the old and new row (both on the same page), so an index scan uses one index entry to see the old and new row, and once the old row is no longer visible, the page index id can be updated to point to the new row and the old row can be marked free and perhaps used for a future UPDATE. (UPDATE already tries to do keep updates on the same heap page.) FYI, the reason heap cleanup is possible once you go with a single index entry for old and new rows is because there is no index cleanup (and single-row index cleanup is very expensive). -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
pgsql-hackers by date: