Re: Deleting older versions in unique indexes to avoid page splits - Mailing list pgsql-hackers
From: Anastasia Lubennikova
Subject: Re: Deleting older versions in unique indexes to avoid page splits
Msg-id: a1d3c9d3-f070-0bd0-8de2-49184e21b35c@postgrespro.ru
In response to: Re: Deleting older versions in unique indexes to avoid page splits (Peter Geoghegan <pg@bowt.ie>)
Responses: Re: Deleting older versions in unique indexes to avoid page splits
List: pgsql-hackers
On 08.10.2020 02:48, Peter Geoghegan wrote:
> On Tue, Jun 30, 2020 at 5:03 PM Peter Geoghegan <pg@bowt.ie> wrote:
>> Attached is a POC patch that teaches nbtree to delete old duplicate versions from unique indexes. The optimization targets non-HOT duplicate version bloat. Although the patch is rather rough, it nevertheless manages to more or less eliminate a whole class of index bloat: Unique index bloat from non-HOT updates in workloads where no transaction lasts for more than a few seconds.
> I'm slightly surprised that this thread didn't generate more interest back in June. After all, maintaining the pristine initial state of (say) a primary key index even after many high throughput non-HOT updates (i.e. avoiding "logically unnecessary" page splits entirely) is quite appealing. It arguably goes some way towards addressing long held criticisms of our approach to MVCC. Especially if it can be generalized to all b-tree indexes -- the Uber blog post mentioned tables that have several indexes, which presumably means that there can be no HOT updates (the author of the blog post didn't seem to be aware of HOT at all).

The idea seems very promising, especially when extended to handle non-unique indexes too.
> I've been trying to generalize my approach to work with all indexes. I think that I can find a strategy that is largely effective at preventing version churn page splits that take place with workloads that have many non-HOT updates, without any serious downsides for workloads that do not benefit. I want to get feedback on that now, since I expect that it will be controversial. Teaching indexes about how tuples are versioned or chaining tuples seems like a non-starter, so the trick will be to come up with something that behaves in approximately the same way as that in cases where it helps. The approach I have in mind is to pass down a hint from the executor to btinsert(), which lets nbtree know that the index tuple insert is in fact associated with a non-HOT update. This hint is only given when the update did not logically modify the columns that the index covers.

That's exactly what I wanted to discuss after the first letter. If we could make (non-)HOT updates index-specific, I think it could improve performance a lot.
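For illustration, the executor-side test could be as simple as the following standalone sketch (the names and the bitset representation are mine, not the patch's): the hint applies only when the set of columns the UPDATE modified does not overlap the set of columns the index covers.

    #include <stdbool.h>
    #include <stdio.h>

    /* Bit i set means "column i" (toy representation of a column set). */
    typedef unsigned long long ColumnSet;

    /*
     * An UPDATE's new index tuple is "logically unchanged" for a given index
     * when none of the columns the UPDATE modified are covered by that index.
     * That is exactly the case in which the hint would be passed down.
     */
    static bool
    insert_gets_unchanged_hint(ColumnSet updated_cols, ColumnSet indexed_cols)
    {
        return (updated_cols & indexed_cols) == 0;
    }

    int
    main(void)
    {
        ColumnSet updated  = 1ULL << 3;                   /* UPDATE touched column 3 */
        ColumnSet pk_index = 1ULL << 0;                   /* index on column 0 only  */
        ColumnSet covering = (1ULL << 3) | (1ULL << 4);   /* index covering column 3 */

        printf("primary key index gets hint: %d\n",
               insert_gets_unchanged_hint(updated, pk_index));   /* prints 1 */
        printf("covering index gets hint:    %d\n",
               insert_gets_unchanged_hint(updated, covering));   /* prints 0 */
        return 0;
    }

In the patch's terms this would correspond to whatever flag btinsert() ends up receiving; the point is just that the check is cheap and sits entirely on the executor side.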
> Here is the maybe-controversial part: The algorithm initially assumes that all indexes more or less have the same properties as unique indexes from a versioning point of view, even though that's clearly not true. That is, it initially assumes that the only reason why there can be duplicates on any leaf page it looks at is because some previous transaction also did a non-HOT update that added a new, unchanged index tuple. The new algorithm only runs when the hint is passed down from the executor and when the only alternative is to split the page (or have a deduplication pass), so clearly there is some justification for this assumption -- it really is highly unlikely that this update (which is on the verge of splitting the page) just so happened to be the first such update that affected the page.
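If I read this part correctly, the decision at a full leaf page is roughly the standalone sketch below; the names are made up for illustration and this is not code from the patch.

    #include <stdbool.h>
    #include <stdio.h>

    typedef enum
    {
        ACTION_TRY_VERSION_DELETION,  /* the new speculative pass described above */
        ACTION_TRY_DEDUPLICATION,
        ACTION_SPLIT_PAGE
    } LeafPageAction;

    /*
     * Called only when the incoming index tuple no longer fits on the target
     * leaf page.  The speculative deletion of old versions is attempted only
     * when the executor passed down the non-HOT-update hint; if that pass
     * frees nothing, the caller still falls back to a deduplication pass or,
     * failing that, a page split.
     */
    static LeafPageAction
    choose_action_when_page_full(bool non_hot_update_hint, bool dedup_possible)
    {
        if (non_hot_update_hint)
            return ACTION_TRY_VERSION_DELETION;
        if (dedup_possible)
            return ACTION_TRY_DEDUPLICATION;
        return ACTION_SPLIT_PAGE;
    }

    int
    main(void)
    {
        printf("%d\n", choose_action_when_page_full(true, true));    /* 0: try version deletion */
        printf("%d\n", choose_action_when_page_full(false, false));  /* 2: split the page       */
        return 0;
    }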
I think that this optimization can affect low cardinality indexes negatively, but it is hard to estimate the impact without tests. Maybe it won't be a big deal, given that we attempt to eliminate old copies not very often and that low cardinality b-trees are already not very useful. Besides, we can always make this thing optional, so that users could tune it to their workload.

> To be blunt: It may be controversial that we're accessing multiple heap pages while holding an exclusive lock on a leaf page, in the hopes that we can avoid a page split, but without any certainty that it'll work out. Sometimes (maybe even most of the time), this assumption turns out to be mostly correct, and we benefit in the obvious way -- no "unnecessary" page splits for affected non-unique indexes. Other times it won't work out, usually because the duplicates are in fact logical duplicates from distinct logical rows.
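To make that trade-off concrete for myself, I picture something like the toy pass below: group the duplicates' TIDs by heap block, visit at most a handful of heap pages (the cap here is my assumption, not something from the patch), and see how much leaf-page space dead versions would actually free. When the duplicates come from distinct logical rows, the pass simply finds little to delete and the page splits anyway; the price is the extra heap reads taken while the leaf page is locked.

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdio.h>

    #define MAX_HEAP_PAGES_PER_PASS 3   /* assumed cap, purely for illustration */

    typedef struct
    {
        unsigned heap_block;        /* heap page the duplicate's TID points at */
        bool     version_is_dead;   /* stand-in for the real visibility check  */
        size_t   index_tuple_size;  /* leaf-page bytes freed if it is deleted  */
    } DuplicateRef;

    /* Returns the leaf-page bytes a capped version-deletion pass could free. */
    static size_t
    simulate_version_deletion_pass(const DuplicateRef *dups, size_t ndups)
    {
        unsigned visited[MAX_HEAP_PAGES_PER_PASS];
        size_t   nvisited = 0;
        size_t   freed = 0;

        for (size_t i = 0; i < ndups; i++)
        {
            bool seen = false;

            for (size_t j = 0; j < nvisited; j++)
            {
                if (visited[j] == dups[i].heap_block)
                {
                    seen = true;
                    break;
                }
            }

            if (!seen)
            {
                if (nvisited == MAX_HEAP_PAGES_PER_PASS)
                    continue;               /* stay inside the heap-access budget */
                visited[nvisited++] = dups[i].heap_block;
            }

            if (dups[i].version_is_dead)
                freed += dups[i].index_tuple_size;
        }

        return freed;
    }

    int
    main(void)
    {
        /* Two dead versions on heap block 7, one live duplicate on block 9. */
        DuplicateRef dups[] = { {7, true, 16}, {7, true, 16}, {9, false, 16} };

        printf("bytes reclaimable: %zu\n",
               simulate_version_deletion_pass(dups, 3));   /* prints 32 */
        return 0;
    }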
I wonder how this new feature will interact with physical replication. A replica may have quite a different performance profile. For example, there can be long-running queries that currently prevent vacuum from removing recently-dead rows. How will we handle the same situation with this optimized deletion?
--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company