Re: Making all nbtree entries unique by having heap TIDs participatein comparisons - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: Making all nbtree entries unique by having heap TIDs participatein comparisons |
Date | |
Msg-id | CAH2-WzkpKeZJrXvR_p7VSY1b-s85E3gHyTbZQzR0BkJ5LrWF_A@mail.gmail.com Whole thread Raw |
In response to | Re: Making all nbtree entries unique by having heap TIDs participatein comparisons (Peter Geoghegan <pg@bowt.ie>) |
Responses |
Re: Making all nbtree entries unique by having heap TIDs participatein comparisons
|
List | pgsql-hackers |
On Sun, Sep 30, 2018 at 2:33 PM Peter Geoghegan <pg@bowt.ie> wrote: > > Now benchmarking is not clear. Possible performance degradation from TID > > ordering interfere with positive effects from the optimizations in > > non-trivial way. > > Is there any evidence of a regression in the last 2 versions? I did find a pretty clear regression, though only with writes to unique indexes. Attached is v6, which fixes the issue. More on that below. v6 also: * Adds a new-to-v6 "insert at new item's insertion point" optimization, which is broken out into its own commit. This *greatly* improves the index bloat situation with the TPC-C benchmark in particular, even before the benchmark starts (just with the initial bulk load). See the relevant commit message for full details, or a couple of my previous mails on this thread. I will provide my own TPC-C test data + test case to any reviewer that wants to see this for themselves. It shouldn't be hard to verify the improvement in raw index size with any TPC-C implementation, though. Please make an off-list request if you're interested. The raw dump is 1.8GB. The exact details of when this new optimization kick in and how it works are tentative. They should really be debated. Reviewers should try to think of edge cases in which my "heap TID adjacency" approach could make the optimization kick in when it shouldn't -- cases where it causes bloat rather than preventing it. I couldn't find any such regressions, but this code was written very recently. I should also look into using HammerDB to do a real TPC-C benchmark, and really put the patch to the test...anybody have experience with it? * Generally groups everything into a relatively manageable series of cumulative improvements, starting with the infrastructure required to physically truncate tuples correctly, without any of the smarts around selecting a split point. The base patch is useless on its own, since it's just necessary to have the split point selection smarts to see a consistent benefit. Reviewers shouldn't waste their time doing any real benchmarking with just the first patch applied. * Adds a lot of new information to the nbtree README, about the high-level thought process behind the design, including citing the classic paper that this patch was primarily inspired by. * Adds a new, dedicated insertion scan key struct -- BTScanInsert[Data]. This is passed around to a number of different routines (_btsearch(), _bt_binsrch(), _bt_compare(), etc). This was suggested by Andrey, and also requested by Peter Eisentraut. While this BTScanInsert work started out as straightforward refactoring, it actually led to my discovering and fixing the regression I mentioned. Previously, I passed a lower bound on a binary search to _bt_binsrch() within _bt_findinsertloc(). This wasn't nearly as effective as what the master branch does for unique indexes at the same point -- it usually manages to reuse a result from an earlier _bt_binsrch() as the offset for the new tuple, since it has no need to worry about the new tuple's position *among duplicates* on the page. In earlier versions of my patch, most of the work of a second binary search took place, despite being redundant and unnecessary. This happened for every new insertion into a non-unique index -- I could easily measure the problem with a simple serial test case. I can see no regression there against master now, though. My fix for the regression involves including some mutable state in the new BTScanInsert struct (within v6-0001-*patch), to explicitly remember and restore some internal details across two binary searches against the same leaf page. We now remember a useful lower *and* upper bound within bt_binsrch(), which is what is truly required to fix the regression. While there is still a second call to _bt_binsrch() within _bt_findinsertloc() for unique indexes, it will do no comparisons in the common case where there are no existing dead duplicate tuples in the unique index. This means that the number of _bt_compare() calls we get in this _bt_findinsertloc() unique index path is the same as the master branch in almost all cases (I instrumented the regression tests to make sure of this). I also think that having BTScanInsert will ease things around pg_upgrade support, something that remains an open item. Changes in this area seem to make everything clearer -- the signature of _bt_findinsertloc() seemed a bit jumbled to me. Aside: I think that this BTScanInsert mutable state idea could be pushed even further in the future. "Dynamic prefix truncation" could be implemented by taking a similar approach when descending composite indexes for an index scan (doesn't have to be a unique index). We can observe that earlier attributes must all be equal to our own scankey's values once we descend the tree and pass between a pair of pivot tuples where a common prefix (some number of leading attributes) is fully equal. It's safe to just not bother comparing these prefix attributes on lower levels, because we can reason about their values transitively; _bt_compare() can be told to always skip the first attribute or two during later/lower-in-the-tree binary searches. This idea will not be implemented for Postgres v12 by me, though. -- Peter Geoghegan
Attachment
- v6-0005-DEBUG-Add-pageinspect-instrumentation.patch
- v6-0006-DEBUG-Allow-nbtree-to-use-ASC-heap-TID-order.patch
- v6-0003-Add-split-at-new-tuple-page-split-optimization.patch
- v6-0004-DEBUG-Add-page-split-instrumentation.patch
- v6-0002-Weigh-suffix-truncation-when-choosing-a-split-poi.patch
- v6-0001-Make-nbtree-indexes-have-unique-keys-in-tuples.patch
pgsql-hackers by date: