Re: Making all nbtree entries unique by having heap TIDs participate in comparisons - Mailing list pgsql-hackers
From: Andrey Lepikhov
Subject: Re: Making all nbtree entries unique by having heap TIDs participate in comparisons
Msg-id: f6ccee8c-8f23-a3a6-16be-76349801472b@postgrespro.ru
In response to: Re: Making all nbtree entries unique by having heap TIDs participate in comparisons (Peter Geoghegan <pg@bowt.ie>)
Responses: Re: Making all nbtree entries unique by having heap TIDs participate in comparisons
List: pgsql-hackers
I use the v5 version in the quick vacuum strategy and in the heap & index cleaner (see new patches in the corresponding thread a little later). It works fine and gives quick vacuum a 2-3% performance gain over version v3 on my 24-core test server.

Note that the interface of the _bt_moveright() and _bt_binsrch() functions, with their combination of scankey, scantid and nextkey parameters, is semantically overloaded. Every time I read the code I have to spend time remembering what exactly these functions do. Maybe the comments should be rewritten. For example, the _bt_moveright() comments could include the phrase: nextkey=false: traverse to the next suitable index page if the current page does not contain the value (scan key; scan tid).

What do you think about submitting the patch to the next CF?

On 12.09.2018 23:11, Peter Geoghegan wrote:
> Attached is v4. I have two goals in mind for this revision, goals that are of great significance to the project as a whole:
>
> * Making better choices around leaf page split points, in order to maximize suffix truncation and thereby maximize fan-out. This is important when there are mostly-distinct index tuples on each leaf page (i.e. most of the time). Maximizing the effectiveness of suffix truncation needs to be weighed against the existing/main consideration: evenly distributing space among each half of a page split. This is tricky.
>
> * Not regressing the logic that lets us pack leaf pages full when there are a great many logical duplicates. That is, I still want to get the behavior I described on the '"Write amplification" is made worse by "getting tired" while inserting into nbtree secondary indexes' thread [1]. This is not something that happens as a consequence of thinking about suffix truncation specifically, and seems like a fairly distinct thing to me. It's actually a bit similar to the rightmost 90/10 page split case.
> v4 adds significant new logic to make us do better on the first goal, without hurting the second goal. It's easy to regress one while focussing on the other, so I've leaned on a custom test suite throughout development. Previous versions mostly got the first goal wrong, but got the second goal right. For the time being, I'm focussing on index size, on the assumption that I'll be able to demonstrate a nice improvement in throughput or latency later. I can get the main TPC-C order_line pkey about 7% smaller after an initial bulk load with the new logic added to get the first goal (note that the benefits with a fresh CREATE INDEX are close to zero). The index is significantly smaller, even though the internal page index tuples can themselves never be any smaller due to alignment -- this is all about not restricting what can go on each leaf page by too much. 7% is not as dramatic as the "get tired" case, which saw something like a 50% decrease in bloat for one pathological case, but it's still clearly well worth having. The order_line primary key is the largest TPC-C index, and I'm merely doing a standard bulk load to get this 7% shrinkage. The TPC-C order_line primary key happens to be kind of adversarial or pathological to B-Tree space management in general, but it's still fairly realistic.
>
> For the first goal, page splits now weigh what I've called the "distance" between tuples, with a view to getting the most discriminating split point -- the leaf split point that maximizes the effectiveness of suffix truncation, within a range of acceptable split points (acceptable from the point of view of not implying a lopsided page split). This is based on probing IndexTuple contents naively when deciding on a split point, without regard for the underlying opclass/types.
> We mostly just use char integer comparisons to probe, on the assumption that that's a good enough proxy for using real insertion scankey comparisons (only actual truncation goes to those lengths, since that's a strict matter of correctness). This distance business might be considered a bit iffy by some, so I want to get early feedback. This new "distance" code clearly needs more work, but I felt that I'd gone too long without posting a new version.
>
> For the second goal, I've added a new macro that can be enabled for debugging purposes. This has the implementation sort heap TIDs in ASC order, rather than DESC order. This nicely demonstrates how my two goals for v4 are fairly independent; uncommenting "#define BTREE_ASC_HEAP_TID" will cause a huge regression with cases where many duplicates are inserted, but won't regress things like the TPC-C indexes. (Note that BTREE_ASC_HEAP_TID will break the regression tests, though in a benign way that can safely be ignored.)
>
> Open items:
>
> * Do more traditional benchmarking.
>
> * Add pg_upgrade support.
>
> * Simplify _bt_findsplitloc() logic.
>
> [1] https://postgr.es/m/CAH2-Wzmf0fvVhU+SSZpGW4Qe9t--j_DmXdX3it5JcdB8FF2EsA@mail.gmail.com

--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company