Re: Making all nbtree entries unique by having heap TIDs participate in comparisons - Mailing list pgsql-hackers
From: Andrey Lepikhov
Subject: Re: Making all nbtree entries unique by having heap TIDs participate in comparisons
Msg-id: f6ccee8c-8f23-a3a6-16be-76349801472b@postgrespro.ru
In response to: Re: Making all nbtree entries unique by having heap TIDs participate in comparisons (Peter Geoghegan <pg@bowt.ie>)
Responses: Re: Making all nbtree entries unique by having heap TIDs participate in comparisons
List: pgsql-hackers
I use the v5 version in the quick vacuum strategy and in the heap & index cleaner (see new patches in the corresponding thread a little later). It works fine and gives quick vacuum a 2-3% performance gain over version v3 on my 24-core test server.

Note that the interface of the _bt_moveright() and _bt_binsrch() functions, with their combination of scankey, scantid and nextkey parameters, is semantically overloaded. Every time I read the code I have to spend time remembering what exactly these functions do. Maybe the comments should be rewritten. For example, the _bt_moveright() comments could include the phrase: nextkey=false: traverse to the next suitable index page if the current page does not contain the value (scan key; scan tid).

What do you think about submitting the patch to the next CF?

On 12.09.2018 23:11, Peter Geoghegan wrote:
> Attached is v4. I have two goals in mind for this revision, goals that are of great significance to the project as a whole:
>
> * Making better choices around leaf page split points, in order to maximize suffix truncation and thereby maximize fan-out. This is important when there are mostly-distinct index tuples on each leaf page (i.e. most of the time). Maximizing the effectiveness of suffix truncation needs to be weighed against the existing/main consideration: evenly distributing space among each half of a page split. This is tricky.
>
> * Not regressing the logic that lets us pack leaf pages full when there are a great many logical duplicates. That is, I still want to get the behavior I described on the '"Write amplification" is made worse by "getting tired" while inserting into nbtree secondary indexes' thread [1]. This is not something that happens as a consequence of thinking about suffix truncation specifically, and seems like a fairly distinct thing to me. It's actually a bit similar to the rightmost 90/10 page split case.
> v4 adds significant new logic to make us do better on the first goal, without hurting the second goal. It's easy to regress one while focussing on the other, so I've leaned on a custom test suite throughout development. Previous versions mostly got the first goal wrong, but got the second goal right. For the time being, I'm focussing on index size, on the assumption that I'll be able to demonstrate a nice improvement in throughput or latency later. I can get the main TPC-C order_line pkey about 7% smaller after an initial bulk load with the new logic added to get the first goal (note that the benefits with a fresh CREATE INDEX are close to zero). The index is significantly smaller, even though the internal page index tuples can themselves never be any smaller due to alignment -- this is all about not restricting what can go on each leaf page by too much. 7% is not as dramatic as the "get tired" case, which saw something like a 50% decrease in bloat for one pathological case, but it's still clearly well worth having. The order_line primary key is the largest TPC-C index, and I'm merely doing a standard bulk load to get this 7% shrinkage. The TPC-C order_line primary key happens to be kind of adversarial or pathological to B-Tree space management in general, but it's still fairly realistic.
>
> For the first goal, page splits now weigh what I've called the "distance" between tuples, with a view to getting the most discriminating split point -- the leaf split point that maximizes the effectiveness of suffix truncation, within a range of acceptable split points (acceptable from the point of view of not implying a lopsided page split). This is based on probing IndexTuple contents naively when deciding on a split point, without regard for the underlying opclass/types.
> We mostly just use char integer comparisons to probe, on the assumption that that's a good enough proxy for using real insertion scankey comparisons (only actual truncation goes to those lengths, since that's a strict matter of correctness). This distance business might be considered a bit iffy by some, so I want to get early feedback. This new "distance" code clearly needs more work, but I felt that I'd gone too long without posting a new version.
>
> For the second goal, I've added a new macro that can be enabled for debugging purposes. This has the implementation sort heap TIDs in ASC order, rather than DESC order. This nicely demonstrates how my two goals for v4 are fairly independent; uncommenting "#define BTREE_ASC_HEAP_TID" will cause a huge regression with cases where many duplicates are inserted, but won't regress things like the TPC-C indexes. (Note that BTREE_ASC_HEAP_TID will break the regression tests, though in a benign way that can safely be ignored.)
>
> Open items:
>
> * Do more traditional benchmarking.
>
> * Add pg_upgrade support.
>
> * Simplify _bt_findsplitloc() logic.
>
> [1] https://postgr.es/m/CAH2-Wzmf0fvVhU+SSZpGW4Qe9t--j_DmXdX3it5JcdB8FF2EsA@mail.gmail.com

--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company