Re: pgsql: Optimize btree insertions for common case of increasing values - Mailing list pgsql-committers
From | Peter Geoghegan |
---|---|
Subject | Re: pgsql: Optimize btree insertions for common case of increasing values |
Date | |
Msg-id | CAH2-Wzmh_sxhXYfqhpkD6t+C3K57TRDOMXKdG-NQ6qvjnuW37g@mail.gmail.com Whole thread Raw |
In response to | Re: pgsql: Optimize btree insertions for common case of increasing values (Peter Geoghegan <pg@bowt.ie>) |
Responses |
Re: pgsql: Optimize btree insertions for common case of increasing values
Re: pgsql: Optimize btree insertions for common case of increasing values |
List | pgsql-committers |
On Wed, Mar 28, 2018 at 11:56 AM, Peter Geoghegan <pg@bowt.ie> wrote: >> Previously, we agreed that P_IGNORE() is required. So I assume no issues >> there. The other tests seem required too for us to conclusively decide to >> use the cached block. > > Actually, you're right that P_INCOMPLETE_SPLIT() is the only redundant > one. Another issue that I have with the main test of the RelationSetTargetBlock() page is that nobody explains *why* we check that we have enough space on the page to proceed. Why would it not be okay if there was a page split? Although it's subtle, I'm pretty confident that it actually would be okay from a correctness point of view to allow an insertion to go ahead that would result in a page split -- it's possible to pass a NULL stack to _bt_findinsertloc() and _bt_insertonpg() while still getting correct behavior. * In the case of _bt_findinsertloc(), it's okay to have a NULL stack because you'll never take the P_INCOMPLETE_SPLIT() path for a rightmost page, as already explained, so the stack cannot possibly be used (besides, even _bt_finish_split() can deal with a NULL stack). It is not just an accident that it works, because _bt_search() will sometimes return a NULL stack when writing -- it does so when there is only one non-meta page, which is both a root and a leaf page. Your optimization is a bit similar to that, in the sense that the cached/target block contains a leaf page that is rightmost (root pages are always rightmost, and are leaf pages in the case described). * In the case of _bt_insertonpg(), it's okay to have a NULL stack because there is a codepath that allows _bt_insert_parent() (which is the only thing in _bt_insertonpg() that touches the stack) to work without a stack during crash recovery. That path is quite a bit slower, though. Suggested next steps to deal with this: * A minor point: I don't think you should call RelationSetTargetBlock() when the page P_ISROOT(), which, as I mentioned, is a condition that can coexist with P_ISLEAF() with very small B-Trees. There can be no point in doing so. No need to add P_ISROOT() to the main "Is cached page stale?" test within _bt_doinsert(), though. * An assertion would make me feel a lot better about depending on not having a page split from a significant distance. Your optimization should continue to not be used when it would result in a page split, but only because that would be slow. The comments should say so, IMV. Also, _bt_insertonpg() should have an assertion against a page split actually occurring when the optimization was used, just in case. When !BufferIsValid(cbuf), we know that we're being called from _bt_doinsert() (see existing assertion at the top of _bt_insertonpg() as an example of this), so at the point where it's clear a page split is needed, we should assert that there is no target block that we must have been passed as the target page. * The indentation of the main _bt_doinsert() test does not follow project guidelines. Please tweak that, too. That's all I have for now. I might think of something else tomorrow, since I haven't spent that much time on this. It would be nice to get a second opinion on some of this, if at all possible. -- Peter Geoghegan
pgsql-committers by date: