Re: pgsql: Optimize btree insertions for common case of increasing values - Mailing list pgsql-committers
From | Peter Geoghegan |
---|---|
Subject | Re: pgsql: Optimize btree insertions for common case of increasing values |
Date | |
Msg-id | CAH2-Wzm_i6TkxppTsT7ZwkGBfaa9fgZrFaW99909-+qYy2BFUA@mail.gmail.com Whole thread Raw |
In response to | Re: pgsql: Optimize btree insertions for common case of increasing values (Pavan Deolasee <pavan.deolasee@gmail.com>) |
Responses |
Re: pgsql: Optimize btree insertions for common case of increasing values
|
List | pgsql-committers |
On Wed, Apr 4, 2018 at 5:33 AM, Pavan Deolasee <pavan.deolasee@gmail.com> wrote: > Ok. Adding a check for tree height and setting target block only if it's >= > 2, as suggested by you and Simon later. Rahila helped me also ran another > round of tests and this does not lead to any performance regression (we were > worried about whether calling _bt_getrootheight will be expensive). Right. > Also moved RelationSetTargetBlock() to a point where we are not holding any > locks and are outside the critical section. Right. >> * An assertion would make me feel a lot better about depending on not >> having a page split from a significant distance. > > > Ok. I assume you mean an assertion to check that the target page doesn't > have an in-complete split. Added that though not sure if it's useful since > we concluded that right-most page can't have in-complete split. > > Let me know if you mean something else. I meant something else. I was talking about the assertion discussed below. I don't see too much point in the !P_INCOMPLETE_SPLIT(lpageop) assertion, though. >> Your optimization should continue to not be used when it would result >> in a page split, but only because that would be slow. The comments >> should say so, IMV. > > > Added. I think the wording here could use some tweaking: > /* > - * Check if the page is still the rightmost leaf page, has enough > - * free space to accommodate the new tuple, no split is in progress > - * and the scankey is greater than or equal to the first key on the > - * page. > + * Check if the page is still the rightmost valid leaf page, has > + * enough free space to accommodate the new tuple and the scankey > + * is strictly greater than the first key on the page. > + * > + * NB: We could take the fastpath even when the target block > + * doesn't have enough free space (but it's the right-most block) > + * since _bt_insertonpg() is capable of working with a NULL stack > + * and that's the only additional thing the slow path sets up. But > + * we don't optimise for that case because while spliting and > + * inserting into the parent without the stack is relatively slow > + * operation. > */ I would cut this down, and just say "We only insert if it definitely won't result in a pagesplit" -- no need for the second paragraph in this high-level routine. The full details can go on top of the new _bt_insertonpg() assertion I talk about later. > You mean passing "fastpath" to _bt_insertonpg and then checking it's false > if page split is needed? But isn't page split is only needed if the page > doesn't have enough free space? If so, we have checked for that before > setting "fastpath". That's not exactly what I meant. I meant that if: 1. This is an insertion to the leaf level in _bt_insertonpg(). and 2. We don't have space on the page, and so must do a split (or at least free LP_DEAD items). and 3. RelationGetTargetBlock(rel) != InvalidBlockNumber There should be an assertion failure. This new assertion within _bt_insertonpg() makes it much less likely that the assumption breaks later. This is where you could point out the low-level details that I suggested be omitted from _bt_doinsert() at the beginning of this e-mail. You can mention here that it would actually work without a pagesplit, but that is only intended for crash recovery, and is a much slower path that would make the optimization totally counter-productive. We add an assertion because without one it would be easy to miss a regression where there is a page split with an empty stack. Finally, I'd like to see a small paragraph in the nbtree README, about the high level theory behind this optimization and page recycling. The assumption is that there can only be one non-ignorable leaf rightmost page, and so even a RecentGlobalXmin style interlock isn't required. We cannot fail to detect that our hint was invalidated, because there can only be one such page in the B-Tree at any time. It's possible that the page will be deleted and recycled without a backend's cached page also being detected as invalidated, but only when we happen to recycle a page that once again becomes the rightmost leaf page. Once those changes are made, this should be fine to commit. -- Peter Geoghegan
pgsql-committers by date: