Re: Faster inserts with mostly-monotonically increasing values - Mailing list pgsql-hackers
From: Claudio Freire
Subject: Re: Faster inserts with mostly-monotonically increasing values
Date:
Msg-id: CAGTBQpYFeJRYMO7VyB30yD=CTA3yBKWKCFBQeiC4BM_dYXuvog@mail.gmail.com
In response to: Re: Faster inserts with mostly-monotonically increasing values (Peter Geoghegan <pg@bowt.ie>)
Responses:
  Re: Faster inserts with mostly-monotonically increasing values
  Re: Faster inserts with mostly-monotonically increasing values
List: pgsql-hackers
On Mon, Mar 5, 2018 at 9:12 PM, Peter Geoghegan <pg@bowt.ie> wrote:
> On Thu, Mar 1, 2018 at 3:18 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
>> The key there is strictly greater than the rightmost key. As such, it
>> must precede the first page with an equal key, and since it's the
>> rightmost page... there's no such key. But if there was, the check for
>> uniqueness only needs the insert point to point to the first such
>> page, and this guarantees it.
>
> This code assumes that it can use block number as a stable way of
> finding the same point in the B-Tree over time, without any interlock.
> That may actually work out in practice, since even if the page is
> recycled there can only ever be one rightmost leaf page, but it seems
> like a brittle approach to me. The page image may be recycled by the
> FSM already, and we really shouldn't be depending on the page not
> looking a particular way once that has happened. I guess that you can
> say the same thing about using page LSN to determine cached block
> staleness instead, but that still seems a lot safer.
>
> Basically, I'm worried that we could do something entirely
> unpredictable when a page has actually been recycled, since we're
> entirely opting out of the RecentGlobalXmin interlock business on the
> assumption that we can be sure that recycling hasn't occurred in some
> other way. Imagine what could happen if we ever teach the FSM to share
> freespace among small relations, which seems quite possible to me.

Correct me if I'm wrong, but there's lots of code in nbtree already that
risks reading recycled pages for various reasons. Presumably, checking
P_ISDELETED and P_ISHALFDEAD should be enough, which is what !P_IGNORE
implies.

>> You could allow for some slack, by comparing against the leftmost key
>> instead. You just need to know whether the key fits in the page. Then,
>> the insert can proceed as usual, doing the binsearch to find the
>> proper insertion point, etc.
> The whole way that this patch opts out of using an exclusive buffer
> lock on the "first page the value could be on" (the _bt_check_unique()
> + _bt_findinsertloc() stuff) makes me uneasy. I know that that isn't
> terribly concrete.

Assuming the rightmost page is the first page the value could be on, it
already does get an exclusive buffer lock. And that assumption seems
correct in the patch as-is.

In fact, the current patch checks for a much stronger condition than is
needed to apply this optimization, so it can even skip checking for
conflicting concurrent transactions. With an exclusive lock on the
buffer, and it being the rightmost page, there can be no conflicting
key, since the check is that the scan key is strictly greater than the
rightmost key (let's forget for a moment empty rightmost pages). Unless
a new rightmost page can somehow be under construction (which I don't
believe can happen: new rightmost pages are created by splitting the
rightmost page, and that would conflict with the exclusive lock), I
don't see how this can fail.

If one wanted to relax the condition as I proposed, that uniqueness
check becomes necessary, which is why I propose short-circuiting only
the search, and not the actual insertion on the page. IMO it's more
important to try to maximize the conditions under which the
optimization can be applied.