Re: _bt_split(), and the risk of OOM before its critical section - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: _bt_split(), and the risk of OOM before its critical section |
Date | |
Msg-id | CAH2-Wz=r2ae=mLvD-o5hjzcbARBXJSXDZcL_v1J=FCzCb0Vqaw@mail.gmail.com Whole thread Raw |
In response to | Re: _bt_split(), and the risk of OOM before its critical section (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: _bt_split(), and the risk of OOM before its critical section
Re: _bt_split(), and the risk of OOM before its critical section |
List | pgsql-hackers |
On Mon, May 6, 2019 at 3:29 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Yeah, as _bt_split is currently coded, _bt_truncate has to be a "no > errors" function, which it isn't. The pfree for its result is being > done in an ill-chosen place, too. I am tempted to move the call to _bt_truncate() out of _bt_split() entirely on HEAD, possibly relocating it to nbtsplitloc.c/_bt_findsplitloc(). That way, there is a clearer separation between how split points are chosen, suffix truncation, and the mechanical process of executing a legal page split. > Another problem now that I look at it is that the _bt_getbuf for the right > sibling is probably not too safe. And the _bt_vacuum_cycleid() call seems > a bit dangerous from this standpoint as well. Yeah, we can tighten those up without much difficulty. > I'm not really concerned about that one because at that point the > right page is still in a freshly-pageinit'd state. It's perhaps > not quite as nice as having it be zeroes, but it won't look like > it has any interesting data. The important question is how VACUUM will recognize it. It's clearly not as bad as something that causes "failed to re-find parent key" errors, but I think that VACUUM might not be reclaiming it for the FSM (haven't checked). Note that _bt_unlink_halfdead_page() is perfectly happy to ignore the fact that the left sibling of a half-dead page has a rightlink that doesn't point back to the target. Because, uh, there might have been a concurrent page deletion, somehow. We have heard a lot about "failed to re-find parent key" errors from VACUUM before now because that is about the only strong cross-check that it does. (Not that I'm arguing that we need more of that.) > In any case, once we've started to fill the ropaque area, it would really > be better if we don't call anything that could throw errors. > > Maybe we should bite the bullet and use two temp pages, so that none > of the data ends up in the shared buffer arena until we reach the > critical section? The extra copying is slightly annoying, but > it certainly seems like enforcing this invariant over such a > long stretch of code is not very maintainable. While I think that the smarts we have around deciding a split point will probably improve in future releases, and that we'll probably make _bt_truncate() itself do more, the actual business of performing a split has no reason to change that I can think of. I would like to keep _bt_split() as simple as possible anyway -- it should only be copying tuples using simple primitives like the bufpage.c routines. Living with what we have now (not using a temp buffer for the right page) seems fine. -- Peter Geoghegan
pgsql-hackers by date: