Re: Connections hang indefinitely while taking a gin index's LWLockbuffer_content lock(PG10.7) - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: Connections hang indefinitely while taking a gin index's LWLockbuffer_content lock(PG10.7) |
Date | |
Msg-id | CAH2-Wz=qi7P40nWZ8J5GSbE=1DrY=1QP2xvGPdWAaePD=DfAfQ@mail.gmail.com Whole thread Raw |
In response to | Re: Connections hang indefinitely while taking a gin index's LWLockbuffer_content lock(PG10.7) (Alexander Korotkov <a.korotkov@postgrespro.ru>) |
Responses |
Re: Connections hang indefinitely while taking a gin index's LWLockbuffer_content lock(PG10.7)
|
List | pgsql-hackers |
On Sun, Oct 27, 2019 at 12:04 PM Alexander Korotkov <a.korotkov@postgrespro.ru> wrote: > (0001-Fix-deadlock-between-ginDeletePage-and-ginStepRigh-3.patch) Some thoughts on the first patch: * How do comparisons of items in each type of B-Tree work? I would like to see a statement about invariants similar to nbtree's "subtree" invariant. Looks like the "high key" is <= (not <) in each case (i.e. for both entry trees and posting trees), just like nbtree. nbtree has an explicit high key rather than using the current highest data item on the page (maybe this can be called a "pseudo high key" or something). That is one important difference, but in general GIN seems to copy nbtree. I think. However, GIN never explains stuff that must be affected by invariants in binary search routines like entryLocateLeafEntry() and dataLocateItem(). GIN never makes the similarities and differences clear. Maybe you could do more on that -- it's too hard to tell why entryLocateLeafEntry() and entryLocateEntry() (i.e. leaf and internal page binary search variants for entry B-Trees) do things differently to each other (and do things differently to nbtree's binary search routine). This code from entryLocateEntry() is a good example of how this can be confusing: if (result == 0) { stack->off = mid; Assert(GinGetDownlink(itup) != GIN_ROOT_BLKNO); return GinGetDownlink(itup); } else if (result > 0) low = mid + 1; else high = mid; Some questions about this code: * Why is the "if (result == 0)" branch using the block number/downlink from the "mid" tuple as its return value? Why not follow nbtree's _bt_binsrch() here too, by returning "the last key < scan key" on internal pages? If your high key from one level down is <=, and also a pseudo high key (i.e. both a "real" entry and a high key), how can it be correct to go to the block/downlink from an equal "mid" tuple like this? Won't that make index scans miss the pseudo high key, which they have to return to the scan? * Why is it okay that there is no "negative infinity" item in internal pages for the entry tree? GIN does not have to copy nbtree in every detail to be correct, but it confuses me that it *mostly* copies nbtree, but doesn't do so *fully*. As I wrote just now (or at least implied), entryIsMoveRight() works in a similar way to _bt_moveright(), and yet we still have these apparent differences with how binary searches work that seems to not be compatible with that. Maybe this is okay, but I cannot understand why that is right now. It looks wrong to me -- so wrong that I suppose I must be mistaken. I also spotted some fairly minor issues: * s/rightest/rightmost/ * It's weird that GinBtree/GinBtreeData is a set of callbacks for both posting trees and main entry trees, since the rules are now quite different in each case. Not sure if it's worth saying something about that. * "Exclusive lock" makes me think of "ExclusiveLock", which is a kind of heavyweight lock (not a buffer lock). I suggest changing that wording to avoid confusion. In general, it seems very important to be clear about exactly how the key space works. The nbtree work for v12 greatly benefitted from defining comparisons in a way that didn't really change how nbtree worked, while at the same time minimizing I/O and making nbtree faithful to Lehman & Yao's original design. It isn't obvious how valuable it is to really carefully define how invariants and key comparisons work, but it seems possible to solve a lot of problems that way. -- Peter Geoghegan
pgsql-hackers by date: