Re: Stating the significance of Lehman & Yao in the nbtree README - Mailing list pgsql-hackers
From | Heikki Linnakangas |
---|---|
Subject | Re: Stating the significance of Lehman & Yao in the nbtree README |
Date | |
Msg-id | 540EA5CB.8070601@vmware.com Whole thread Raw |
In response to | Re: Stating the significance of Lehman & Yao in the nbtree README (Peter Geoghegan <pg@heroku.com>) |
Responses |
Re: Stating the significance of Lehman & Yao in the nbtree README
|
List | pgsql-hackers |
On 09/07/2014 05:58 AM, Peter Geoghegan wrote: > + Lehman and Yao don't require read locks, but assume that in-memory > + copies of tree pages are unshared. Postgres shares in-memory buffers > + among backends. As a result, we do page-level read locking on btree > + pages in order to guarantee that no record is modified while we are > + examining it. This reduces concurrency but guarantees correct > + behavior. An advantage is that when trading in a read lock for a > + write lock, we need not re-read the page after getting the write lock. > + Since we're also holding a pin on the shared buffer containing the > + page, we know that buffer still contains the page and is up-to-date. This is the existing paragraph, just moved to different place in the README. > + Although it could be argued that Lehman and Yao isn't followed to the > + letter because single pages are read locked as the tree is descended, > + this is at least necessary to support deletion, a common requirement > + which L&Y hardly acknowledge. Read locks also ensure that B-tree > + pages are self-consistent (L&Y appear to assume atomic page reads and > + writes). This is just duplicating the existing paragraph. I don't see the point of this. > Even with these read locks, following L&Y obviates the need > + of earlier schemes to hold multiple locks concurrently when descending > + the tree as part of servicing index scans (pessimistic lock coupling). > + The addition of right-links at all levels, as well as the addition of > + a page "high key" allows detection and dynamic recovery from > + concurrent page splits (that is, splits between unlocking an internal > + page, and subsequently locking its child page during a descent). When > + a page is first locked (at every level of a descent servicing an index > + scan), we consider the need to "move right": if the scankey value is > + less than (or sometimes less than or equal to) the page's existing > + highkey value, a value which serves as an upper bound for values on > + the page generally, then it must be necessary to move the scan to the > + right-hand page on the same level. It's even possible that the scan > + needs to move right more than once. Once the other session's > + concurrent page split finishes, a downlink will be inserted into the > + parent, and so assuming there are no further page splits, future index > + scans using the same scankey value will not need to move right. L&Y > + Trees are sometimes referred to as "B-Link trees" in the literature. This explains in a few sentences what a L&Y B-tree looks like. The current README assumes that you're already familiar with what a L&Y tree looks like, or that you go read the paper mentioned at the top of the README. It might be a good idea to expand on that, and add an introduction like this so that an unfamiliar reader doesn't need to read the L&Y paper first. Is that the purpose of this patch? Please make it more explicit. And please make the sentences simpler - the above reads like a Shakespeare play. Something like: The basic Lehman & Yao Algorithm ================================ Compared to a classic B-tree, L&Y adds a right-link pointer to each page, to the page's right sibling. It also adds a "high key" to each page, which is an upper bound on the keys that are allowed on that page. These two additions make it possible detect a concurrent page split, which allows the tree to be searched without holding any read locks (except to keep a single page from being modified while reading it). When a search follows a downlink to a child page, it compares the page's high key with the search key. If the search key is greater than the high key, the page must've been split concurrently, and you must follow the right-link to find the new page containing the key range you're looking for. This might need to be repeated, if the page has been split more than once. Differences to the Lehman & Yao algorithm ========================================= (current "Lehamn and Yao Algorithm and Insertions section) I think that's pretty much the same information you added, but it's in a separate section, with the clear purpose that it explains what a L&Y tree looks like. You can skip over it, if you have read the paper or are otherwise already familiar with it. It still assumes that you're familiar with B-trees in general. Anyway, I see that you had resurrected this in the commitfest app after three weeks of inactivity. I'm going to mark this back to "Returned with Feedback". Please don't resurrect it again, this patch has received more than its fair share of attention. Instead, please help by signing up to review a patch. The commitfest progress is glacial at the moment, and we really need experienced reviewers like you to get closure to people's patches. - Heikki
pgsql-hackers by date: