Home > mailing lists

Re: Stating the significance of Lehman & Yao in the nbtree README - Mailing list pgsql-hackers

From	Heikki Linnakangas
Subject	Re: Stating the significance of Lehman & Yao in the nbtree README
Date	September 9, 2014 07:02:02
Msg-id	540EA5CB.8070601@vmware.com Whole thread Raw
In response to	Re: Stating the significance of Lehman & Yao in the nbtree README (Peter Geoghegan <pg@heroku.com>)
Responses	Re: Stating the significance of Lehman & Yao in the nbtree README
List	pgsql-hackers

Tree view

On 09/07/2014 05:58 AM, Peter Geoghegan wrote:
> + Lehman and Yao don't require read locks, but assume that in-memory
> + copies of tree pages are unshared.  Postgres shares in-memory buffers
> + among backends.  As a result, we do page-level read locking on btree
> + pages in order to guarantee that no record is modified while we are
> + examining it.  This reduces concurrency but guarantees correct
> + behavior.  An advantage is that when trading in a read lock for a
> + write lock, we need not re-read the page after getting the write lock.
> + Since we're also holding a pin on the shared buffer containing the
> + page, we know that buffer still contains the page and is up-to-date.

This is the existing paragraph, just moved to different place in the README.

> + Although it could be argued that Lehman and Yao isn't followed to the
> + letter because single pages are read locked as the tree is descended,
> + this is at least necessary to support deletion, a common requirement
> + which L&Y hardly acknowledge.  Read locks also ensure that B-tree
> + pages are self-consistent (L&Y appear to assume atomic page reads and
> + writes).

This is just duplicating the existing paragraph. I don't see the point 
of this.

>  Even with these read locks, following L&Y obviates the need
> + of earlier schemes to hold multiple locks concurrently when descending
> + the tree as part of servicing index scans (pessimistic lock coupling).
> + The addition of right-links at all levels, as well as the addition of
> + a page "high key" allows detection and dynamic recovery from
> + concurrent page splits (that is, splits between unlocking an internal
> + page, and subsequently locking its child page during a descent).  When
> + a page is first locked (at every level of a descent servicing an index
> + scan), we consider the need to "move right":  if the scankey value is
> + less than (or sometimes less than or equal to) the page's existing
> + highkey value, a value which serves as an upper bound for values on
> + the page generally, then it must be necessary to move the scan to the
> + right-hand page on the same level.  It's even possible that the scan
> + needs to move right more than once.  Once the other session's
> + concurrent page split finishes, a downlink will be inserted into the
> + parent, and so assuming there are no further page splits, future index
> + scans using the same scankey value will not need to move right.  L&Y
> + Trees are sometimes referred to as "B-Link trees" in the literature.

This explains in a few sentences what a L&Y B-tree looks like. The 
current README assumes that you're already familiar with what a L&Y tree 
looks like, or that you go read the paper mentioned at the top of the 
README. It might be a good idea to expand on that, and add an 
introduction like this so that an unfamiliar reader doesn't need to read 
the L&Y paper first. Is that the purpose of this patch? Please make it 
more explicit. And please make the sentences simpler - the above reads 
like a Shakespeare play. Something like:

The basic Lehman & Yao Algorithm
================================

Compared to a classic B-tree, L&Y adds a right-link pointer to each 
page, to the page's right sibling. It also adds a "high key" to each 
page, which is an upper bound on the keys that are allowed on that page. 
These two additions make it possible detect a concurrent page split, 
which allows the tree to be searched without holding any read locks 
(except to keep a single page from being modified while reading it).

When a search follows a downlink to a child page, it compares the page's 
high key with the search key. If the search key is greater than the high 
key, the page must've been split concurrently, and you must follow the 
right-link to find the new page containing the key range you're looking 
for. This might need to be repeated, if the page has been split more 
than once.

Differences to the Lehman & Yao algorithm
=========================================

(current "Lehamn and Yao Algorithm and Insertions section)

I think that's pretty much the same information you added, but it's in a 
separate section, with the clear purpose that it explains what a L&Y 
tree looks like. You can skip over it, if you have read the paper or are 
otherwise already familiar with it. It still assumes that you're 
familiar with B-trees in general.

Anyway, I see that you had resurrected this in the commitfest app after 
three weeks of inactivity. I'm going to mark this back to "Returned with 
Feedback". Please don't resurrect it again, this patch has received more 
than its fair share of attention. Instead, please help by signing up to 
review a patch. The commitfest progress is glacial at the moment, and we 
really need experienced reviewers like you to get closure to people's 
patches.

- Heikki

pgsql-hackers by date:

From: Craig Ringer
Date: 09 September 2014, 05:56:07
Subject: Re: proposal: plpgsql - Assert statement

From: Marko Tiikkaja
Date: 09 September 2014, 07:49:58
Subject: Re: proposal: plpgsql - Assert statement

Re: Stating the significance of Lehman & Yao in the nbtree README - Mailing list pgsql-hackers

Previous

Next