*** a/src/backend/access/nbtree/README
--- b/src/backend/access/nbtree/README
*************** use a simplified version of the deletion
*** 11,16 ****
--- 11,36 ----
  Shasha (V. Lanin and D. Shasha, A Symmetric Concurrent B-Tree Algorithm,
  Proceedings of 1986 Fall Joint Computer Conference, pp 380-389).
  
+ Lehman and Yao don't require read locks, but assume that in-memory
+ copies of tree pages are unshared. Postgres shares in-memory buffers
+ among backends. As a result, we do page-level read locking on btree
+ pages in order to guarantee that no record is modified while we are
+ examining it. This reduces concurrency but guarantees correct
+ behavior. An advantage is that when trading in a read lock for a
+ write lock, we need not re-read the page after getting the write lock.
+ Since we're also holding a pin on the shared buffer containing the
+ page, we know that buffer still contains the page and is up-to-date.
+
+ Even with these read locks, Lehman and Yao's approach obviates the
+ need of earlier schemes to hold multiple read locks concurrently when
+ descending the tree as part of servicing index scans (pessimistic lock
+ coupling). The addition of right-links at all levels, as well as the
+ addition of a page "high key" allows detection of, and dynamic
+ recovery from concurrent page splits (that is, splits between
+ unlocking an internal page, and subsequently locking its child page
+ during a descent). L&Y Trees are sometimes referred to as "B-Link
+ trees" in the literature.
+
  The Lehman and Yao Algorithm and Insertions
  -------------------------------------------
  
*************** to be inserted has a choice whether or n
*** 42,57 ****
  key could go on either page. (Currently, we try to find a page where
  there is room for the new key without a split.)
  
- Lehman and Yao don't require read locks, but assume that in-memory
- copies of tree pages are unshared. Postgres shares in-memory buffers
- among backends. As a result, we do page-level read locking on btree
- pages in order to guarantee that no record is modified while we are
- examining it. This reduces concurrency but guarantees correct
- behavior. An advantage is that when trading in a read lock for a
- write lock, we need not re-read the page after getting the write lock.
- Since we're also holding a pin on the shared buffer containing the
- page, we know that buffer still contains the page and is up-to-date.
-
  We support the notion of an ordered "scan" of an index as well as
  insertions, deletions, and simple lookups. A scan in the forward
  direction is no problem, we just use the right-sibling pointers that
--- 62,67 ----
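
Note (not part of the patch): the paragraph added above says that right-links plus a page "high key" let a descent detect and recover from a concurrent split. The sketch below illustrates that idea only. It is not nbtree code: the Page class, split(), and move_right() are hypothetical names, keys are plain integers, pages live in ordinary Python objects, and all locking and pinning is omitted.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Page:
    """Hypothetical leaf page; not the nbtree on-disk layout."""
    keys: List[int] = field(default_factory=list)
    high_key: Optional[int] = None   # None: rightmost page, no upper bound
    right: Optional["Page"] = None   # right-link to the sibling page


def split(page: Page) -> Page:
    """Move the upper half of page.keys to a new right sibling.

    Assumes the page holds at least two keys.
    """
    mid = len(page.keys) // 2
    # The new sibling inherits the old upper bound and the old right-link.
    sibling = Page(keys=page.keys[mid:], high_key=page.high_key,
                   right=page.right)
    page.keys = page.keys[:mid]
    page.high_key = page.keys[-1]    # largest key left on the original page
    page.right = sibling
    return sibling


def move_right(page: Page, key: int) -> Page:
    """Follow right-links while the key lies beyond the page's high key."""
    while page.high_key is not None and key > page.high_key:
        page = page.right
    return page


# A descent read this pointer from the parent, then the page was split.
leaf = Page(keys=[10, 20, 30, 40])
stale = leaf
split(leaf)
found = move_right(stale, 30)
```

Here `stale` plays the role of a child pointer obtained before the split. Comparing the search key with the high key shows that the key now belongs further right, and the right-link leads to the page that holds it, without revisiting the parent.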