Thread: Duplicate-key-detection failure case found in btree
I have found a case in nbtree where it is possible for two concurrent transactions to insert the same key value in a unique index :-( The reason is that _bt_check_unique can (of course) only detect keys that are already in the index. To prevent concurrent insertion of duplicate keys, we have to rely on locking. Two would-be inserters of the same key must lock the same target page in _bt_doinsert, so they cannot run the check at the same time. The second to obtain the lock will see the first one's already-inserted key. However, suppose that the index contains many existing instances of the same key (presumably pointing at deleted-but-not-yet-vacuumed tuples). If the existing instances span several index pages, it is likely that _bt_insertonpg will decide to insert the new key on one of the pages to the right of the original target key. As presently coded, _bt_insertonpg drops the write lock it has before acquiring write lock on the next page to the right. This creates a window wherein another process could perform _bt_check_unique, fail to find any non-dead instances of the key, and decide that it's okay to insert the same key. The fix is to acquire the next page's write lock before we drop the current page's. This is deadlock-free since no writer ever tries to chain write-locks to the left (in fact the same thing is done in the page split logic). I am not convinced that this explains any of the field reports of duplicate rows that we've seen, but it's clearly a bug. Will commit the fix shortly. regards, tom lane
> The fix is to acquire the next page's write lock before we drop the > current page's. This is deadlock-free since no writer ever tries > to chain write-locks to the left (in fact the same thing is done in > the page split logic). > > I am not convinced that this explains any of the field reports of > duplicate rows that we've seen, but it's clearly a bug. Will commit > the fix shortly. Sounds good to me. Good find. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Bruce Momjian wrote: > > The fix is to acquire the next page's write lock before we drop the > > current page's. This is deadlock-free since no writer ever tries > > to chain write-locks to the left (in fact the same thing is done in > > the page split logic). > > > > I am not convinced that this explains any of the field reports of > > duplicate rows that we've seen, but it's clearly a bug. Will commit > > the fix shortly. > > Sounds good to me. Good find. It always scares me that bugs like this can live untriggered for multiple releases. What else is lingering under the surface ... And I absolutely agree with Tom, there is no connection between this bug and the duplicate rows reported. Good catch anyway. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com # _________________________________________________________ Do You Yahoo!? Get your free @yahoo.com address at http://mail.yahoo.com
Jan Wieck <janwieck@yahoo.com> writes: > It always scares me that bugs like this can live untriggered > for multiple releases. What else is lingering under the > surface ... FWIW, I don't think that this bug did survive for multiple releases (though it very nearly made it into 7.2). The present form of bt_check_unique and bt_insertonpg was new code in 7.1, replacing the BTP_CHAIN method of handling identical keys. That code had a lot of bugs of its own, but I don't think it had this one. I will take the blame for introducing this bug into 7.1 ... regards, tom lane