Re: patch: fix race in SSI's CheckTargetForConflictsIn - Mailing list pgsql-hackers
From: Robert Haas
Subject: Re: patch: fix race in SSI's CheckTargetForConflictsIn
Msg-id: BANLkTik0J5n3XefoKuqSP1kRP2ZJALv2Sw@mail.gmail.com
In response to: Re: patch: fix race in SSI's CheckTargetForConflictsIn ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
Responses: Re: patch: fix race in SSI's CheckTargetForConflictsIn
List: pgsql-hackers
On Thu, May 5, 2011 at 1:43 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> Dan Ports <drkp@csail.mit.edu> wrote:
>> While running some benchmarks to test SSI performance, I found a
>> race condition that's capable of causing a segfault. A patch is
>> attached.
>>
>> The bug is in CheckTargetForConflictsIn, which scans the list of
>> SIREAD locks on a lock target when it's modified. There's an
>> optimization in there where the writing transaction will remove a
>> SIREAD lock that it holds itself, because it's being replaced with
>> a (stronger) write lock. To do that, it needs to drop its shared
>> lwlocks and reacquire them in exclusive mode. The existing code
>> deals with concurrent modifications in that interval by redoing
>> checks. However, it misses the case where some other transaction
>> removes all remaining locks on the target, and proceeds to remove
>> the lock target itself.
>>
>> The attached patch fixes this by deferring the SIREAD lock removal
>> until the end of the function. At that point, there isn't any need
>> to worry about concurrent updates to the target's lock list. The
>> resulting code is also simpler.
>
> I just want to confirm that this addresses a real (although very
> narrow) race condition. It results from code used to implement a
> valuable optimization described in Cahill's thesis: don't get or
> keep an SIREAD lock on a row which has a write lock. The write lock
> is stronger and will cause a write/write conflict with any
> overlapping transactions which would care about a read/write
> conflict. The pattern of reading a row and then updating or
> deleting it is so common that this optimization does a lot to avoid
> promotion of predicate locks to coarser granularity, and thereby
> helps avoid false positives.
>
> While the optimization is valuable, the code used to implement it
> was pretty horrid. (And that was me that wrote it.) It has already
> been the cause of several other fixes since the main patch went in.
> What Dan has done here is move the optimization out of the middle of
> the loop which is doing the conflict detection, and in doing so has
> reduced the number of lines of code needed, reduced the amount of
> fiddling with LW locks, and all around made the code more robust and
> more understandable.
>
> I've reviewed the patch and it looks good to me. Dan has beat up on
> it with the same DBT-2 run which exposed the race condition without
> seeing a problem. Although a much smaller patch could address the
> immediate problem, I strongly feel that Dan has taken the right
> approach by refactoring this bit to something fundamentally cleaner
> and less fragile.

Why does this HASH_FIND the applicable hash table entry and then
HASH_REMOVE it as a separate step, instead of just HASH_REMOVE-ing it
in one go?

Doesn't this fail to release the locks if rmpredlock == NULL?

I believe it's project style to test (rmpredlock != NULL) rather than
just (rmpredlock).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
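
[For readers without the patch at hand, here is a minimal sketch of the
restructuring Kevin describes: rather than dropping the shared LWLocks
mid-scan to remove its own SIREAD lock (the unlocked window in which
another backend can free the last lock and then the target itself), the
scan only notes that the lock was seen, and the removal happens after
the loop. The function and variable names below are illustrative, not
the committed predicate.c code; only the LWLock, type, and mode
identifiers are taken from the 9.1 sources.]

    #include "postgres.h"
    #include "storage/lwlock.h"
    #include "storage/predicate_internals.h"

    /* Sketch of the fix's shape, not the actual committed code. */
    static void
    check_target_sketch(PREDICATELOCKTARGET *target)
    {
        bool        found_own_siread = false;

        LWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);

        /*
         * Scan the target's lock list.  The old code, on finding this
         * backend's own SIREAD lock, released the shared LWLocks,
         * reacquired them exclusive, and removed the lock in place; in
         * that unlocked window a concurrent backend could remove the
         * last remaining lock and then the target itself, leaving a
         * dangling pointer.  The new code just remembers the lock.
         */
        /* ... for each lock on target: if it belongs to us ... */
        found_own_siread = true;

        LWLockRelease(SerializablePredicateLockListLock);

        if (found_own_siread)
        {
            /*
             * Deferred removal, after the scan: reacquire the LWLocks
             * exclusive and look the lock up again by its tag.  If a
             * concurrent backend already cleaned up the lock (or the
             * whole target) in the meantime, the lookup simply fails
             * and there is nothing left to do.
             */
        }
    }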
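
[The first review question is about dynahash's hash_search() API:
called with HASH_REMOVE, hash_search() itself returns the entry it
removed, or NULL if no matching entry existed, so a preceding HASH_FIND
lookup buys nothing. A sketch of the two forms follows; the tag
argument and the surrounding function are invented for illustration,
and PredicateLockHash is redeclared only so the fragment stands alone
(in the real tree it is a static variable in predicate.c).]

    #include "postgres.h"
    #include "utils/hsearch.h"
    #include "storage/predicate_internals.h"

    extern HTAB *PredicateLockHash;

    static void
    remove_lock_entry_sketch(PREDICATELOCKTAG *tag)
    {
        PREDICATELOCK *rmpredlock;

        /* Two-step form questioned above: find, then remove. */
        rmpredlock = (PREDICATELOCK *)
            hash_search(PredicateLockHash, tag, HASH_FIND, NULL);
        if (rmpredlock != NULL)
            (void) hash_search(PredicateLockHash, tag, HASH_REMOVE, NULL);

        /*
         * One-step form: HASH_REMOVE reports the outcome through its
         * return value, making the separate HASH_FIND redundant.
         */
        rmpredlock = (PREDICATELOCK *)
            hash_search(PredicateLockHash, tag, HASH_REMOVE, NULL);
        if (rmpredlock == NULL)
        {
            /* entry was already gone; nothing to clean up */
        }
    }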
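
[The other two comments concern the cleanup path: once the LWLocks are
held, every exit from the function must release them, including the
path where the hash lookup comes up empty, and PostgreSQL style writes
the pointer test out as (rmpredlock != NULL). A sketch of the shape
being asked for; the partition-lock parameter and the control flow are
illustrative, not the patched code.]

    static void
    remove_and_release_sketch(const PREDICATELOCKTAG *tag,
                              LWLockId partitionLock)
    {
        PREDICATELOCK *rmpredlock;

        LWLockAcquire(SerializablePredicateLockListLock, LW_EXCLUSIVE);
        LWLockAcquire(partitionLock, LW_EXCLUSIVE);

        rmpredlock = (PREDICATELOCK *)
            hash_search(PredicateLockHash, tag, HASH_REMOVE, NULL);

        /*
         * Project style: test (rmpredlock != NULL), not bare
         * (rmpredlock).  And do not return early here -- the releases
         * below must run even when the entry was not found.
         */
        if (rmpredlock != NULL)
        {
            /* ... unlink the entry from the target's and the
             * transaction's lock lists ... */
        }

        LWLockRelease(partitionLock);
        LWLockRelease(SerializablePredicateLockListLock);
    }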