Re: LWLock deadlock and gdb advice - Mailing list pgsql-hackers
From | Heikki Linnakangas |
---|---|
Subject | Re: LWLock deadlock and gdb advice |
Date | |
Msg-id | 55B90C0C.6020104@iki.fi Whole thread Raw |
In response to | Re: LWLock deadlock and gdb advice (Andres Freund <andres@anarazel.de>) |
Responses |
Re: LWLock deadlock and gdb advice
|
List | pgsql-hackers |
On 07/29/2015 04:10 PM, Andres Freund wrote: > On 2015-07-29 14:22:23 +0200, Andres Freund wrote: >> On 2015-07-29 15:14:23 +0300, Heikki Linnakangas wrote: >>> Ah, ok, that should work, as long as you also re-check the variable's value >>> after queueing. Want to write the patch, or should I? >> >> I'll try. Shouldn't be too hard. > > What do you think about something roughly like the attached? Hmm. Imagine this: Backend A has called LWLockWaitForVar(X) on a lock, and is now waiting on it. The lock holder releases the lock, and wakes up A. But before A wakes up and sees that the lock is free, another backend acquires the lock again. It runs LWLockAcquireWithVar to the point just before setting the variable's value. Now A wakes up, sees that the lock is still (or again) held, and that the variable's value still matches the old one, and goes back to sleep. The new lock holder won't wake it up until it updates the value again, or to releases the lock. I think that's OK given the current usage of this in WALInsertLocks, but it's scary. At the very least you need comments to explain that race condition. You didn't like the new LW_FLAG_VAR_SET flag and the API changes I proposed? That eliminates the race condition. Either way, there is a race condition that if the new lock holder sets the variable to the *same* value as before, the old waiter won't necessarily wake up even though the lock was free or had a different value in between. But that's easier to explain and understand than the fact that the value set by LWLockAcquireWithVar() might not be seen by a waiter, until you release the lock or update it again. Another idea is to have LWLockAcquire check the wait queue after setting the variable, and wake up anyone who's already queued up. You could just call LWLockUpdateVar() from LWLockAcquireCommon to set the variable. I would prefer the approach from my previous patch more, though. That would avoid having to acquire the spinlock in LWLockAcquire altogether, and I actually like the API change from that independently from any correctness issues. The changes in LWLockWaitForVar() and LWLockConflictsWithVar() seem OK in principle. You'll want to change LWLockConflictsWithVar() so that the spinlock is held only over the "value = *valptr" line, though; the other stuff just modifies local variables and don't need to be protected by the spinlock. Also, when you enter LWLockWaitForVar(), you're checking if the lock is held twice in a row; first at the top of the function, and again inside LWLockConflictsWithVar. You could just remove the "quick test" at the top. - Heikki
pgsql-hackers by date: