Re: Reduce ProcArrayLock contention - Mailing list pgsql-hackers
From:            Andres Freund
Subject:         Re: Reduce ProcArrayLock contention
Msg-id:          20150821183112.GG8552@awork2.anarazel.de
In response to:  Re: Reduce ProcArrayLock contention (Robert Haas <robertmhaas@gmail.com>)
Responses:       Re: Reduce ProcArrayLock contention
List:            pgsql-hackers
On 2015-08-21 14:08:36 -0400, Robert Haas wrote:
> On Wed, Aug 19, 2015 at 11:39 AM, Andres Freund <andres@anarazel.de> wrote:
> > There's a bunch of issues with those two blocks afaics:
> >
> > 1) The first block (in one backend) might read INVALID_PGPROCNO before
> > ever locking the semaphore if a second backend quickly enough writes
> > INVALID_PGPROCNO. That way the semaphore will be permanently out of
> > "balance".
>
> Yes, this is a clear bug. I think the fix is to do one unconditional
> PGSemaphoreLock(&proc->sem) just prior to the loop.

I don't think that fixes it completely, see the following. (A sketch
of the loop shape at issue appears below.)

> > 2) There's no memory barriers around dealing with nextClearXidElem in
> > the first block. Afaics there needs to be a read barrier before
> > returning, otherwise it's e.g. not guaranteed that the woken up
> > backend sees its own xid set to InvalidTransactionId.
>
> I can't believe it actually works like that. Surely a semaphore
> operation is a full barrier. Otherwise this could fail:
>
> P1: a = 0;
> P1: PGSemaphoreLock(&P1);
> P2: a = 1;
> P2: PGSemaphoreUnlock(&P1);
> P1: Assert(a == 1);
>
> Do you really think that fails anywhere?

No, if it's paired like that, I don't think it's allowed to fail. But,
as the code stands, there's absolutely no guarantee you're not seeing
something like:

P1: a = 0;
P1: b = 0;
P1: PGSemaphoreLock(&P1);
P2: a = 1;
P2: PGSemaphoreUnlock(&P1);  -- unrelated, e.g. done earlier by ProcSendSignal
P1: Assert(a == b == 1);
P2: b = 1;
P2: PGSemaphoreUnlock(&P1);

If the pairing is like this, there are no guarantees anymore, right?
Even if a and b were both set before P1's assert executes, the assert
would still be allowed to fail: without any enforced ordering, the
store to a or to b might not yet be visible to P1.

> > 3) If a follower returns before the leader has actually finished
> > waking that respective backend up we can get into trouble: ...
>
> So I don't see the problem.

Don't have time (nor spare brain capacity) to answer in detail right
now.

> > I don't think it's a good idea to use the same variable name in
> > PROC_HDR and PGPROC, it's confusing.
>
> It made sense to me, but I don't care that much if you want to change
> it.

Please.

> > How hard did you try checking whether this causes regressions? This
> > increases the number of atomics in the commit path a fair bit. I
> > doubt it's really bad, but it seems like a good idea to benchmark
> > something like a single full-throttle writer and a large number of
> > readers.
>
> Hmm, that's an interesting point. My analysis was that it really only
> increased the atomics in the cases where we otherwise would have gone
> to sleep, and I figured that the extra atomics were a small price to
> pay for not sleeping.

You're probably right that it won't have a big impact, if any. Seems
easy enough to test though, and it's really the only sane adversarial
scenario I could come up with.

> It's worth pointing out, though, that your lwlock.c atomics changes
> have the same effect.

To some degree, yes. I tried to benchmark adversarial scenarios rather
extensively because of that... I couldn't make it regress, presumably
because putting a backend on the waitlist only "hard" contended with
other exclusive lockers, not with the shared lockers, which could
still make progress.

> Before, if there were any shared lockers on an LWLock and an
> exclusive-locker came along, the exclusive locker would take and
> release the spinlock and go to sleep.

That often meant spinning on a spinlock that was continuously being
acquired, so it was hard to ever get that far...
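For reference, the follower-side wait loop at issue in point (1) above
has roughly the following shape. This is a sketch reconstructed from
the thread, not the patch's actual code: the names nextClearXidElem,
INVALID_PGPROCNO, and proc->sem come from the discussion, but the
control flow, the pg_atomic_read_u32() access, and the extraWaits
bookkeeping (borrowed from the long-standing idiom in proc.c) are
assumptions.

    /*
     * Buggy shape: if the leader has already written INVALID_PGPROCNO,
     * the follower returns without ever locking the semaphore, while
     * the leader still does one PGSemaphoreUnlock() for it -- leaving
     * the semaphore permanently out of balance.
     */
    while (pg_atomic_read_u32(&proc->nextClearXidElem) != INVALID_PGPROCNO)
        PGSemaphoreLock(&proc->sem);

    /*
     * Shape with the fix proposed above: absorb at least one wakeup
     * unconditionally, so the leader's unlock is always consumed,
     * even when the leader finishes before the follower starts
     * waiting.
     */
    int         extraWaits = 0;

    for (;;)
    {
        PGSemaphoreLock(&proc->sem);
        if (pg_atomic_read_u32(&proc->nextClearXidElem) == INVALID_PGPROCNO)
            break;
        extraWaits++;           /* unrelated wakeup; re-release below */
    }

    /* Give back any wakeups we absorbed that weren't meant for us. */
    while (extraWaits-- > 0)
        PGSemaphoreUnlock(&proc->sem);

Note that, per point (2), breaking out of such a loop additionally
needs a read barrier (or has to rely on the semaphore operation acting
as one) before the returning backend may trust that its xid is seen as
InvalidTransactionId.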
> The changes we made to the clock sweep are more of the same.

Yea.

> Instead of taking an lwlock, sweeping the clock hand across many
> buffers, and releasing the lwlock, we now perform an atomic operation
> for every buffer. That's obviously "worse" in terms of the total
> number of atomic operations, and if you construct a case where every
> backend sweeps 10 or 20 buffers yet none of them would have contended
> with each other, it might lose. But it's actually much better and
> more scalable in the real world.

I think we probably should optimize that bit of code at some point -
right now the bottleneck appears to be elsewhere though, namely all
the buffer header locks, which are rather likely to be in no cache at
all. (A sketch of the per-buffer tick follows below.)

> I think this is a pattern we'll see with atomics over and over again:
> more atomic operations overall in order to reduce the length of time
> for which particular resources are unavailable to other processes.

Yea, agreed.

Greetings,

Andres Freund
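The per-buffer atomic advance discussed above corresponds, in released
sources, to roughly the following. A simplified sketch, modeled on
ClockSweepTick() in src/backend/storage/buffer/freelist.c; the
wraparound and completePasses bookkeeping of the real function is
omitted here.

    /* Advance the shared clock hand by one buffer, lock-free. */
    static inline uint32
    ClockSweepTickSketch(void)
    {
        uint32      victim;

        /* one atomic op per visited buffer; returns the previous value */
        victim = pg_atomic_fetch_add_u32(&StrategyControl->nextVictimBuffer, 1);

        /* the counter increases monotonically; map it onto the array */
        return victim % NBuffers;
    }

Each backend pays one atomic operation per buffer it inspects, but no
backend ever blocks the others for the duration of a multi-buffer
sweep -- exactly the trade described in the quoted paragraph above.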