Re: [PERFORM] Cpu usage 100% on slave. s_lock problem. - Mailing list pgsql-hackers
From | Merlin Moncure |
---|---|
Subject | Re: [PERFORM] Cpu usage 100% on slave. s_lock problem. |
Date | |
Msg-id | CAHyXU0zLci1x_2WaSTN=oLgTRaKO=wtxcnOPNr_LmB04Zbn9fw@mail.gmail.com |
In response to | Re: [PERFORM] Cpu usage 100% on slave. s_lock problem. (Ants Aasma <ants@cybertec.at>) |
Responses | Re: [PERFORM] Cpu usage 100% on slave. s_lock problem. |
List | pgsql-hackers |
On Wed, Oct 2, 2013 at 3:43 PM, Ants Aasma <ants@cybertec.at> wrote:
> On Wed, Oct 2, 2013 at 10:37 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> On Wed, Oct 2, 2013 at 9:45 AM, Ants Aasma <ants@cybertec.at> wrote:
>>> I haven't reviewed the code in enough detail to say if there is an
>>> actual race here, I tend to think there's probably not, but the
>>> specific pattern that I had in mind is that with the following actual
>>> code:
>>
>> hm.  I think there *is* a race.  2+ threads could race to the line:
>>
>>     LocalRecoveryInProgress = xlogctl->SharedRecoveryInProgress;
>>
>> and simultaneously set the value of LocalRecoveryInProgress to false,
>> and both engage InitXLOGAccess, which is destructive.  The move
>> operation is atomic, but I don't think there's any guarantee the reads
>> of xlogctl->SharedRecoveryInProgress are ordered between threads
>> without a lock.
>>
>> I don't think the memory barrier will fix this.  Do you agree?  If so,
>> my earlier patch (recovery4) is another take on this problem, and
>> arguably safer; the unlocked read is in a separate path that does not
>> engage InitXLOGAccess().
>
> InitXLOGAccess only writes backend-local variables, so it can't be
> destructive.  In fact, it calls GetRedoRecPtr(), which does a spinlock
> acquisition cycle, which should be a full memory barrier.  A read
> barrier after accessing SharedRecoveryInProgress is good enough, and
> it seems to be necessary to avoid a race condition for
> XLogCtl->ThisTimeLineID.
>
> Admittedly the race is so improbable that in practice it probably
> doesn't matter.  I just wanted to spell out the correct way to do
> unlocked accesses, as it can get quite confusing.  I found Herb
> Sutter's "atomic<> weapons" talk very useful; thinking about the
> problem in terms of acquire and release makes it much clearer to me.

If we don't care about multiple calls to InitXLOGAccess() (for a
backend) then we can get away with just a barrier.  That's a pretty
weird assumption though (even if the race is improbable), and I think
it should be well documented in the code, particularly since previous
versions went through such trouble to do a proper check->set->init.

I'm still leaning towards my earlier idea (recovery4.patch), since it
keeps the old semantics by exploiting the fact that the call site of
interest (only) does not require a correct answer (that is, it can
tolerate the backend thinking it's in recovery when it's not).  That
just defers the heap prune for a time, and you don't have to mess
around with barriers at all or be concerned about present or future
issues caused by spurious extra calls to InitXLOGAccess().  A rough
sketch of the idea follows below the sig.

The other code paths to RecoveryInProgress() appear not to be
interesting from a spinlock perspective.

merlin
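A rough sketch of the idea, for illustration only (this is not the
actual recovery4.patch, and the function name RecoveryMightBeInProgress
is made up); it assumes the same backend-local LocalRecoveryInProgress
flag and the XLogCtl/XLogCtlData shared state that RecoveryInProgress()
already uses in xlog.c:

    /*
     * RecoveryMightBeInProgress -- illustrative sketch, not the actual patch.
     *
     * Like RecoveryInProgress(), but reads the shared flag without taking
     * the info_lck spinlock and never calls InitXLOGAccess(), so no
     * backend-local xlog state gets initialized from a possibly stale read.
     * Callers may see a stale 'true' for a short while after recovery ends;
     * for the heap-prune check that only defers pruning, which is harmless.
     */
    bool
    RecoveryMightBeInProgress(void)
    {
        /*
         * If the locked path (RecoveryInProgress) has already observed the
         * end of recovery, recovery cannot restart, so skip the shared read.
         */
        if (!LocalRecoveryInProgress)
            return false;
        else
        {
            /* use volatile pointer to prevent code rearrangement */
            volatile XLogCtlData *xlogctl = XLogCtl;

            /*
             * Unlocked read; deliberately do not update
             * LocalRecoveryInProgress here, that is left to the locked path
             * in RecoveryInProgress().
             */
            return xlogctl->SharedRecoveryInProgress;
        }
    }

The existing RecoveryInProgress(), with its spinlocked read and its call
to InitXLOGAccess(), would stay as-is for every other caller; only the
prune check would switch to the relaxed function.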