Re: Better LWLocks with compare-and-swap (9.4) - Mailing list pgsql-hackers

| From | Heikki Linnakangas |
|---|---|
| Subject | Re: Better LWLocks with compare-and-swap (9.4) |
| Msg-id | 519A938A.1070903@vmware.com |
| In response to | Re: Better LWLocks with compare-and-swap (9.4) (Daniel Farina <daniel@heroku.com>) |
| Responses | Spinlock implementation on x86_64 (was Re: Better LWLocks with compare-and-swap (9.4)) |
| List | pgsql-hackers |
On 16.05.2013 01:08, Daniel Farina wrote:
> On Mon, May 13, 2013 at 5:50 AM, Heikki Linnakangas
> <hlinnakangas@vmware.com> wrote:
>> pgbench -S is such a workload. With 9.3beta1, I'm seeing this profile,
>> when I run "pgbench -S -c64 -j64 -T60 -M prepared" on a 32-core Linux
>> machine:
>>
>> -  64.09%  postgres  postgres           [.] tas
>>     - tas
>>        - 99.83% s_lock
>>           - 53.22% LWLockAcquire
>>              + 99.87% GetSnapshotData
>>           - 46.78% LWLockRelease
>>                GetSnapshotData
>>                + GetTransactionSnapshot
>> +   2.97%  postgres  postgres           [.] tas
>> +   1.53%  postgres  libc-2.13.so       [.] 0x119873
>> +   1.44%  postgres  postgres           [.] GetSnapshotData
>> +   1.29%  postgres  [kernel.kallsyms]  [k] arch_local_irq_enable
>> +   1.18%  postgres  postgres           [.] AllocSetAlloc
>> ...
>>
>> So, on this test, a lot of time is wasted spinning on the mutex of
>> ProcArrayLock. If you plot a graph of TPS vs. # of clients, there is a
>> surprisingly steep drop in performance once you go beyond 29 clients
>> (attached, pgbench-lwlock-cas-local-clients-sets.png, red line). My
>> theory is that after that point all the cores are busy, and processes
>> start to be sometimes context switched while holding the spinlock,
>> which kills performance.
>
> I have, I also used linux perf to come to this conclusion, and my
> determination was similar: a system was undergoing increasingly heavy
> load, in this case with processes >> number of processors. It was
> also a phase-change type of event: at one moment everything would be
> going great, but once a critical threshold was hit, s_lock would
> consume an enormous amount of CPU time. I figured preemption while in
> the spinlock was to blame at the time, given the extreme nature.

Stop the press! I'm getting the same speedup on that 32-core box I got
with the compare-and-swap patch, from this one-liner:

--- a/src/include/storage/s_lock.h
+++ b/src/include/storage/s_lock.h
@@ -200,6 +200,8 @@ typedef unsigned char slock_t;

 #define TAS(lock) tas(lock)

+#define TAS_SPIN(lock)	(*(lock) ? 1 : TAS(lock))
+
 static __inline__ int
 tas(volatile slock_t *lock)
 {

So, on this system, doing a non-locked test before the locked xchg
instruction while spinning is a very good idea. That contradicts the
testing that was done earlier, when the x86-64 implementation was added,
as we have this comment in the tas() implementation:

> /*
>  * On Opteron, using a non-locking test before the locking instruction
>  * is a huge loss. On EM64T, it appears to be a wash or small loss,
>  * so we needn't bother to try to distinguish the sub-architectures.
>  */

On my test system, the non-locking test is a big win. I tested this
because I was reading this article from Intel:
http://software.intel.com/en-us/articles/implementing-scalable-atomic-locks-for-multi-core-intel-em64t-and-ia32-architectures/.
It says very explicitly that the non-locking test is a good idea:

> Spinning on volatile read vs. spin on lock attempt
>
> One common mistake made by developers developing their own spin-wait
> loops is attempting to spin on an atomic instruction instead of spinning
> on a volatile read. Spinning on a dirty read instead of attempting to
> acquire a lock consumes less time and resources. This allows an
> application to attempt to acquire a lock only when it is free.

Now, I'm not sure what to do about this. If we put the non-locking test
in there, according to the previous testing that would be a huge loss on
Opterons. Perhaps we should just sleep earlier, i.e. lower
MAX_SPINS_PER_DELAY. That way, even if each TAS_SPIN test is very
expensive, we don't spend too much time spinning if the lock is really
busy, or held by a process that is sleeping.
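For concreteness, here's a minimal, self-contained sketch of the shape of
that loop: spin on a plain ("dirty") read, attempt the locked xchg only
when the lock was just seen to be free, and go to sleep after a bounded
number of spins. This is illustrative only, not the actual s_lock.c code;
my_tas, my_spin_lock, SPINS_BEFORE_SLEEP and the flat usleep backoff are
made up for this sketch, whereas the real s_lock() loop is driven by
spins_per_delay and uses a randomized, increasing delay:

```c
/*
 * Illustrative only: a test-and-test-and-set spin loop with a spin cap,
 * in the spirit of the TAS_SPIN one-liner above combined with a lower
 * MAX_SPINS_PER_DELAY.  Not the actual s_lock.c code.
 */
#include <unistd.h>             /* usleep */

typedef volatile unsigned char my_slock_t;

#define SPINS_BEFORE_SLEEP 100  /* stand-in for MAX_SPINS_PER_DELAY */

/* The same locked xchg that tas() uses on x86-64: returns 0 on success. */
static inline int
my_tas(my_slock_t *lock)
{
    unsigned char res = 1;

    __asm__ __volatile__(
        "lock; xchgb %0, %1"
        : "+q"(res), "+m"(*lock)
        :
        : "memory");
    return (int) res;
}

static void
my_spin_lock(my_slock_t *lock)
{
    for (;;)
    {
        int     spins = 0;

        /*
         * Non-locking test: spin on a plain read of the lock byte.  This
         * hits only the local cache and generates no bus-lock traffic.
         */
        while (*lock != 0)
        {
            if (++spins > SPINS_BEFORE_SLEEP)
            {
                /* Holder may be sleeping or preempted; yield the CPU. */
                usleep(1000);
                spins = 0;
            }
        }

        /* Lock was just seen free: now try the expensive locked xchg. */
        if (my_tas(lock) == 0)
            return;             /* got it */
    }
}

static void
my_spin_unlock(my_slock_t *lock)
{
    /* A plain store releases the lock on x86 (real code would also want
     * a compiler barrier here to prevent reordering). */
    *lock = 0;
}
```

The point of the inner while loop is that it reads only the processor's
cached copy of the lock byte, so the spinning itself stays off the bus;
the bus-locking xchg is attempted only when the lock was just observed
to be free, which is exactly what the Intel article recommends.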
- Heikki