Re: Microsecond sleeps with select() - Mailing list pgsql-hackers
From: Tom Lane
Subject: Re: Microsecond sleeps with select()
Date:
Msg-id: 4120.982430791@sss.pgh.pa.us
In response to: Microsecond sleeps with select() (Bruce Momjian <pgman@candle.pha.pa.us>)
Responses:
  Re: Microsecond sleeps with select()
  Re: Microsecond sleeps with select()
  Re: Microsecond sleeps with select()
List: pgsql-hackers
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> A comment on microsecond delays using select().  Most Unix kernels run
> at 100hz, meaning that they have a programmable timer that interrupts
> the CPU every 10 milliseconds.

Right --- this probably also explains my observation that some kernels seem to add an extra 10msec to the requested sleep time. Actually they're interpreting a one-clock-tick select() delay as "wait till the next clock tick, plus one tick". The actual delay will be between one and two ticks depending on just when you went to sleep.

I have been thinking some more about the s_lock() delay loop in connection with this. We currently have

    /*
     * Each time we busy spin we select the next element of this array as the
     * number of microseconds to wait. This accomplishes pseudo random back-off.
     * Values are not critical but 10 milliseconds is a common platform
     * granularity.
     *
     * Total time to cycle through all 20 entries might be about .07 sec,
     * so the given value of S_MAX_BUSY results in timeout after ~70 sec.
     */
    #define S_NSPINCYCLE    20
    #define S_MAX_BUSY      1000 * S_NSPINCYCLE

    int s_spincycle[S_NSPINCYCLE] =
    {   0, 0, 0, 0, 10000, 0, 0, 0, 10000, 0,
        0, 10000, 0, 0, 10000, 0, 10000, 0, 10000, 10000
    };

Having read the select(2) man page more closely, I now realize that it is *defined* not to yield the processor when the requested delay is zero: it just checks the file ready status and returns immediately. Therefore, on a single-CPU machine, the zero entries in this array are a gold-plated, guaranteed waste of time: we'll cycle through the kernel call, go back and retest the spinlock, and find it still locked. On a multi-CPU machine, the time wasted in the kernel call might possibly be enough to allow the lock holder (if running on another CPU) to finish its work and release the lock. But it's more likely that we're just wasting CPU. On either single or multi CPUs, there is no "pseudo random backoff" behavior here whatsoever.
Spinning through the zero entries will take some small number of microseconds, and then we hit the one-clock-tick select() delay. In reality, every backend waiting for a spinlock will be awoken on every clock tick. If your kernel is one of the ones that interprets a one-tick delay request as "rest of the current tick plus a tick", then all the delays are actually two ticks. In this case, on average half the population of waiting backends will be awoken on each alternate tick. A little better but not much.

In short: s_spincycle in its current form does not do anything anywhere near what the author thought it would. It's wasted complexity.

I am thinking about simplifying s_lock_sleep down to simple wait-one-tick-on-every-call logic. An alternative is to keep s_spincycle, but populate it with, say, 10000, 20000 and larger entries, which would offer some hope of actual random-backoff behavior. Either change would clearly be a win on single-CPU machines, and I doubt it would hurt on multi-CPU machines.

Comments?

			regards, tom lane