On Wed, Aug 13, 2025 at 9:42 AM Greg Burd <greg@burd.me> wrote:
> Amazing, thank you. I'll try to replicate your tests tomorrow to see if
> my optimized division and modulo functions do in fact help or not. I
> realize that both you and Anders are (rightly) concerned that the
> performance impact of IDIV on some CPUs can be excessive.

At the risk of posting untested crackpot theories on the internet, I
wonder if there is a way to use a simple boundary condition and
subtraction for this. If you correct overshoot compared to an
advancing-in-strides base value, then I wonder how often you'd finish
up having to actually do that under concurrency. Obviously in
general, implementing modulo with subtraction is a terrible idea, but
can you make it so that the actual cost works out as mostly 0, rarely
1, and exceedingly rarely more than 1 iteration of the subtraction
loop? If that's true, do the branches somehow kill you?

Assume for now that we're OK with keeping % and / for the infrequent
calls to StrategySyncStart(), or we can redefine the bgwriter's
logic so that it doesn't even need those (perhaps what it really wants
to know is its total distance behind the allocator, so perhaps we can
define that problem away? haven't thought about that yet...). What
I'm wondering out loud is whether the hot ClockSweepTick() code might
be able to use something nearly as dumb as this...

    /* untested pseudocode */
    ticks_base = pg_atomic_read_u64(&StrategyControl->ticks_base);
    ticks = pg_atomic_fetch_add_u64(&StrategyControl->ticks, 1);
    hand = ticks - ticks_base;

    /*
     * Compensate for overshoot.  Expected number of loops: none most of the
     * time, one when we overshoot, and maybe more if the system gets
     * around the whole clock before we see the base value advance.
     */
    while (hand >= NBuffers)
    {
        /* Base value advanced by backend that overshoots by one tick. */
        if (hand == NBuffers)
            pg_atomic_fetch_add_u64(&StrategyControl->ticks_base, NBuffers);
        hand -= NBuffers;
    }
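
To poke at the idea outside the tree, here is a rough standalone sketch of the same loop using C11 <stdatomic.h> in place of pg_atomic_*. Everything in it is made up for illustration: ClockState, NBUFFERS, clock_tick(), count_mismatches() and loops_with_stale_base() are hypothetical names, not PostgreSQL code. It only checks the arithmetic single-threaded (plus one hand-simulated stale read of the base), so it says nothing about how often the slow path is hit under real concurrency or what the branches cost.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define NBUFFERS 8              /* hypothetical, tiny pool for testing */

typedef struct
{
    _Atomic uint64_t ticks;         /* monotonically increasing tick counter */
    _Atomic uint64_t ticks_base;    /* advances in strides of NBUFFERS */
} ClockState;

/*
 * Take one tick and return the clock hand, using subtraction instead of %.
 * If stale_base is non-NULL it is used in place of a fresh read, to mimic
 * a backend that read the base long before it got its tick.  *loops
 * receives the number of subtraction iterations performed.
 */
uint32_t
clock_tick(ClockState *cs, const uint64_t *stale_base, int *loops)
{
    uint64_t    base = stale_base ? *stale_base : atomic_load(&cs->ticks_base);
    uint64_t    ticks = atomic_fetch_add(&cs->ticks, 1);
    uint64_t    hand;

    /* The base is read first and only advances for ticks already taken. */
    assert(base <= ticks);
    hand = ticks - base;

    *loops = 0;
    while (hand >= NBUFFERS)
    {
        /* Only the tick landing exactly on a stride boundary advances the base. */
        if (hand == NBUFFERS)
            atomic_fetch_add(&cs->ticks_base, NBUFFERS);
        hand -= NBUFFERS;
        (*loops)++;
    }
    return (uint32_t) hand;
}

/* Run n ticks single-threaded; count results that differ from tick % NBUFFERS. */
int
count_mismatches(uint64_t n)
{
    ClockState  cs;
    int         bad = 0;

    atomic_init(&cs.ticks, 0);
    atomic_init(&cs.ticks_base, 0);
    for (uint64_t i = 0; i < n; i++)
    {
        int         loops;

        if (clock_tick(&cs, NULL, &loops) != i % NBUFFERS)
            bad++;
    }
    return bad;
}

/*
 * Mimic a backend that saw ticks_base == 0 and then receives tick number
 * 'tick'.  Returns the loop count and stores the resulting hand in *hand_out.
 */
int
loops_with_stale_base(uint64_t tick, uint32_t *hand_out)
{
    ClockState  cs;
    uint64_t    stale = 0;
    int         loops;

    atomic_init(&cs.ticks, tick);
    atomic_init(&cs.ticks_base, 0);
    *hand_out = clock_tick(&cs, &stale, &loops);
    return loops;
}
```

In the single-threaded run only every NBUFFERS-th tick enters the loop, and it iterates once. A reader whose base is k strides stale iterates k times, but still bumps the base only if its own tick is an exact multiple of NBUFFERS, so each stride should be accounted for once no matter who ends up doing it.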