Thread: Add progressive backoff to XactLockTableWait functions
Hi hackers,

This patch implements progressive backoff in XactLockTableWait() and ConditionalXactLockTableWait().

As Kevin reported in this thread [1], XactLockTableWait() can enter a tight polling loop during logical replication slot creation on standby servers, sleeping for fixed 1ms intervals that can continue for a long time. This creates significant CPU overhead.

The patch implements a time-based threshold approach based on Fujii’s idea [1]: keep sleeping for 1ms until the total sleep time reaches 10 seconds, then start exponential backoff (doubling the sleep duration each cycle) up to a maximum of 10 seconds per sleep. This balances responsiveness for normal operations (which typically complete within seconds) against CPU efficiency for the long waits in some logical replication scenarios.

[1] https://www.postgresql.org/message-id/flat/CAM45KeELdjhS-rGuvN%3DZLJ_asvZACucZ9LZWVzH7bGcD12DDwg%40mail.gmail.com

Best regards,
Xuneng
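To make the proposed schedule concrete, here is a minimal C sketch that models it without actually sleeping. It assumes that, once the 10 s threshold is reached, backoff starts by doubling the previous 1 ms sleep; the constants and the helpers `backoff_next_sleep_us()` and `backoff_count_sleeps()` are hypothetical and are not taken from the patch.

```c
/*
 * Hypothetical constants modelling the schedule described above;
 * they are not taken from the actual patch.
 */
#define BASE_SLEEP_US         1000L      /* 1 ms polling interval */
#define BACKOFF_THRESHOLD_US  10000000L  /* start backing off after 10 s total */
#define MAX_SLEEP_US          10000000L  /* never sleep longer than 10 s at once */

/*
 * Hypothetical helper: choose the next sleep length from the total time
 * slept so far and the length of the previous sleep.
 */
static long
backoff_next_sleep_us(long total_slept_us, long prev_sleep_us)
{
    long next;

    /* Phase 1: keep polling every 1 ms until the threshold is reached. */
    if (total_slept_us < BACKOFF_THRESHOLD_US)
        return BASE_SLEEP_US;

    /* Phase 2: double the previous sleep, capped at MAX_SLEEP_US. */
    next = prev_sleep_us * 2;
    if (next > MAX_SLEEP_US)
        next = MAX_SLEEP_US;
    return next;
}

/*
 * Simulate (without actually sleeping) a wait lasting at least wait_us
 * microseconds. Returns the number of sleeps issued and stores the
 * length of the final sleep in *last_sleep_us.
 */
static int
backoff_count_sleeps(long wait_us, long *last_sleep_us)
{
    long total = 0;
    long sleep_us = BASE_SLEEP_US;
    int  sleeps = 0;

    while (total < wait_us)
    {
        sleep_us = backoff_next_sleep_us(total, sleep_us);
        total += sleep_us;
        sleeps++;
    }
    *last_sleep_us = sleep_us;
    return sleeps;
}
```

In this model, a 60 s wait issues 10017 sleeps: 10000 at 1 ms, 13 doubling steps from 2 ms up to 8.192 s, and then four sleeps capped at 10 s. Because the last sleep is 10 s, a waiter that has reached the cap can oversleep by nearly 10 s after the target transaction ends.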
On 2025/06/08 23:33, Xuneng Zhou wrote:
> Hi hackers,
>
> This patch implements progressive backoff in XactLockTableWait() and
> ConditionalXactLockTableWait().
>
> As Kevin reported in this thread [1], XactLockTableWait() can enter a
> tight polling loop during logical replication slot creation on standby
> servers, sleeping for fixed 1ms intervals that can continue for a long
> time. This creates significant CPU overhead.
>
> The patch implements a time-based threshold approach based on Fujii’s
> idea [1]: keep sleeping for 1ms until the total sleep time reaches 10
> seconds, then start exponential backoff (doubling the sleep duration
> each cycle) up to a maximum of 10 seconds per sleep. This balances
> responsiveness for normal operations (which typically complete within
> seconds) against CPU efficiency for the long waits in some logical
> replication scenarios.

Thanks for the patch!

When I first suggested this idea, I used 10s as an example for the maximum sleep time. But thinking more about it now, 10s might be too long. Even if the target transaction has already finished, XactLockTableWait() could still wait up to 10 seconds, which seems excessive.

What about using 1s instead? That value is already used as a max sleep time in other places, like WaitExceedsMaxStandbyDelay().

If we agree on 1s as the max, then using exponential backoff from 1ms to 1s after the threshold might not be necessary. It might be simpler and sufficient to just sleep for 1s once we hit the threshold.

Based on that, I think a change like the following could work well. Thoughts?
----------------------------------------
     XactLockTableWaitInfo info;
     ErrorContextCallback callback;
     bool first = true;
+    int left_till_hibernate = 5000;
<snip>
         if (!first)
         {
             CHECK_FOR_INTERRUPTS();
-            pg_usleep(1000L);
+
+            if (left_till_hibernate > 0)
+            {
+                pg_usleep(1000L);
+                left_till_hibernate--;
+            }
+            else
+                pg_usleep(1000000L);
----------------------------------------

Regards,

--
Fujii Masao
NTT DATA Japan Corporation
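The diff above counts iterations rather than measuring elapsed time: 5000 sleeps of 1 ms add up to roughly 5 s of polling (ignoring scheduling overhead), after which every sleep becomes 1 s. The standalone sketch below models that schedule outside PostgreSQL; the `HibernateState` struct and the `hibernate_*()` helpers are hypothetical, and `pg_usleep()` is replaced by simple bookkeeping so the schedule can be inspected without waiting.

```c
/*
 * Hypothetical model of the counter-based schedule in the diff above:
 * 5000 sleeps of 1 ms, then 1 s per sleep. No real sleeping happens.
 */
#define FAST_SLEEP_US       1000L     /* pg_usleep(1000L) in the diff */
#define HIBERNATE_SLEEP_US  1000000L  /* pg_usleep(1000000L) in the diff */
#define FAST_SLEEPS         5000      /* initial left_till_hibernate */

typedef struct HibernateState
{
    int left_till_hibernate;   /* remaining 1 ms sleeps before hibernating */
} HibernateState;

static void
hibernate_init(HibernateState *st)
{
    st->left_till_hibernate = FAST_SLEEPS;
}

/* Return the length of the next sleep, mirroring the if/else in the diff. */
static long
hibernate_next_sleep_us(HibernateState *st)
{
    if (st->left_till_hibernate > 0)
    {
        st->left_till_hibernate--;
        return FAST_SLEEP_US;
    }
    return HIBERNATE_SLEEP_US;
}

/* Count the sleeps issued while waiting at least wait_us microseconds. */
static int
hibernate_count_sleeps(long wait_us)
{
    HibernateState st;
    long total = 0;
    int  sleeps = 0;

    hibernate_init(&st);
    while (total < wait_us)
    {
        total += hibernate_next_sleep_us(&st);
        sleeps++;
    }
    return sleeps;
}
```

In this model a 60 s wait issues 5055 sleeps (5000 at 1 ms plus 55 at 1 s), compared with 60000 for fixed 1 ms polling, and once past the threshold the extra delay after the target transaction ends is bounded by roughly one 1 s sleep rather than 10 s.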
Hi,
Thanks for the feedback!
On Thu, Jun 12, 2025 at 10:02 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> When I first suggested this idea, I used 10s as an example for
> the maximum sleep time. But thinking more about it now, 10s might
> be too long. Even if the target transaction has already finished,
> XactLockTableWait() could still wait up to 10 seconds,
> which seems excessive.

+1, this could be a problem.

> What about using 1s instead? That value is already used as a max
> sleep time in other places, like WaitExceedsMaxStandbyDelay().

1s should generally be a good choice.

> If we agree on 1s as the max, then using exponential backoff from
> 1ms to 1s after the threshold might not be necessary. It might
> be simpler and sufficient to just sleep for 1s once we hit
> the threshold.

That makes sense to me.

> Based on that, I think a change like the following could work well.
> Thoughts?
I'll update the patch accordingly.
Best regards,
Xuneng
Hi,
Although it’s clear that replacing the tight 1 ms polling loop will reduce CPU usage, I was curious about the actual numbers. To that end, I ran a 60 s logical replication slot creation workload on a standby with three XactLockTableWait() variants (fixed 1 ms sleeps, exponential backoff, and a 1 s sleep after a threshold) on an 8-core, 16 GB AMD system, and collected both profiling traces and hardware-counter metrics.
1. Hardware-counter results
- CPU cycles drop by 58% going from fixed 1 ms sleeps to exponential backoff, and by a further 25% with the 1 s threshold variant.
- Cache misses and context switches show similarly large reductions.
- IPC (instructions per cycle) stays around 0.45, dipping slightly with longer sleeps.
2. Flame graphs
See the attached files.
Best regards,
Xuneng
Hi,

On 2025-06-08 22:33:39 +0800, Xuneng Zhou wrote:
> This patch implements progressive backoff in XactLockTableWait() and
> ConditionalXactLockTableWait().
>
> As Kevin reported in this thread [1], XactLockTableWait() can enter a
> tight polling loop during logical replication slot creation on standby
> servers, sleeping for fixed 1ms intervals that can continue for a long
> time. This creates significant CPU overhead.
>
> The patch implements a time-based threshold approach based on Fujii’s
> idea [1]: keep sleeping for 1ms until the total sleep time reaches 10
> seconds, then start exponential backoff (doubling the sleep duration
> each cycle) up to a maximum of 10 seconds per sleep. This balances
> responsiveness for normal operations (which typically complete within
> seconds) against CPU efficiency for the long waits in some logical
> replication scenarios.

ISTM that this is going the wrong way: the real problem is that we seem to have extended periods where XactLockTableWait() doesn't actually work, not that the sleep time is too short. The sleep in XactLockTableWait() was intended to address a very short race, not something that's essentially unbounded.

Greetings,

Andres Freund