Re: Add progressive backoff to XactLockTableWait functions - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: Add progressive backoff to XactLockTableWait functions
Date
Msg-id d71d67c9-0fdc-4825-a902-d69ea8deb714@oss.nttdata.com
Whole thread Raw
In response to Add progressive backoff to XactLockTableWait functions  (Xuneng Zhou <xunengzhou@gmail.com>)
Responses Re: Add progressive backoff to XactLockTableWait functions
Re: Add progressive backoff to XactLockTableWait functions
List pgsql-hackers

On 2025/06/24 1:32, Xuneng Zhou wrote:
> Hi,
> 
> 
> Here's patch version 4.
> 
> 
> 1. The problem
> 
> I conducted further investigation on this issue. The 1ms sleep in XactLockTableWait that falls back to polling was
notproblematic in certain scenarios prior to v16. It became an potential issue after the "Allow logical decoding on
standbys"feature was introduced [1]. After that commit, we can now create logical replication slots on hot standby
serversand perform many other logical decoding operations. [2]
 
> 
> 
> 
> 2. The analysis
> 
> Is this issue limited to calling sites related to logical decoding? I believe so. As we've discussed, the core issue
isthat primary transactions are not running locally on the standby, which breaks the assumption of
XactLockTableWait—thatthere is an xid lock to wait on. Other call stacks of this function, except those from logical
decoding,are unlikely to encounter this problem because they are unlikely to be invoked from standby servers.
 
> 
> 
> Analysis of XactLockTableWait calling sites:
> 
> 
> Problematic Case (Hot Standby):
> 
> - Logical decoding operations in snapbuild.c during snapshot building
> 
> - This is thecase that can run on hot standby and experience the issue
> 
> 
> Non-problematic Cases (Primary Only):
> 
> - Heap operations (heapam.c): DML operations not possible on read-only standby
> 
> - B-tree operations (nbtinsert.c): Index modifications impossible on standby
> 
> - Logical apply workers (execReplication.c): Cannot start during recovery (BgWorkerStart_RecoveryFinished)
> 
> - All other cases: Require write operations unavailable on standby
> 
> 
> 
> 3. The proposed solution
> 
> If the above analysis is sound, one potential fix would be to add separate branching for standby in
XactLockTableWait.However, this seems inconsistent with the function's definition—there's simply no lock entry in the
locktable for waiting. We could implement a new function for this logic, 
 

To be honest, I'm fine with v3, since it only increases the sleep time
after 5000 loop iterations, which has negligible performance impact.
But if these functions aren't intended to be used during recovery and
the loop shouldn't reach that many iterations, I'm also okay with
the v4 approach of introducing a separate function specifically for recovery.

But this amakes me wonder if we should add something like
Assert(!RecoveryInProgress()) to those two functions, to prevent them
from being used during recovery in the future.

> but it's hard to find a proper place for it to fit well: lmgr.c is for locks, while standby.c or procarray.c are not
thatideal either. Therefore, I placed the special handling for standby in SnapBuildWaitSnapshot.
 

Since the purpose and logic of the new function are similar to
XactLockTableWait(), I think it would be better to place it nearby
in lmgr.c, even though it doesn't handle a lock directly. That would
help keep the related logic together and improve readability.

Regards,

-- 
Fujii Masao
NTT DATA Japan Corporation




pgsql-hackers by date:

Previous
From: Amit Langote
Date:
Subject: Re: Proposal: Global Index for PostgreSQL
Next
From: Daniel Gustafsson
Date:
Subject: Re: [PATCH] initdb: Treat empty -U argument as unset username