From 11072d138d900227b963b54d1a3626cf256db721 Mon Sep 17 00:00:00 2001 From: Shlok Kyal Date: Fri, 24 Nov 2023 16:37:14 +0530 Subject: [PATCH v2] Deadlock when apply worker,tablesync worker and client backend run parallelly. Apply worker holds a lock on the table pg_subscription_rel and waits for notification from the tablesync workers where the relation is synced, which happens through updating the pg_subscription_rel row. And that wait happens in wait_for_relation_state_change, which simply checks the row in a loop, with a sleep by WaitLatch(). Unfortunately, the tablesync workers can't update the row because the client backend executing ALTER SUBSCRIPTION ... REFRESH PUBLICATION sneaked in, and waits for an AccessExclusiveLock. So the tablesync workers are stuck in the queue and can't proceed. The client backend is waiting for a lock held by the apply worker. The tablesync workers can't proceed because their lock request is stuck behind the AccessExclusiveLock request. And the apply worker is waiting for status update from the tablesync workers. And the deadlock is undetected, because the apply worker is not waiting on a lock, but sleeping on a latch. We have resolved the issue by releasing the lock by commiting the transaction before the apply worker goes to wait state. --- src/backend/replication/logical/tablesync.c | 17 ++++++++++++++--- 1 file changed, 14 insertions(+), 3 deletions(-) diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c index df3c42eb5d..b71234d998 100644 --- a/src/backend/replication/logical/tablesync.c +++ b/src/backend/replication/logical/tablesync.c @@ -542,15 +542,26 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn) LWLockRelease(LogicalRepWorkerLock); /* - * Enter busy loop and wait for synchronization worker to - * reach expected state (or die trying). + * If we have a transaction, we must commit it to release + * any locks we have. Then start a new transaction so we + * can examine catalog state. */ - if (!started_tx) + if (started_tx) + { + CommitTransactionCommand(); + pgstat_report_stat(false); + StartTransactionCommand(); + } + else { StartTransactionCommand(); started_tx = true; } + /* + * Enter busy loop and wait for synchronization worker to + * reach expected state (or die trying). + */ wait_for_relation_state_change(rstate->relid, SUBREL_STATE_SYNCDONE); } -- 2.34.1