Thread: Random subscription 021_twophase test failure on kestrel
Hi,

The 021_twophase test has failed on Kestrel at [1] with the following error:
# Failed test 'should be no prepared transactions on subscriber'
# at /home/bf/bf-build/kestrel/HEAD/pgsql/src/test/subscription/t/021_twophase.pl line 438.
# got: '1'
# expected: '0'
# Looks like you failed 1 test of 30.

This failure is caused by a prepared transaction that was not committed on the subscriber because of replication lag on one of the subscriptions. The test involves two subscriptions: tap_sub and tap_sub_copy. After committing the prepared transaction 'mygid', the test only waits for tap_sub_copy to catch up:
$node_publisher->wait_for_catchup($appname_copy);

However, tap_sub is dropped before the test has ensured that it replayed the commit of the 'mygid' prepared transaction, which leaves a prepared transaction behind on the subscriber:
$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");

When the test later checks the number of prepared transactions, it fails because tap_sub had not finished applying the commit:
# at line 438
# got: '1'
# expected: '0'

This issue can be consistently reproduced by injecting a delay (e.g., 3 seconds) in tap_sub's walsender while it decodes the commit of 'mygid'. A patch to demonstrate this behavior is provided at 021_two_phase_test_failure_reproduce.patch.

The test can be fixed by explicitly waiting for both subscriptions to catch up before dropping either. A patch implementing this fix is attached.

Thanks Amit for the offline discussion and sharing your thoughts on the same.

[1] - https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=kestrel&dt=2025-05-22%2021%3A19%3A22

Regards,
Vignesh
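The ordering problem described above can be illustrated with a small standalone model. This is only a sketch and is not PostgreSQL or TAP framework code: the stream contents, the apply_upto and leftover_prepared helpers, and the idea of a single subscriber-wide counter are all assumptions made for illustration. It shows why waiting for only one of two independently replaying subscriptions can leave a prepared transaction behind, and why waiting for both does not.

```perl
use strict;
use warnings;

# Toy model only; none of this is PostgreSQL or TAP framework code.
# The publisher's stream is an ordered list of changes; each subscription
# replays it independently and therefore has its own position in it.
my @stream = ('PREPARE', 'COMMIT PREPARED');

# Apply stream entries to one subscription up to (not including) $upto,
# updating the subscriber-wide count of prepared transactions.
sub apply_upto {
    my ($sub, $upto, $prepared_ref) = @_;
    while ($sub->{pos} < $upto) {
        my $change = $stream[ $sub->{pos}++ ];
        $$prepared_ref++ if $change eq 'PREPARE';
        $$prepared_ref-- if $change eq 'COMMIT PREPARED';
    }
    return;
}

# Run the scenario and return the number of prepared transactions left on
# the subscriber. $wait_for_both selects the fixed ordering.
sub leftover_prepared {
    my ($wait_for_both) = @_;
    my $prepared = 0;
    my %subs = (
        tap_sub      => { pos => 0 },
        tap_sub_copy => { pos => 0 },
    );

    # Both subscriptions have applied the PREPARE.
    apply_upto($subs{$_}, 1, \$prepared) for sort keys %subs;

    # The commit is issued; only tap_sub_copy is waited for.
    apply_upto($subs{tap_sub_copy}, scalar @stream, \$prepared);

    # The fix: also wait for tap_sub before dropping it.
    apply_upto($subs{tap_sub}, scalar @stream, \$prepared) if $wait_for_both;

    # Dropping a subscription stops its replay; unapplied changes are lost.
    delete $subs{tap_sub};

    return $prepared;
}

print "wait for tap_sub_copy only: ", leftover_prepared(0), "\n";
print "wait for both:              ", leftover_prepared(1), "\n";
```

In this model the first call leaves one prepared transaction and the second leaves none, which mirrors the got '1' / expected '0' mismatch in the failing check. In the real test the equivalent change would be an additional wait_for_catchup call for tap_sub before the DROP SUBSCRIPTION.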
On Sat, May 24, 2025 at 6:07 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> Yes, agreed that your suggested fix looks sensible with an extra check
> for pg_prepared_xacts on the subscriber side that can be useful for
> debugging.
>

+1.

--
With Regards,
Amit Kapila.
On Mon, 26 May 2025 at 13:59, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Sat, May 24, 2025 at 11:27:05AM +0530, Amit Kapila wrote:
> > On Sat, May 24, 2025 at 6:07 AM Michael Paquier <michael@paquier.xyz> wrote:
> >> Yes, agreed that your suggested fix looks sensible with an extra check
> >> for pg_prepared_xacts on the subscriber side that can be useful for
> >> debugging.
> >
> > +1.
>
> Applied down to v15.

Thanks for committing this. The buildfarm runs have been successful so far; I’ll continue monitoring them over the next few days.

Regards,
Vignesh