Re: Allow async standbys wait for sync replication (was: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers) - Mailing list pgsql-hackers
From | Hsu, John |
---|---|
Subject | Re: Allow async standbys wait for sync replication (was: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers) |
Date | |
Msg-id | b7561922-c241-2e3c-0da9-cd01bf4e6ebf@amazon.com Whole thread Raw |
In response to | Re: Allow async standbys wait for sync replication (was: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers) (Nathan Bossart <nathandbossart@gmail.com>) |
Responses |
Re: Allow async standbys wait for sync replication (was: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers)
|
List | pgsql-hackers |
Hello, On 2/25/22 11:38 AM, Nathan Bossart wrote: > CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you canconfirm the sender and know the content is safe. > > > > On Fri, Feb 25, 2022 at 08:31:37PM +0530, Bharath Rupireddy wrote: >> Thanks Satya and others for the inputs. Here's the v1 patch that >> basically allows async wal senders to wait until the sync standbys >> report their flush lsn back to the primary. Please let me know your >> thoughts. > I haven't had a chance to look too closely yet, but IIUC this adds a new > function that waits for synchronous replication. This new function > essentially spins until the synchronous LSN has advanced. > > I don't think it's a good idea to block sending any WAL like this. AFAICT > it is possible that there will be a lot of synchronously replicated WAL > that we can send, and it might just be the last several bytes that cannot > yet be replicated to the asynchronous standbys. І believe this patch will > cause the server to avoid sending _any_ WAL until the synchronous LSN > advances. > > Perhaps we should instead just choose the SendRqstPtr based on the current > synchronous LSN. Presumably there are other things we'd need to consider, > but in general, I think we ought to send as much WAL as possible for a > given call to XLogSendPhysical(). I think you're right that we'll avoid sending any WAL until sync_lsn advances. We could setup a contrived situation where the async-walsender never advances because it terminates before the flush_lsn of the synchronous_node catches up. And when the async-walsender restarts, it'll start with the latest flushed on the primary and we could go into a perpetual loop. I took a look at the patch and tested basic streaming with async replicas ahead of the synchronous standby and with logical clients as well and it works as expected. > > ereport(LOG, > (errmsg("async standby WAL sender with request LSN %X/%X is waiting as sync standbys are ahead with flush LSN %X/%X", > LSN_FORMAT_ARGS(flushLSN), LSN_FORMAT_ARGS(sendRqstPtr)), > errhidestmt(true))); I think this log formatting is incorrect. s/sync standbys are ahead/sync standbys are behind/ and I think you need to swap flushLsn and sendRqstPtr When a walsender is waiting for the lsn on the synchronous replica to advance and a database stop is issued to the writer, the pg_ctl stop isn't able to proceed and the database seems to never shutdown. > Assert(priority >= 0); What's the point of the assert here? Also the comments/code refer to AsyncStandbys, however it's also used for logical clients, which may or may not be standbys. Don't feel too strongly about the naming here but something to note. > if (!ShouldWaitForSyncRepl()) > return; > ... > for (;;) > { > // rest of work > } If we had a walsender already waiting for an ack, and the conditions of ShouldWaitForSyncRepl() change, such as disabling async_standbys_wait_for_sync_replication or synchronous replication it'll still wait since we never re-check the condition. postgres=# select wait_event from pg_stat_activity where wait_event like 'AsyncWal%'; wait_event -------------------------------------- AsyncWalSenderWaitForSyncReplication AsyncWalSenderWaitForSyncReplication AsyncWalSenderWaitForSyncReplication (3 rows) postgres=# show synchronous_standby_names; synchronous_standby_names --------------------------- (1 row) postgres=# show async_standbys_wait_for_sync_replication; async_standbys_wait_for_sync_replication ------------------------------------------ off (1 row) > LWLockAcquire(SyncRepLock, LW_SHARED); > flushLSN = walsndctl->lsn[SYNC_REP_WAIT_FLUSH]; > LWLockRelease(SyncRepLock); Should we configure this similar to the user's setting of synchronous_commit instead of just flush? (SYNC_REP_WAIT_WRITE, SYNC_REP_WAIT_APPLY) Thanks, John H
pgsql-hackers by date: