Re: logical replication empty transactions - Mailing list pgsql-hackers
From | Ajin Cherian |
---|---|
Subject | Re: logical replication empty transactions |
Date | |
Msg-id | CAFPTHDa1ydj7ofc3xgvu6gvZWbvS=_GNpFqb20HygBBEJ6Zfxw@mail.gmail.com Whole thread Raw |
In response to | Re: logical replication empty transactions (Amit Kapila <amit.kapila16@gmail.com>) |
Responses |
Re: logical replication empty transactions
|
List | pgsql-hackers |
On Mon, Aug 2, 2021 at 7:20 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Jul 23, 2021 at 3:39 PM Ajin Cherian <itsajin@gmail.com> wrote: > > > > Let's first split the patch for prepared and non-prepared cases as > that will help to focus on each of them separately. BTW, why haven't > you considered implementing point 1b as explained by Andres in his > email [1]? I think we can send a keepalive message in case of > synchronous replication when we skip an empty transaction, otherwise, > it might delay in responding to transactions synchronous_commit mode. > I think in the tests done in the thread, it might not have been shown > because we are already sending keepalives too frequently. But what if > someone disables wal_sender_timeout or kept it to a very large value? > See WalSndKeepaliveIfNecessary. The other thing you might want to look > at is if the reason for frequent keepalives is the same as described > in the email [2]. > I have tried to address the comment here by modifying the ctx->update_progress callback function (WalSndUpdateProgress) provided for plugins. I have added an option by which the callback can specify if it wants to send keep_alives. And when the callback is called with that option set, walsender updates a flag force_keep_alive_syncrep. The Walsender in the WalSndWaitForWal for loop, checks this flag and if synchronous replication is enabled, then sends a keep alive. Currently this logic is added as an else to the current logic that is already there in WalSndWaitForWal, which is probably considered unnecessary and a source of the keep alive flood that you talked about. So, I can change that according to how that fix shapes up there. I have also added an extern function in syncrep.c that makes it possible for walsender to query if synchronous replication is turned on. The reason I had to turn on a flag and rely on the WalSndWaitForWal to send the keep alive in its next iteration is because I tried doing this directly when a commit is skipped but it didn't work. The reason for this is that when the commit is being decoded the sentptr at the moment is at the commit LSN and the keep alive will be sent for the commit LSN but the syncrep wait is waiting for end_lsn of the transaction which is the next LSN. So, sending a keep alive at the moment the commit is decoded doesn't seem to solve the problem of the waiting synchronous reply. > Few other miscellaneous comments: > 1. > static void > pgoutput_commit_prepared_txn(LogicalDecodingContext *ctx, > ReorderBufferTXN *txn, > - XLogRecPtr commit_lsn) > + XLogRecPtr commit_lsn, XLogRecPtr prepare_end_lsn, > + TimestampTz prepare_time) > { > + PGOutputTxnData *txndata = (PGOutputTxnData *) txn->output_plugin_private; > + > OutputPluginUpdateProgress(ctx); > > + /* > + * If the BEGIN PREPARE was not yet sent, then it means there were no > + * relevant changes encountered, so we can skip the COMMIT PREPARED > + * message too. > + */ > + if (txndata) > + { > + bool skip = !txndata->sent_begin_txn; > + pfree(txndata); > + txn->output_plugin_private = NULL; > > How is this supposed to work after the restart when prepared is sent > before the restart and we are just sending commit_prepared after > restart? Won't this lead to sending commit_prepared even when the > corresponding prepare is not sent? Can we think of a better way to > deal with this? > I have tried to resolve this by adding logic in worker,c to silently ignore spurious commit_prepareds. But this change required checking if the prepare exists on the subscriber before attempting the commit_prepared but the current API that checks this requires prepare time and transaction end_lsn. But for this I had to change the protocol of commit_prepared, and I understand that this would break backward compatibility between subscriber and publisher (you have raised this issue as well). I am not sure how else to handle this, let me know if you have any other ideas. One option could be to have another API to check if the prepare exists on the subscriber with the prepared 'gid' alone, without checking prepare_time or end_lsn. Let me know if this idea works. I have left out the patch 0002 for prepared transactions until we arrive at a decision on how to address the above issue. Peter, I have also addressed the comments you've raised on patch 0001, please have a look and confirm. Regards, Ajin Cherian Fujitsu Australia.
Attachment
pgsql-hackers by date: