Re: Conflict detection for update_deleted in logical replication - Mailing list pgsql-hackers

From shveta malik
Subject Re: Conflict detection for update_deleted in logical replication
Date
Msg-id CAJpy0uDL4oLdhYup44a2=1OeyUSsKhg8Y30-h1uxcf=mki57uA@mail.gmail.com
Whole thread Raw
In response to Re: Conflict detection for update_deleted in logical replication  (shveta malik <shveta.malik@gmail.com>)
List pgsql-hackers
On Tue, May 20, 2025 at 8:38 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> Please find few more comments:
>
> 1)
> ProcessStandbyPSRequestMessage:
> + /*
> + * This shouldn't happen because we don't support getting primary status
> + * message from standby.
> + */
> + if (RecoveryInProgress())
> + elog(ERROR, "the primary status is unavailable during recovery");
>
>
> a) This error is not clear. Is it supposed to be user oriented error
> or internal error? As a user, it is difficult to interpret this error
> and take some action.
>
> b) What I understood is that there is no user of enabling
> 'retain_conflict_info' for a subscription which is subscribing to a
> publisher which is physical standby too. So shall we emit such an
> ERROR during 'Create Sub(retain_conflict_info=on)' itself? But that
> would need checking whether the publisher is physical standby or not
> during CREATE-SUB. Is that a possibility?
>
> 2)
>
> ----------
>   send_feedback(last_received, requestReply, requestReply);
>
> + maybe_advance_nonremovable_xid(&data, false);
> +
>   /*
>   * Force reporting to ensure long idle periods don't lead to
>   * arbitrarily delayed stats. Stats can only be reported outside
> ----------
>
> Why do we need this call to 'maybe_advance_nonremovable_xid' towards
> end of LogicalRepApplyLoop() i.e. the last call? Can it make any
> further 'data.phase' change here? IIUC, there are 2 triggers for
> 'data.phase' change through LogicalRepApplyLoop(). First one is for
> the very first time when we start this loop, it will set data.phase to
> 0  (RCI_GET_CANDIDATE_XID) and will call
> maybe_advance_nonremovable_xid(). After that, LogicalRepApplyLoop()
> function can trigger a 'data.phase' change only when it receives a
> response from the publisher. Shouldn't the first 4 calls
>  to maybe_advance_nonremovable_xid() from LogicalRepApplyLoop() suffice?
>
> 3)
> Code is almost the same for GetOldestActiveTransactionId() and
> GetOldestTransactionIdInCommit(). I think we can merge these two.
> GetOldestActiveTransactionId() can take new arg "getTxnInCommit".
> GetOldestTransactionIdInCommit() can call
> GetOldestActiveTransactionId() with that arg as true, whereas other 2
> callers can pass it as false.
>


Few more comments mostly for patch001:

4)
For this feature, since we are only interested in remote UPDATEs
happening concurrently, so shall we ask primary to provide oldest
"UPDATE" transaction-id in commit-phase rather than any operation's
transaction-id? This may avoid unnecessarily waiting and pinging at
subscriber's end in order to advance oldest_nonremovable-xid.
Thoughts?

5)
+
+/*
+ * GetOldestTransactionIdInCommit()
+ *
+ * Similar to GetOldestActiveTransactionId but returns the oldest
transaction ID
+ * that is currently in the commit phase.
+ */
+TransactionId
+GetOldestTransactionIdInCommit(void)

If there is no transaction currently in 'commit' phase, this function
will then return the next transaction-id. Please mention this in the
comments. I think the doc 'protocol-replication.html' should also be
updated for the same.

6)

+ data->flushpos_update_time = 0;

Do we really need to reset this 'flushpos_update_time' at the end of
wait_for_local_flush()? Even in the next cycle (when the phase
restarts from RCI_GET_CANDIDATE_XID), we can reuse this time to decide
whether we should call get_flush_position() again or skip it, when in
wait_for_local_flush().

7)

+/*
+ * The remote WAL position that has been applied and flushed locally. We
+ * record this information while sending feedback to the server and use this
+ * both while sending feedback and advancing oldest_nonremovable_xid.
+ */
+static XLogRecPtr last_flushpos = InvalidXLogRecPtr;

I think we record/update last_flushpos in wait_for_local_flush() as
well. Shall we update comments accordingly?

8)
Shall we rename "check_conflict_info_retaintion" to
"check_conflict_info_retention" or
"check_remote_for_retainconflictinfo"? ('retaintion' is not a correct
word)

9)
In RetainConflictInfoData, we can keep reply_time along with other
remote_* variables. The idea is to separate the variables received in
remote's response from the ones purely set and reset by the local
node.

thanks
Shveta



pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: plan shape work
Next
From: Aleksander Alekseev
Date:
Subject: Re: Should we optimize the `ORDER BY random() LIMIT x` case?