Re: Conflict detection for update_deleted in logical replication - Mailing list pgsql-hackers
From | shveta malik |
---|---|
Subject | Re: Conflict detection for update_deleted in logical replication |
Date | |
Msg-id | CAJpy0uDL4oLdhYup44a2=1OeyUSsKhg8Y30-h1uxcf=mki57uA@mail.gmail.com Whole thread Raw |
In response to | Re: Conflict detection for update_deleted in logical replication (shveta malik <shveta.malik@gmail.com>) |
List | pgsql-hackers |
On Tue, May 20, 2025 at 8:38 AM shveta malik <shveta.malik@gmail.com> wrote: > > Please find few more comments: > > 1) > ProcessStandbyPSRequestMessage: > + /* > + * This shouldn't happen because we don't support getting primary status > + * message from standby. > + */ > + if (RecoveryInProgress()) > + elog(ERROR, "the primary status is unavailable during recovery"); > > > a) This error is not clear. Is it supposed to be user oriented error > or internal error? As a user, it is difficult to interpret this error > and take some action. > > b) What I understood is that there is no user of enabling > 'retain_conflict_info' for a subscription which is subscribing to a > publisher which is physical standby too. So shall we emit such an > ERROR during 'Create Sub(retain_conflict_info=on)' itself? But that > would need checking whether the publisher is physical standby or not > during CREATE-SUB. Is that a possibility? > > 2) > > ---------- > send_feedback(last_received, requestReply, requestReply); > > + maybe_advance_nonremovable_xid(&data, false); > + > /* > * Force reporting to ensure long idle periods don't lead to > * arbitrarily delayed stats. Stats can only be reported outside > ---------- > > Why do we need this call to 'maybe_advance_nonremovable_xid' towards > end of LogicalRepApplyLoop() i.e. the last call? Can it make any > further 'data.phase' change here? IIUC, there are 2 triggers for > 'data.phase' change through LogicalRepApplyLoop(). First one is for > the very first time when we start this loop, it will set data.phase to > 0 (RCI_GET_CANDIDATE_XID) and will call > maybe_advance_nonremovable_xid(). After that, LogicalRepApplyLoop() > function can trigger a 'data.phase' change only when it receives a > response from the publisher. Shouldn't the first 4 calls > to maybe_advance_nonremovable_xid() from LogicalRepApplyLoop() suffice? > > 3) > Code is almost the same for GetOldestActiveTransactionId() and > GetOldestTransactionIdInCommit(). I think we can merge these two. > GetOldestActiveTransactionId() can take new arg "getTxnInCommit". > GetOldestTransactionIdInCommit() can call > GetOldestActiveTransactionId() with that arg as true, whereas other 2 > callers can pass it as false. > Few more comments mostly for patch001: 4) For this feature, since we are only interested in remote UPDATEs happening concurrently, so shall we ask primary to provide oldest "UPDATE" transaction-id in commit-phase rather than any operation's transaction-id? This may avoid unnecessarily waiting and pinging at subscriber's end in order to advance oldest_nonremovable-xid. Thoughts? 5) + +/* + * GetOldestTransactionIdInCommit() + * + * Similar to GetOldestActiveTransactionId but returns the oldest transaction ID + * that is currently in the commit phase. + */ +TransactionId +GetOldestTransactionIdInCommit(void) If there is no transaction currently in 'commit' phase, this function will then return the next transaction-id. Please mention this in the comments. I think the doc 'protocol-replication.html' should also be updated for the same. 6) + data->flushpos_update_time = 0; Do we really need to reset this 'flushpos_update_time' at the end of wait_for_local_flush()? Even in the next cycle (when the phase restarts from RCI_GET_CANDIDATE_XID), we can reuse this time to decide whether we should call get_flush_position() again or skip it, when in wait_for_local_flush(). 7) +/* + * The remote WAL position that has been applied and flushed locally. We + * record this information while sending feedback to the server and use this + * both while sending feedback and advancing oldest_nonremovable_xid. + */ +static XLogRecPtr last_flushpos = InvalidXLogRecPtr; I think we record/update last_flushpos in wait_for_local_flush() as well. Shall we update comments accordingly? 8) Shall we rename "check_conflict_info_retaintion" to "check_conflict_info_retention" or "check_remote_for_retainconflictinfo"? ('retaintion' is not a correct word) 9) In RetainConflictInfoData, we can keep reply_time along with other remote_* variables. The idea is to separate the variables received in remote's response from the ones purely set and reset by the local node. thanks Shveta
pgsql-hackers by date: