Re: Conflict detection for update_deleted in logical replication - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: Conflict detection for update_deleted in logical replication |
Date | |
Msg-id | CAA4eK1LLaXzsKOaPwGTiikOYySYK+Ty_x3EXg-g=7M_CLn4WiQ@mail.gmail.com |
In response to | Re: Conflict detection for update_deleted in logical replication (shveta malik <shveta.malik@gmail.com>) |
Responses | Re: Conflict detection for update_deleted in logical replication |
List | pgsql-hackers |
On Fri, Apr 25, 2025 at 10:08 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Thu, Apr 24, 2025 at 6:11 PM Zhijie Hou (Fujitsu)
> <houzj.fnst@fujitsu.com> wrote:
> >
> > > Few comments for patch004:
> > > Config.sgml:
> > > 1)
> > > + <para>
> > > + Maximum duration (in milliseconds) for which conflict
> > > + information can be retained for conflict detection by the apply worker.
> > > + The default value is <literal>0</literal>, indicating that conflict
> > > + information is retained until it is no longer needed for detection
> > > + purposes.
> > > + </para>
> > >
> > > IIUC, the above is not entirely accurate. Suppose the subscriber manages to
> > > catch up and sets oldest_nonremovable_xid to 100, which is then updated in
> > > slot. After this, the apply worker takes a nap and begins a new xid update cycle.
> > > Now, let’s say the next candidate_xid is 200, but this time the subscriber fails
> > > to keep up and exceeds max_conflict_retention_duration. As a result, it sets
> > > oldest_nonremovable_xid to InvalidTransactionId, and the launcher skips
> > > updating the slot’s xmin.
> >
> > If the time exceeds the max_conflict_retention_duration, the launcher would
> > Invalidate the slot, instead of skipping updating it. So the conflict info(e.g.,
> > dead tuples) would not be retained anymore.
> >
>
> launcher will not invalidate the slot until all subscriptions have
> stopped conflict_info retention. So info of dead tuples for a
> particular oldest_xmin of a particular apply worker could be retained
> for much longer than this configured duration. If other apply workers
> are actively working (catching up with primary), then they should keep
> on advancing xmin of shared slot but if xmin of shared slot remains
> same for say 15min+15min+15min for 3 apply-workers (assuming they are
> marking themselves with stop_conflict_retention one after other and
> xmin of slot has not been advanced), then the first apply worker
> having marked itself with stop_conflict_retention still has access to
> the oldest_xmin's data for 45 mins instead of 15 mins. (where
> max_conflict_retention_duration=15 mins). Please let me know if my
> understanding is wrong.
>

IIUC, the current code will stop updating the slot even if only one of
the apply workers has set stop_conflict_info_retention. The other apply
workers will keep on maintaining their oldest_nonremovable_xid without
advancing the slot. If this is correct, then what behavior do we expect
here instead? Do we want the slot to keep advancing as long as any
worker is actively maintaining oldest_nonremovable_xid? To some extent,
this matches the case where the user has set retain_conflict_info for
some subscriptions but not for others.

If so, how will users eventually know for which tables they can expect
to reliably detect update_deleted? One possibility is that users can
check which apply workers have stopped maintaining
oldest_nonremovable_xid via the pg_stat_subscription view and then see
the tables corresponding to those subscriptions.

Also, what will we do as part of the resolutions in the apply workers
where stop_conflict_info_retention is set? Shall we simply LOG that we
can't resolve this conflict and continue till the user takes some
action, or simply error out in such cases?

--
With Regards,
Amit Kapila.
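
To make the two candidate behaviors discussed above concrete, here is a minimal Python sketch. It is not PostgreSQL code and does not reflect the patch's actual implementation: `ApplyWorker`, `next_slot_xmin_stop_on_any`, `next_slot_xmin_follow_active`, and `workers_without_retention` are hypothetical names introduced only for illustration, `INVALID_XID = 0` stands in for InvalidTransactionId, and transaction IDs are compared as plain integers, ignoring wraparound.

```python
from dataclasses import dataclass
from typing import List, Sequence

INVALID_XID = 0  # stand-in for InvalidTransactionId (assumption for this sketch)


@dataclass
class ApplyWorker:
    """Hypothetical model of one apply worker's retention state."""
    subscription: str
    oldest_nonremovable_xid: int  # INVALID_XID once the worker stopped retention

    @property
    def stopped_retention(self) -> bool:
        return self.oldest_nonremovable_xid == INVALID_XID


def next_slot_xmin_stop_on_any(slot_xmin: int, workers: Sequence[ApplyWorker]) -> int:
    """Behavior as read in the mail: if any worker stopped retention,
    the slot's xmin is left where it is."""
    if not workers or any(w.stopped_retention for w in workers):
        return slot_xmin
    candidate = min(w.oldest_nonremovable_xid for w in workers)
    return max(slot_xmin, candidate)  # never move xmin backwards


def next_slot_xmin_follow_active(slot_xmin: int, workers: Sequence[ApplyWorker]) -> int:
    """Alternative raised in the mail: keep advancing as long as at least
    one worker still maintains oldest_nonremovable_xid."""
    active = [w.oldest_nonremovable_xid for w in workers if not w.stopped_retention]
    if not active:
        return slot_xmin  # no active worker; invalidation is out of scope here
    return max(slot_xmin, min(active))


def workers_without_retention(workers: Sequence[ApplyWorker]) -> List[str]:
    """Subscriptions whose tables could no longer rely on update_deleted detection."""
    return [w.subscription for w in workers if w.stopped_retention]


if __name__ == "__main__":
    workers = [
        ApplyWorker("sub_a", INVALID_XID),  # exceeded the retention duration
        ApplyWorker("sub_b", 250),
        ApplyWorker("sub_c", 300),
    ]
    print(next_slot_xmin_stop_on_any(100, workers))
    print(next_slot_xmin_follow_active(100, workers))
    print(workers_without_retention(workers))
```

With one stopped worker and two active ones at 250 and 300, the first function leaves the slot's xmin at 100, while the second advances it to 250. The last helper mirrors the idea of checking which workers have stopped maintaining oldest_nonremovable_xid in order to find the affected subscriptions.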