Re: Conflict detection for update_deleted in logical replication - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Conflict detection for update_deleted in logical replication
Msg-id CAA4eK1LLaXzsKOaPwGTiikOYySYK+Ty_x3EXg-g=7M_CLn4WiQ@mail.gmail.com
In response to Re: Conflict detection for update_deleted in logical replication  (shveta malik <shveta.malik@gmail.com>)
Responses Re: Conflict detection for update_deleted in logical replication
List pgsql-hackers
On Fri, Apr 25, 2025 at 10:08 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Thu, Apr 24, 2025 at 6:11 PM Zhijie Hou (Fujitsu)
> <houzj.fnst@fujitsu.com> wrote:
>
> > > Few comments for patch004:
> > > Config.sgml:
> > > 1)
> > > +       <para>
> > > +        Maximum duration (in milliseconds) for which conflict
> > > +        information can be retained for conflict detection by the apply worker.
> > > +        The default value is <literal>0</literal>, indicating that conflict
> > > +        information is retained until it is no longer needed for detection
> > > +        purposes.
> > > +       </para>
> > >
> > > IIUC, the above is not entirely accurate. Suppose the subscriber manages to
> > > catch up and sets oldest_nonremovable_xid to 100, which is then updated in
> > > slot. After this, the apply worker takes a nap and begins a new xid update cycle.
> > > Now, let’s say the next candidate_xid is 200, but this time the subscriber fails
> > > to keep up and exceeds max_conflict_retention_duration. As a result, it sets
> > > oldest_nonremovable_xid to InvalidTransactionId, and the launcher skips
> > > updating the slot’s xmin.
> >
> > If the time exceeds the max_conflict_retention_duration, the launcher would
> > Invalidate the slot, instead of skipping updating it. So the conflict info(e.g.,
> > dead tuples) would not be retained anymore.
> >
>
> launcher will not invalidate the slot until all subscriptions have
> stopped conflict_info retention. So info of dead tuples for a
> particular oldest_xmin of a particular apply worker could be retained
> for much longer than this configured duration. If other apply workers
> are actively working (catching up with primary), then they should keep
> on advancing xmin of shared slot but if xmin of shared slot remains
> same for say 15min+15min+15min for 3 apply-workers (assuming they are
> marking themselves with stop_conflict_retention one after other and
> xmin of slot has not been advanced), then the first apply worker
> having marked itself with stop_conflict_retention still has access to
> the oldest_xmin's data for 45 mins instead of 15 mins. (where
> max_conflict_retention_duration=15 mins). Please let me know if my
> understanding is wrong.
>

IIUC, the current code will stop updating the slot even if only one of
the apply workers has set stop_conflict_info_retention. The other apply
workers will keep on maintaining their oldest_nonremovable_xid without
advancing the slot. If this is correct, then what behavior do we expect
here instead? Do we want the slot to keep advancing as long as any
worker is actively maintaining oldest_nonremovable_xid? To some extent,
this matches the cases where the user has set retain_conflict_info for
some subscriptions but not for others.
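
To make the two alternatives concrete, here is a rough standalone
sketch of how the launcher could pick the slot's xmin under each policy.
This is not code from the patch: WorkerState, Xid, INVALID_XID and both
function names are made up for illustration, and the plain "<"
comparison ignores xid wraparound, which real code would have to handle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins, not the patch's actual definitions. */
typedef uint32_t Xid;
#define INVALID_XID ((Xid) 0)

typedef struct WorkerState
{
    Xid  oldest_nonremovable_xid;       /* INVALID_XID once retention stopped */
    bool stop_conflict_info_retention;  /* worker gave up retaining info */
} WorkerState;

/*
 * Policy A (my reading of the current behavior): if any worker has
 * stopped retention, leave the slot where it is.  Otherwise return the
 * minimum xid over all workers.  Assumes no wraparound.
 */
Xid
slot_xmin_stop_on_any(const WorkerState *w, size_t n, Xid cur_xmin)
{
    Xid min = INVALID_XID;

    assert(w != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
    {
        if (w[i].stop_conflict_info_retention)
            return cur_xmin;    /* one stopped worker freezes the slot */
        if (min == INVALID_XID || w[i].oldest_nonremovable_xid < min)
            min = w[i].oldest_nonremovable_xid;
    }
    return (min == INVALID_XID) ? cur_xmin : min;
}

/*
 * Policy B (the alternative asked about): ignore workers that have
 * stopped retention and advance to the minimum over the rest.  Returns
 * INVALID_XID when no worker is retaining anymore, which the caller
 * could treat as the point where the slot is invalidated.
 */
Xid
slot_xmin_skip_stopped(const WorkerState *w, size_t n)
{
    Xid min = INVALID_XID;

    assert(w != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
    {
        if (w[i].stop_conflict_info_retention)
            continue;           /* stopped workers no longer hold the slot back */
        if (min == INVALID_XID || w[i].oldest_nonremovable_xid < min)
            min = w[i].oldest_nonremovable_xid;
    }
    return min;
}
```

With three workers at 150, stopped, and 200, and a slot xmin of 100,
policy A keeps the slot at 100 while policy B would move it to 150.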

If so, how will users eventually know for which tables they can expect
to reliably detect update_deleted? One possibility is that users can
check which apply workers have stopped maintaining
oldest_nonremovable_xid via the pg_stat_subscription view and then see
the tables corresponding to those subscriptions. Also, what will we do
as part of the resolution in the apply workers where
stop_conflict_info_retention is set? Shall we simply LOG that we can't
resolve this conflict and continue till the user takes some action, or
simply error out in such cases?

--
With Regards,
Amit Kapila.


