Home > mailing lists

Re: Conflict detection for update_deleted in logical replication - Mailing list pgsql-hackers

From	Masahiko Sawada
Subject	Re: Conflict detection for update_deleted in logical replication
Date	August 16 02:44:23
Msg-id	CAD21AoDH5x-Wt=x=xrrrD99+BYp30fwH0FHaxx6HvEfcPrxNxA@mail.gmail.com Whole thread Raw
In response to	RE: Conflict detection for update_deleted in logical replication ("Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>)
List	pgsql-hackers

Tree view

On Thu, Aug 14, 2025 at 10:46 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:
>
> On Thursday, August 14, 2025 11:46 AM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > On Wed, Aug 13, 2025 at 4:15 PM shveta malik <shveta.malik@gmail.com>
> > wrote:
> > >
> > > On Wed, Aug 13, 2025 at 10:41 AM Zhijie Hou (Fujitsu)
> > > <houzj.fnst@fujitsu.com> wrote:
> > > >
> > > >
> > > > Here is the V61 patch set which addressed above comments and the
> > comment by Nisha[2].
> > > >
> > >
> > > Thank You for the patch. I tested the patch, please find a few comments:
> > >
> > > 1)
> > > Now when it stops-retention and later resumes it due to the fact that
> > > max_duration is meanwhile altered to 0, I get log:
> > >
> > > LOG:  logical replication worker for subscription "sub1" resumes
> > > retaining the information for detecting conflicts
> > > DETAIL:  The time spent applying changes up to LSN 0/17DD728 is now
> > > within the maximum limit of 0 ms.
> > >
> > > I did not get which lsn it is pointing to? Is it some dangling lsn
> > > from when it was retaining info?  Also the msg looks odd, when it says
> > > 'is now within the maximum limit of 0 ms.'
> > >
> > > 2)
> > > While stopping the message is:
> > > LOG:  logical replication worker for subscription "sub1" will stop
> > > retaining conflict information
> > > DETAIL:  The time spent advancing the non-removable transaction ID has
> > > exceeded the maximum limit of 1000 ms.
> > >
> > > And while resuming:
> > > logical replication worker for subscription "sub1" resumes retaining
> > > the information for detecting conflicts
> > > ----------
> > >
> > > We can make both similar. Both can have 'retaining the information for
> > > detecting conflicts' instead of 'conflict information' in first one.
> > >
> > > 3)
> > > I believe the tenses should also be updated. When stopping, we can say:
> > >
> > > Logical replication worker for subscription "sub1" has stopped...
> > >
> > > This is appropriate because it has already stopped by pre-setting
> > > oldest_nonremovable_xid to Invalid.
> > >
> > > When resuming, we can say:
> > > Logical replication worker for subscription "sub1" will resume...
> > >
> > > This is because it will begin resuming from the next cycle onward,
> > > specifically after the launcher sets its oldest_xid.
> > >
> > > 4)
> > > For the DETAIL part of resume and stop messages, how about these:
> > >
> > > The retention duration for information used in conflict detection has
> > > exceeded the limit of xx.
> > > The retention duration for information used in conflict detection is
> > > now within the acceptable limit of xx.
> > > The retention duration for information used in conflict detection is
> > > now indefinite.
> > >
>
> Thanks for the comments, I have adjusted the log messages
> according to the suggestions.
>
>
> >
> > 5)
> > Say there 2-3 subs, all have stopped-retention and the slot is set to have invalid
> > xmin; now if I  create a new sub, it will start with stopped-flag set to true due to
> > the fact that slot has invalid xmin to begin with. But then immediately, it will
> > dump a resume message. It looks odd, as at first, it has not even stopped, as it
> > is a new sub.
> > Is there anything we can do to improve this situation?
>
> I changed the logic to recovery the slot immediately on starting a new worker
> that has retain_dead_tuples enabled.
>
> Here is the V62 patch set which addressed above comments and [1].
>

Here are review comments on v62 patch:

Regarding the subscription-level option vs. GUC, I don't disagree with
the current approach.

For the record, while I agree that the subscription-level option is
more consistent with the existing retain_dead_tuples option and can
work for different requirements, my biggest concern is that if users
set different values to different subscriptions, they might think it
doesn't work as expected. This is because even if a subscription
disables retaining dead tuples due to max_conflict_retention_duration,
the database cluster doesn't stop retaining dead tuples unless all
other subscriptions disable it, meaning that the performance on that
database might not get recovered. I proposed the GUC parameter as I
thought it's less confusing from a user perspective. I'm not sure it's
sufficient to mention that in the documentation but I don't have a
better idea.

---
+                else if (IsSet(supported_opts,
SUBOPT_MAX_CONFLICT_RETENTION_DURATION) &&
+                                 strcmp(defel->defname,
"max_conflict_retention_duration") == 0)
+                {
+                        if (IsSet(opts->specified_opts,
SUBOPT_MAX_CONFLICT_RETENTION_DURATION))
+                                errorConflictingDefElem(defel, pstate);
+
+                        opts->specified_opts |=
SUBOPT_MAX_CONFLICT_RETENTION_DURATION;
+                        opts->maxconflretention = defGetInt32(defel);
+                }

The new subscription parameter accepts only integers and takes it as
milliseconds, but I think it would be relatively rare that users
specify this parameter to less than 1 second. I guess it would be a
good idea to accept string representation of a duration too such as
'10 min' like we do for parsing GUC parameter values.

---
+static void
+notify_ineffective_max_conflict_retention(bool update_maxconflretention)
+{
+   ereport(NOTICE,
+           errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+           update_maxconflretention
+           ? errmsg("max_conflict_retention_duration has no effect
when retain_dead_tuples is disabled")
+           : errmsg("disabling retain_dead_tuples will render
max_conflict_retention_duration ineffective"));
+}

Given that max_conflict_retention_duration works only when
retain_dead_tuples is enabled, why not merge these two parameters? For
example, setting max_conflict_retention_duration=-1 means to disable
retaining dead tuples behavior and =0 means that dead tuples are
retained until they are no longer needed for detection purposes.

---
+       /*
+        * Only the leader apply worker manages conflict retention (see
+        * maybe_advance_nonremovable_xid() for details).
+        */
+       if (!isParallelApplyWorker(&worker) && !isTablesyncWorker(&worker))
+           values[10] = TransactionIdIsValid(worker.oldest_nonremovable_xid);
+       else
+           nulls[10] = true;

I think that adding a new column to the pg_stat_subscription view
should be implemented in a separate patch since it needs to bump the
catalog version while introducing max_conflict_retention_duration
subscription option doesn't require that.

---
Even if an apply worker disables retaining dead tuples due to
max_conflict_retention_duration, it enables again after the server
restarts. I guess restarting a server doesn't necessarily mean that
the subscription caught up to the publisher and can resume retaining
dead tuples but is this behavior intentional?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

pgsql-hackers by date:

From: Andrew Jackson
Date: 16 August, 02:43:20
Subject: Re: Add Option To Check All Addresses For Matching target_session_attr

From: Jeff Davis
Date: 16 August, 03:13:06
Subject: executor: release large memory allocations sooner

Re: Conflict detection for update_deleted in logical replication - Mailing list pgsql-hackers

Previous

Next