Re: Conflict detection for update_deleted in logical replication - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: Conflict detection for update_deleted in logical replication |
Date | |
Msg-id | CAA4eK1+P2vHmF+KLVQcfwuTbHrrwG9uonMKfh+c36jGZH=LXZw@mail.gmail.com Whole thread Raw |
In response to | Re: Conflict detection for update_deleted in logical replication (Masahiko Sawada <sawada.mshk@gmail.com>) |
Responses |
Re: Conflict detection for update_deleted in logical replication
|
List | pgsql-hackers |
On Sat, Aug 16, 2025 at 5:15 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > Regarding the subscription-level option vs. GUC, I don't disagree with > the current approach. > > For the record, while I agree that the subscription-level option is > more consistent with the existing retain_dead_tuples option and can > work for different requirements, my biggest concern is that if users > set different values to different subscriptions, they might think it > doesn't work as expected. This is because even if a subscription > disables retaining dead tuples due to max_conflict_retention_duration, > the database cluster doesn't stop retaining dead tuples unless all > other subscriptions disable it, meaning that the performance on that > database might not get recovered. I proposed the GUC parameter as I > thought it's less confusing from a user perspective. I'm not sure it's > sufficient to mention that in the documentation but I don't have a > better idea. > I think we might want to gave a GUC as well in the future as both (subscription option and GUC) can be used in different scenarios but better to wait for some user feedback before going that far. We can document this in the option to make users aware how to use it in such situations. > > --- > +static void > +notify_ineffective_max_conflict_retention(bool update_maxconflretention) > +{ > + ereport(NOTICE, > + errcode(ERRCODE_INVALID_PARAMETER_VALUE), > + update_maxconflretention > + ? errmsg("max_conflict_retention_duration has no effect > when retain_dead_tuples is disabled") > + : errmsg("disabling retain_dead_tuples will render > max_conflict_retention_duration ineffective")); > +} > > Given that max_conflict_retention_duration works only when > retain_dead_tuples is enabled, why not merge these two parameters? For > example, setting max_conflict_retention_duration=-1 means to disable > retaining dead tuples behavior and =0 means that dead tuples are > retained until they are no longer needed for detection purposes. > I think it can be a source of confusion for users, if not now, then in the future. Consider that in the future, we also have a GUC to set the retention duration which will be used for all subscriptions. Now, say, we define the behaviour such that if this value is set for subscription, then use that, otherwise, use the GUC value. Then, with this proposal, if the user sets max_conflict_retention_duration=0, it will lead to retaining tuples until they are no longer needed but with the behaviour proposed in patch, one could have simply set retain_dead_tuples=true and used the GUC value. I understand that it is debatable how we will design the GUC behaviour in future but I used it as an example how trying to achieve different things with one option can be a source of confusion. Even if we decide not to introduce GUC or define its behaviour differently, I find having different options in this case is easy to understand and use. > --- > + /* > + * Only the leader apply worker manages conflict retention (see > + * maybe_advance_nonremovable_xid() for details). > + */ > + if (!isParallelApplyWorker(&worker) && !isTablesyncWorker(&worker)) > + values[10] = TransactionIdIsValid(worker.oldest_nonremovable_xid); > + else > + nulls[10] = true; > > I think that adding a new column to the pg_stat_subscription view > should be implemented in a separate patch since it needs to bump the > catalog version while introducing max_conflict_retention_duration > subscription option doesn't require that. > Won't following change in pg_subsciption.h anyway requires us to bump catversion? @@ -81,6 +81,10 @@ CATALOG(pg_subscription,6100,SubscriptionRelationId) BKI_SHARED_RELATION BKI_ROW bool subretaindeadtuples; /* True if dead tuples useful for * conflict detection are retained */ + int32 maxconflretention; /* The maximum duration (in milliseconds) + * for which information useful for + * conflict detection can be retained */ + > --- > Even if an apply worker disables retaining dead tuples due to > max_conflict_retention_duration, it enables again after the server > restarts. > I also find this behaviour questionable because this also means that it is possible that before restart one would deduce that the update_deleted conflict won't be reliably detected for a particular subscription but after restart it could lead to the opposite conclusion. But note that to make it behave similarly we need to store this value persistently in pg_subscription unless you have better ideas for this. Theoretically, there are two places where we can persist this information, one is with pg_subscription, and other in origin. I find it is closer to pg_subscription. -- With Regards, Amit Kapila.
pgsql-hackers by date: