On Fri, Sep 12, 2025 at 8:55 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:
>
>
> I agree. Here is a V73 patch that will restart the worker if the retention
> resumes. I also addressed other comments posted by Amit[1].
>
Thanks for the patch. A few comments:
1)
There is a small window where the worker can exit while resuming
retention, and the launcher can end up accessing stale worker info.
Let's say the launcher is at a stage where it has fetched the worker:
w = logicalrep_worker_find(sub->oid, InvalidOid, false);
After this point, but before the launcher calls
compute_min_nonremovable_xid(), the worker stops retention and then
resumes it as well. The worker has now exited, but in
compute_min_nonremovable_xid() the launcher will still use the
worker info fetched previously.
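To illustrate the window described in (1), here is a minimal standalone
sketch. It is not PostgreSQL code: DemoWorker and the demo_* functions are
hypothetical names made up for this example, and the generation check is
only one possible way to detect that a previously fetched slot no longer
refers to the same worker instance.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a shared worker slot (not the real struct). */
typedef struct DemoWorker
{
    bool        in_use;                  /* slot currently owned by a worker */
    uint16_t    generation;              /* bumped each time the slot is claimed */
    uint32_t    oldest_nonremovable_xid; /* 0 means "invalid" in this sketch */
} DemoWorker;

/* Hypothetical: a new worker instance claims the slot. */
static void
demo_worker_start(DemoWorker *w, uint32_t xid)
{
    assert(w != NULL);
    w->in_use = true;
    w->generation++;
    w->oldest_nonremovable_xid = xid;
}

/* Hypothetical: the worker exits, e.g. to restart after resuming retention. */
static void
demo_worker_exit(DemoWorker *w)
{
    assert(w != NULL);
    w->in_use = false;
    w->oldest_nonremovable_xid = 0;
}

/*
 * Returns true only if the slot fetched earlier still belongs to the same
 * worker instance that the caller observed (same generation, still in use).
 */
static bool
demo_worker_still_valid(const DemoWorker *w, uint16_t seen_generation)
{
    assert(w != NULL);
    return w->in_use && w->generation == seen_generation;
}
```

In this sketch the launcher would remember the generation when it fetches
the slot and call demo_worker_still_valid() again right before using the
worker info. In the real code, such a check and the subsequent reads would
presumably also need to happen while holding the appropriate lock.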
2)
if (should_stop_conflict_info_retention(rdt_data))
+ {
+ /*
+ * Stop retention if not yet. Otherwise, reset to the initial phase to
+ * retry all phases. This is required to recalculate the current wait
+ * time and resume retention if the time falls within
+ * max_retention_duration.
+ */
+ if (MySubscription->retentionactive)
+ rdt_data->phase = RDT_STOP_CONFLICT_INFO_RETENTION;
+ else
+ reset_retention_data_fields(rdt_data);
+
return;
+ }
Shall we have an Assert(!MyLogicalRepWorker->oldest_nonremovable_xid)
in the 'else' part above?
thanks
Shveta