Re: Conflict detection for update_deleted in logical replication - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: Conflict detection for update_deleted in logical replication |
Date | |
Msg-id | CAD21AoAaJG-TbvqNkcUHdQ45Wd4ajGz3DCifKJ-a-+awxJ1Hzw@mail.gmail.com Whole thread Raw |
In response to | Re: Conflict detection for update_deleted in logical replication (Dilip Kumar <dilipbalaut@gmail.com>) |
List | pgsql-hackers |
On Fri, Jul 4, 2025 at 8:18 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > On Wed, Jul 2, 2025 at 3:28 PM Hou, Zhijie wrote: > > Kindly use the latest patch set for performance testing. > > During testing, we observed a limitation in cascading logical replication > setups, such as (A -> B -> C). When retain_conflict_info is enabled on Node C, > it may not retain information necessary for conflict detection when applying > changes originally replicated from Node A. This happens because Node C only > waits for locally originated changes on Node B to be applied before advancing > the non-removable transaction ID. > > For example, Consider a logical replication setup as mentioned above : A -> B -> C. > - All three nodes have a table t1 with two tuples (1,1) (2,2). > - Node B subscribed to all changes of t1 from Node A > - Node-C subscribed to all changes from Node B. > - Subscriptions use the default origin=ANY, as this is not a bidirectional > setup. > > Now, consider two concurrent operations: > - @9:00 Node A - UPDATE (1,1) -> (1,11) > > - @9:02 Node C - DELETE (1,1) > > Assume a slight delay at Node B before it applies the update from Node A. > > @9:03 Node C - advances the non-removable XID because it sees no concurrent > transactions from Node B. It is unaware of Node A’s concurrent update. > > @9:04 Node B - receives Node A's UPDATE and applies (1,1) -> (1,11) > t1 has tuples : (1,11), (2,2) > > @9:05 Node C - receives the UPDATE (1,1) -> (1,11) > - As conflict slot’s xmin is advanced, the deleted tuple may already have > been removed. > - Conflict resolution fails to detect update_deleted and instead raises > update_missing. > > Note that, as per decoding logic Node C sees the commit timestamp of the update > as 9:00 (origin commit_ts from Node A), not 9:04 (commit time on Node B). In > this case, since the UPDATE's timestamp is earlier than the DELETE, Node C > should ideally detect an update_deleted conflict. However, it cannot, because > it no longer retains the deleted tuple. > > Even if Node C attempts to retrieve the latest WAL position from Node A, Node C > doesn't maintain any LSN which we could use to compare with it. > > This scenario is similar to another restriction in the patch where > retain_conflict_info is not supported if the publisher is also a physical > standby, as the required transaction information from the original primary is > unavailable. Moreover, this limitation is relevant only when the subscription > origin option is set to ANY, as only in that case changes from other origins > can be replicated. Since retain_conflict_info is primarily useful for conflict > detection in bidirectional clusters where the origin option is set to NONE, > this limitation appears acceptable. > > Given these findings, to help users avoid unintended configurations, we plan to > issue a warning in scenarios where replicated changes may include origins other > than the direct publisher, similar to the existing checks in the > check_publications_origin() function. > > Here is the latest patch that implements the warning and documents > this case. Only 0001 is modified for this. > > A big thanks to Nisha for invaluable assistance in identifying this > case and preparing the analysis for it. I'm still reviewing the 0001 patch but let me share some comments and questions I have so far: --- It seems there is no place where we describe the overall idea of reliably detecting update_deleted conflicts. The comment atop maybe_advance_nonremovable_xid() describes why the implemented algorithm works for that purpose but doesn't how it is implemented, for example the relationship with pg_conflict_detection slot. I'm not sure the long comment atop maybe_advance_nonremovable_xid() is the right place as it seems to be a description beyond explaining maybe_advance_nonremovable_xid() function. Probably we can move that comment and explain the overall idea somewhere for example atop worker.c or in README. --- The new parameter name "retain_conflict_info" sounds to me like we keep the conflict information somewhere that has expired at some time such as how many times insert_exists or update_origin_differs happened. How about choosing a name that indicates retain dead tuples more explicitly for example retain_dead_tuples? --- You mentioned in the previous email: > Furthermore, we tested running pgbench on both publisher and subscriber[3]. > Some regression was observed in TPS on the subscriber, because workload on the > publisher is pretty high and the apply workers must wait for the amount of > transactions with earlier timestamps to be applied and flushed before advancing > the non-removable XID to remove dead tuples. This is the expected behavior of > this approach since the patch's main goal is to retain dead tuples for reliable > conflict detection. Have you conducted any performance testing of a scenario where a publisher replicates a large number of databases (say 64) to a subscriber? I'm particularly interested in a configuration where retain_conflict_info is set to true, and there are 64 apply workers running on the subscriber side. In such a setup, even when running pgbench exclusively on the publisher's databases, I suspect the replication lag would likely increase quickly, as all apply workers on the subscriber would be impacted by the overhead of retaining dead tuples. --- @@ -71,8 +72,9 @@ #define SUBOPT_PASSWORD_REQUIRED 0x00000800 #define SUBOPT_RUN_AS_OWNER 0x00001000 #define SUBOPT_FAILOVER 0x00002000 -#define SUBOPT_LSN 0x00004000 -#define SUBOPT_ORIGIN 0x00008000 +#define SUBOPT_RETAIN_CONFLICT_INFO 0x00004000 +#define SUBOPT_LSN 0x00008000 +#define SUBOPT_ORIGIN 0x00010000 Why do we need to change the existing options' value? --- + * This is required to ensure that we don't advance the xmin + * of CONFLICT_DETECTION_SLOT even if one of the subscriptions + * is not enabled. Otherwise, we won't be able to detect I guess the "even" in the first sentence is not necessary. --- +/* + * Determine the minimum non-removable transaction ID across all apply workers + * for subscriptions that have retain_conflict_info enabled. Store the result + * in *xmin. + * + * If the replication slot cannot be advanced during this cycle, due to either + * a disabled subscription or an inactive worker, set *can_advance_xmin to + * false. + */ +static void +compute_min_nonremovable_xid(LogicalRepWorker *worker, + bool retain_conflict_info, TransactionId *xmin, + bool *can_advance_xmin) I think this function is quite confusing for several reasons. For instance, it's doing more things than described in the comments such as trying to create the CONFLICT_DETECTION_SLOT if no worker is passed. Also, one of the caller describes: + /* + * This is required to ensure that we don't advance the xmin + * of CONFLICT_DETECTION_SLOT even if one of the subscriptions + * is not enabled. Otherwise, we won't be able to detect + * conflicts reliably for such a subscription even though it + * has set the retain_conflict_info option. + */ + compute_min_nonremovable_xid(NULL, sub->retainconflictinfo, + &xmin, &can_advance_xmin); but it's unclear to me from the function name that it tries to create the replication slot. Furthermore, in this path it doesn't actually compute xmin. I guess we can try to create CONFLICT_DETECTION_SLOT in the loop of "foreach(lc, sublist)" and set false to can_advance_xmin if either the subscription is disabled or the worker is not running. --- + FullTransactionId remote_oldestxid; /* oldest transaction ID that was in + * the commit phase on the publisher. + * Use FullTransactionId to prevent + * issues with transaction ID + * wraparound, where a new + * remote_oldestxid could falsely + * appear to originate from the past + * and block advancement */ + FullTransactionId remote_nextxid; /* next transaction ID to be assigned + * on the publisher. Use + * FullTransactionId for consistency + * and to allow straightforward + * comparisons with remote_oldestxid. */ I think it would be readable if we could write them above each field. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
pgsql-hackers by date: