Re: bogus: logical replication rows/cols combinations - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: bogus: logical replication rows/cols combinations |
Date | |
Msg-id | 765985d7-c464-2a73-6acf-268b341173aa@enterprisedb.com Whole thread Raw |
In response to | Re: bogus: logical replication rows/cols combinations (Peter Eisentraut <peter.eisentraut@enterprisedb.com>) |
Responses |
Re: bogus: logical replication rows/cols combinations
|
List | pgsql-hackers |
On 4/28/22 14:26, Peter Eisentraut wrote: > On 27.04.22 12:33, Amit Kapila wrote: >> Currently, when the subscription has multiple publications, we combine >> the objects, and actions of those publications. It happens for >> 'publish_via_partition_root', publication actions, tables, column >> lists, or row filters. I think the whole design works on this idea >> even the initial table sync. I think it might need a major change >> (which I am not sure about at this stage) if we want to make the >> initial sync also behave similar to what you are proposing. > > If one publication says "publish if insert" and another publication says > "publish if update", then the combination of that is clearly "publish if > insert or update". Similarly, if one publication says "WHERE (foo)" and > one says "WHERE (bar)", then the combination is "WHERE (foo OR bar)". > > But if one publication says "publish columns a and b if condition-X" and > another publication says "publish columns a and c if not-condition-X", > then the combination is clearly *not* "publish columns a, b, c if true". > That is not logical, in the literal sense of that word. > > I wonder how we handle the combination of > > pub1: publish=insert WHERE (foo) > pub2: publish=update WHERE (bar) > > I think it would be incorrect if the combination is > > pub1, pub2: publish=insert,update WHERE (foo OR bar). That's a good question, actually. No, we don't combine the publications like this, the row filters are kept "per action". But the exact behavior turns out to be rather confusing in this case. (Note: This has nothing to do with column lists.) Consider an example similar to what Alvaro posted earlier: create table uno (a int primary key, b int, c int); create publication uno for table uno where (a > 0) with (publish='insert'); create publication dos for table uno where (a < 0) with (publish='update'); And do this: insert into uno values (1, 2, 3), (-1, 3, 4) which on the subscriber produces just one row, because (a<0) replicates only updates: a | b | c ---+---+--- 1 | 2 | 3 (1 row) Now, let's update the (a<0) row. update uno set a = 2 where a = -1; It might seem reasonable to expect the updated row (2,3,4) to appear on the subscriber, but no - that's not what happens. Because we have (a<0) for UPDATE, and we evaluate this on the old row (matches) and new row (does not match). And pgoutput_row_filter() decides the update needs to be converted to DELETE, despite the old row was not replicated at all. I'm not sure if pgoutput_row_filter() can even make reasonable decisions with such configuration (combination of row filters, actions ...). But it sure seems confusing, because if you just inserted the updated row, it would get replicated. Which brings me to a second problem, related to this one. Imagine you create the subscription *after* inserting the two rows. In that case you get this: a | b | c ----+---+--- 1 | 2 | 3 -1 | 3 | 4 (2 rows) because tablesync.c ignores which actions is the publication (and thus the rowfilter) defined for. I think it's natural to expect that (INSERT + sync) and (sync + INSERT) produce the same output on the subscriber. I'm not sure we can actually make this perfectly sane with arbitrary combinations of filters and actions. It would probably depend on whether the actions are commutative, associative and stuff like that. But maybe we can come up with restrictions that'd make this sane? regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: