Re: bogus: logical replication rows/cols combinations - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: bogus: logical replication rows/cols combinations |
Date | |
Msg-id | CAA4eK1+9LCX3qKyo9S4uaaciM0WqZrP6m=kMzufgpuv=a=PgXQ@mail.gmail.com Whole thread Raw |
In response to | Re: bogus: logical replication rows/cols combinations (Alvaro Herrera <alvherre@alvh.no-ip.org>) |
Responses |
Re: bogus: logical replication rows/cols combinations
Re: bogus: logical replication rows/cols combinations |
List | pgsql-hackers |
On Wed, Apr 27, 2022 at 3:13 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > On 2022-Apr-26, Tomas Vondra wrote: > > > I'm not quite sure which of the two behaviors is more "desirable". In a > > way, it's somewhat similar to publish_as_relid, which is also calculated > > not considering which of the row filters match? > > I grepped doc/src/sgml for `publish_as_relid` and found no hits, so > I suppose it's not a user-visible feature as such. > `publish_as_relid` is computed based on 'publish_via_partition_root' setting of publication which is a user-visible feature. > > But maybe you're right and it should behave the way you propose ... the > > example I have in mind is a use case replicating table with two types of > > rows - sensitive and non-sensitive. For sensitive, we replicate only > > some of the columns, for non-sensitive we replicate everything. > > Exactly. If we blindly publish row/column values that aren't in *any* > publications, this may lead to leaking protected values. > > > Changing this to behave the way you expect would be quite difficult, > > because at the moment we build a single OR expression from all the row > > filters. We'd have to keep the individual expressions, so that we can > > build a column list for each of them (in order to ignore those that > > don't match). > > I think we should do that, yeah. > This can hit the performance as we need to evaluate each expression for each row. > > I can take a stab at it, but it seems strange to not apply the same > > logic to evaluation of publish_as_relid. I wonder what Amit thinks about > > this, as he wrote the row filter stuff. > > By grepping publicationcmds.c, it seems that publish_as_relid refers to > the ancestor partitioned table that is used for column list and > rowfilter determination, when a partition is being published as part of > it. > Yeah, this is true when the corresponding publication has set 'publish_via_partition_root' as true. > I don't think these things are exactly parallel. > Currently, when the subscription has multiple publications, we combine the objects, and actions of those publications. It happens for 'publish_via_partition_root', publication actions, tables, column lists, or row filters. I think the whole design works on this idea even the initial table sync. I think it might need a major change (which I am not sure about at this stage) if we want to make the initial sync also behave similar to what you are proposing. I feel it would be much easier to create two different subscriptions as mentioned by Hou-San [1] for the case you are talking about if the user really needs something like that. > ... In fact I think they are quite orthogonal: probably you should be > able to publish a partitioned table in two publications, with different > rowfilters and different column lists (which can come from the > topmost partitioned table), and each partition should still work in the > way I describe above. > We consider the column lists or row filters for either the partition (on which the current operation is performed) or partitioned table based on 'publish_via_partition_root' parameter of publication. [1] - https://www.postgresql.org/message-id/OS0PR01MB5716B82315A067F1D78F247E94FA9%40OS0PR01MB5716.jpnprd01.prod.outlook.com -- With Regards, Amit Kapila.
pgsql-hackers by date: