RE: Data is copied twice when specifying both child and parent table in publication - Mailing list pgsql-hackers
From | wangw.fnst@fujitsu.com |
---|---|
Subject | RE: Data is copied twice when specifying both child and parent table in publication |
Date | |
Msg-id | OS3PR01MB62753AD86A49B8A9B974F6AD9E879@OS3PR01MB6275.jpnprd01.prod.outlook.com Whole thread Raw |
In response to | Re: Data is copied twice when specifying both child and parent table in publication (Peter Smith <smithpb2250@gmail.com>) |
Responses |
Re: Data is copied twice when specifying both child and parent table in publication
Re: Data is copied twice when specifying both child and parent table in publication |
List | pgsql-hackers |
On Thu, Mar 23, 2023 at 12:27 PM Peter Smith <smithpb2250@gmail.com> wrote: > Here are some review comments for patch v20-0001. Thanks for your comments. > ====== > src/backend/commands/subscriptioncmds.c > > 3. fetch_table_list > > + /* Get the list of tables from the publisher. */ > + if (server_version >= 160000) > + { > + StringInfoData pub_names; > > - appendStringInfoString(&cmd, "FROM pg_catalog.pg_publication_tables t\n" > - " WHERE t.pubname IN ("); > - get_publications_str(publications, &cmd, true); > - appendStringInfoChar(&cmd, ')'); > + initStringInfo(&pub_names); > + get_publications_str(publications, &pub_names, true); > + > + /* > + * From version 16, we allowed passing multiple publications to the > + * function pg_get_publication_tables. This helped to filter out the > + * partition table whose ancestor is also published in this > + * publication array. > + * > + * Join pg_get_publication_tables with pg_publication to exclude > + * non-existing publications. > + */ > + appendStringInfo(&cmd, "SELECT DISTINCT N.nspname, C.relname,\n" > + " ( SELECT array_agg(a.attname ORDER BY a.attnum)\n" > + " FROM pg_attribute a\n" > + " WHERE a.attrelid = GPT.relid AND\n" > + " a.attnum = ANY(GPT.attrs)\n" > + " ) AS attnames\n" > + " FROM pg_class C\n" > + " JOIN pg_namespace N ON N.oid = C.relnamespace\n" > + " JOIN ( SELECT (pg_get_publication_tables(VARIADIC > array_agg(pubname::text))).*\n" > + " FROM pg_publication\n" > + " WHERE pubname IN ( %s )) as GPT\n" > + " ON GPT.relid = C.oid\n", > + pub_names.data); > + > + pfree(pub_names.data); > + } > + else > + { > + appendStringInfoString(&cmd, "SELECT DISTINCT t.schemaname, t.tablename > \n"); > + > + /* Get column lists for each relation if the publisher supports it */ > + if (check_columnlist) > + appendStringInfoString(&cmd, ", t.attnames\n"); > + > + appendStringInfoString(&cmd, "FROM pg_catalog.pg_publication_tables t\n" > + " WHERE t.pubname IN ("); > + get_publications_str(publications, &cmd, true); > + appendStringInfoChar(&cmd, ')'); > + } > > I noticed the SQL "if" part is using uppercase aliases, but the SQL in > the "else" part is using lowercase aliases. I think it would be better > to be consistent (pick one). Unified them into lowercase aliases. > ====== > src/test/subscription/t/013_partition.pl > > 4. > -# for tab4, we publish changes through the "middle" partitioned table > +# If we subscribe only to pub_lower_level, changes for tab4 will be published > +# through the "middle" partition table. However, since we will be subscribing > +# to both pub_lower_level and pub_all (see subscription sub2 below), we will > +# publish changes via the root table (tab4). > $node_publisher->safe_psql('postgres', > "CREATE PUBLICATION pub_lower_level FOR TABLE tab4_1 WITH > (publish_via_partition_root = true)" > ); > > ~ > > This comment seemed a bit overkill IMO. I don't think you need to say > much here except maybe: > > # Note that subscription "sub2" will later subscribe simultaneously to > both pub_lower_level (i.e. just table tab4_1) and pub_all. Changed. > ~~~ > > 5. > I think maybe you could have another test scenario where you INSERT > something into tab4_1_1 while only the publication for tab4_1 has > publish_via_partition_root=true I'm not sure if this scenario is necessary. > ~~~ > > 6. > AFAICT the tab4 tests are only testing the initial sync, but are not > testing normal replication. Maybe some more normal (post sync) INSERTS > are needed to tab4, tab4_1, tab4_1_1 ? Since I think the scenario we fixed is sync and not replication, it doesn't seem like we should extend the test you mentioned. > ====== > src/test/subscription/t/028_row_filter.pl > > 7. > +# insert data into partitioned table. > +$node_publisher->safe_psql('postgres', > + "INSERT INTO tab_rowfilter_viaroot_part(a) VALUES(13), (17)"); > + > $node_subscriber->safe_psql('postgres', > "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr > application_name=$appname' PUBLICATION tap_pub_1, tap_pub_2, > tap_pub_3, tap_pub_4a, tap_pub_4b, tap_pub_5a, tap_pub_5b, > tap_pub_toast, tap_pub_inherits, tap_pub_viaroot_2, tap_pub_viaroot_1" > ); > @@ -707,13 +711,17 @@ is($result, qq(t|1), 'check replicated rows to > tab_rowfilter_toast'); > # the row filter for the top-level ancestor: > # > # tab_rowfilter_viaroot_part filter is: (a > 15) > +# - INSERT (13) NO, 13 < 15 > # - INSERT (14) NO, 14 < 15 > # - INSERT (15) NO, 15 = 15 > # - INSERT (16) YES, 16 > 15 > +# - INSERT (17) YES, 17 > 15 > $result = > $node_subscriber->safe_psql('postgres', > - "SELECT a FROM tab_rowfilter_viaroot_part"); > -is($result, qq(16), 'check replicated rows to tab_rowfilter_viaroot_part'); > + "SELECT a FROM tab_rowfilter_viaroot_part ORDER BY 1"); > +is( $result, qq(16 > +17), > + 'check replicated rows to tab_rowfilter_viaroot_part'); > > ~ > > I'm not 100% sure this is testing quite what you want to test. AFAICT > the subscription is created like: > > "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr > application_name=$appname' PUBLICATION tap_pub_1, tap_pub_2, > tap_pub_3, tap_pub_4a, tap_pub_4b, tap_pub_5a, tap_pub_5b, > tap_pub_toast, tap_pub_inherits, tap_pub_viaroot_2, tap_pub_viaroot_1" I think this is the scenario we fixed : Simultaneously subscribing to two publications that publish the parent and child respectively, then want to use the parent's identity and schema). > Notice in this case BOTH the partitioned table and the partition had > been published using "WITH (publish_via_partition_root)". But, IIUC > won't it be better to test when only the partition's publication was > using that option? > > For example, I think then it would be a better test of this "At least one" code: > > /* At least one publication is using publish_via_partition_root. */ > if (pub_elem->pubviaroot) > viaroot = true; > > ====== > src/test/subscription/t/031_column_list.pl > > 8. > - CREATE PUBLICATION pub_root_true FOR TABLE test_root (a) WITH > (publish_via_partition_root = true); > + CREATE PUBLICATION pub_root_true_1 FOR TABLE test_root (a) WITH > (publish_via_partition_root = true); > + CREATE PUBLICATION pub_root_true_2 FOR TABLE test_root_1 (a, b) WITH > (publish_via_partition_root = true); > > -- initial data > INSERT INTO test_root VALUES (1, 2, 3); > INSERT INTO test_root VALUES (10, 20, 30); > )); > > +# Subscribe to pub_root_true_1 and pub_root_true_2 at the same time, which > +# means that the initial data will be synced once, and only the column list of > +# the parent table (test_root) in the publication pub_root_true_1 will be used > +# for both table sync and data replication. > $node_subscriber->safe_psql( > 'postgres', qq( > - CREATE SUBSCRIPTION sub1 CONNECTION '$publisher_connstr' PUBLICATION > pub_root_true; > + CREATE > > ~ > > (This is simlar to the previous review comment #7 above) > > Won't it be a better test of the "At least one" code when only the > publication of partition (test_root_1) is using "WITH > (publish_via_partition_root = true)". > > e.g > CREATE PUBLICATION pub_root_true_1 FOR TABLE test_root (a); > CREATE PUBLICATION pub_root_true_2 FOR TABLE test_root_1 (a, b) WITH > (publish_via_partition_root = true); I think specifying one or both is the same scenario here. But it seemed clearer if only the "via_root" option is specified in the publication that publishes the parent, so I changed this point in "031_column_list.pl". Since the publications in "028_row_filter.pl" were introduced by other commits, I didn't change it. Attach the new patch set. Regards, Wang Wei
Attachment
pgsql-hackers by date: