Thread: pgsql: Raise a warning if there is a possibility of data from multiple
Raise a warning if there is a possibility of data from multiple origins. This commit raises a warning message for a combination of options ('copy_data = true' and 'origin = none') during CREATE/ALTER subscription operations if the publication tables were also replicated from other publishers. During replication, we can skip the data from other origins as we have that information in WAL but that is not possible during initial sync so we raise a warning if there is such a possibility. Author: Vignesh C Reviewed-By: Peter Smith, Amit Kapila, Jonathan Katz, Shi yu, Wang wei Discussion: https://www.postgresql.org/message-id/CALDaNm0gwjY_4HFxvvty01BOT01q_fJLKQ3pWP9=9orqubhjcQ@mail.gmail.com Branch ------ master Details ------- https://git.postgresql.org/pg/commitdiff/875693019053b8897ec3983e292acbb439b088c3 Modified Files -------------- doc/src/sgml/ref/alter_subscription.sgml | 5 ++ doc/src/sgml/ref/create_subscription.sgml | 35 ++++++++ src/backend/commands/subscriptioncmds.c | 133 ++++++++++++++++++++++++++++-- src/test/subscription/t/030_origin.pl | 114 +++++++++++++++++++------ 4 files changed, 258 insertions(+), 29 deletions(-)
Re: pgsql: Raise a warning if there is a possibility of data from multiple
From
Masahiko Sawada
Date:
On Thu, Sep 8, 2022 at 10:39 AM Amit Kapila <akapila@postgresql.org> wrote: > > Raise a warning if there is a possibility of data from multiple origins. > > This commit raises a warning message for a combination of options > ('copy_data = true' and 'origin = none') during CREATE/ALTER subscription > operations if the publication tables were also replicated from other > publishers. > > During replication, we can skip the data from other origins as we have that > information in WAL but that is not possible during initial sync so we raise > a warning if there is such a possibility. > > Author: Vignesh C > Reviewed-By: Peter Smith, Amit Kapila, Jonathan Katz, Shi yu, Wang wei > Discussion: https://www.postgresql.org/message-id/CALDaNm0gwjY_4HFxvvty01BOT01q_fJLKQ3pWP9=9orqubhjcQ@mail.gmail.com > > Branch > ------ > master > > Details > ------- > https://git.postgresql.org/pg/commitdiff/875693019053b8897ec3983e292acbb439b088c3 > > Modified Files > -------------- > doc/src/sgml/ref/alter_subscription.sgml | 5 ++ > doc/src/sgml/ref/create_subscription.sgml | 35 ++++++++ > src/backend/commands/subscriptioncmds.c | 133 ++++++++++++++++++++++++++++-- > src/test/subscription/t/030_origin.pl | 114 +++++++++++++++++++------ > 4 files changed, 258 insertions(+), 29 deletions(-) + appendStringInfoString(&cmd, + "SELECT DISTINCT P.pubname AS pubname\n" + "FROM pg_publication P,\n" + " LATERAL pg_get_publication_tables(P.pubname) GPT\n" + " JOIN pg_subscription_rel PS ON (GPT.relid = PS.srrelid),\n" + " pg_class C JOIN pg_namespace N ON (N.oid = C.relnamespace)\n" + "WHERE C.oid = GPT.relid AND P.pubname IN ("); Should the tables and the function in this query be schema-qualified? Looking at other code in subscriptioncmd.c, we use schema-qualified names. It works even without the schema names but it may be better to make them consistent. FYI, looking at tablesync.c, there are both styles; it seems that recently added codes use schema-unqualified names. Regards, -- Masahiko Sawada
On Thu, Sep 8, 2022 at 12:22 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Thu, Sep 8, 2022 at 10:39 AM Amit Kapila <akapila@postgresql.org> wrote: > > > > Raise a warning if there is a possibility of data from multiple origins. > > > > This commit raises a warning message for a combination of options > > ('copy_data = true' and 'origin = none') during CREATE/ALTER subscription > > operations if the publication tables were also replicated from other > > publishers. > > > > During replication, we can skip the data from other origins as we have that > > information in WAL but that is not possible during initial sync so we raise > > a warning if there is such a possibility. > > > > Author: Vignesh C > > Reviewed-By: Peter Smith, Amit Kapila, Jonathan Katz, Shi yu, Wang wei > > Discussion: https://www.postgresql.org/message-id/CALDaNm0gwjY_4HFxvvty01BOT01q_fJLKQ3pWP9=9orqubhjcQ@mail.gmail.com > > > > Branch > > ------ > > master > > > > Details > > ------- > > https://git.postgresql.org/pg/commitdiff/875693019053b8897ec3983e292acbb439b088c3 > > > > Modified Files > > -------------- > > doc/src/sgml/ref/alter_subscription.sgml | 5 ++ > > doc/src/sgml/ref/create_subscription.sgml | 35 ++++++++ > > src/backend/commands/subscriptioncmds.c | 133 ++++++++++++++++++++++++++++-- > > src/test/subscription/t/030_origin.pl | 114 +++++++++++++++++++------ > > 4 files changed, 258 insertions(+), 29 deletions(-) > > + appendStringInfoString(&cmd, > + "SELECT DISTINCT > P.pubname AS pubname\n" > + "FROM pg_publication P,\n" > + " LATERAL > pg_get_publication_tables(P.pubname) GPT\n" > + " JOIN > pg_subscription_rel PS ON (GPT.relid = PS.srrelid),\n" > + " pg_class C > JOIN pg_namespace N ON (N.oid = C.relnamespace)\n" > + "WHERE C.oid = > GPT.relid AND P.pubname IN ("); > > Should the tables and the function in this query be schema-qualified? > Looking at other code in subscriptioncmd.c, we use schema-qualified > names. It works even without the schema names but it may be better to > make them consistent. > > FYI, looking at tablesync.c, there are both styles; it seems that > recently added codes use schema-unqualified names. > Yeah, it is better to be consistent in all places and add a schema name before accessing an object rather than for one or two places. Do we need similar handling for pg_dump code as well, see getPublications()? -- With Regards, Amit Kapila.
On Thu, 8 Sept 2022 at 12:22, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Thu, Sep 8, 2022 at 10:39 AM Amit Kapila <akapila@postgresql.org> wrote: > > > > Raise a warning if there is a possibility of data from multiple origins. > > > > This commit raises a warning message for a combination of options > > ('copy_data = true' and 'origin = none') during CREATE/ALTER subscription > > operations if the publication tables were also replicated from other > > publishers. > > > > During replication, we can skip the data from other origins as we have that > > information in WAL but that is not possible during initial sync so we raise > > a warning if there is such a possibility. > > > > Author: Vignesh C > > Reviewed-By: Peter Smith, Amit Kapila, Jonathan Katz, Shi yu, Wang wei > > Discussion: https://www.postgresql.org/message-id/CALDaNm0gwjY_4HFxvvty01BOT01q_fJLKQ3pWP9=9orqubhjcQ@mail.gmail.com > > > > Branch > > ------ > > master > > > > Details > > ------- > > https://git.postgresql.org/pg/commitdiff/875693019053b8897ec3983e292acbb439b088c3 > > > > Modified Files > > -------------- > > doc/src/sgml/ref/alter_subscription.sgml | 5 ++ > > doc/src/sgml/ref/create_subscription.sgml | 35 ++++++++ > > src/backend/commands/subscriptioncmds.c | 133 ++++++++++++++++++++++++++++-- > > src/test/subscription/t/030_origin.pl | 114 +++++++++++++++++++------ > > 4 files changed, 258 insertions(+), 29 deletions(-) > > + appendStringInfoString(&cmd, > + "SELECT DISTINCT > P.pubname AS pubname\n" > + "FROM pg_publication P,\n" > + " LATERAL > pg_get_publication_tables(P.pubname) GPT\n" > + " JOIN > pg_subscription_rel PS ON (GPT.relid = PS.srrelid),\n" > + " pg_class C > JOIN pg_namespace N ON (N.oid = C.relnamespace)\n" > + "WHERE C.oid = > GPT.relid AND P.pubname IN ("); > > Should the tables and the function in this query be schema-qualified? > Looking at other code in subscriptioncmd.c, we use schema-qualified > names. It works even without the schema names but it may be better to > make them consistent. > > FYI, looking at tablesync.c, there are both styles; it seems that > recently added codes use schema-unqualified names. There will be no issues as such because we set search_path to always-secure search path while creating connection (libpqrcv_connect-> libpqrcv_PQexec(conn->streamConn, ALWAYS_SECURE_SEARCH_PATH_SQL). But I agree it will be better to keep it consistent across all places. Regards, Vignesh
Re: pgsql: Raise a warning if there is a possibility of data from multiple
From
Masahiko Sawada
Date:
On Thu, Sep 8, 2022 at 4:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Sep 8, 2022 at 12:22 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Thu, Sep 8, 2022 at 10:39 AM Amit Kapila <akapila@postgresql.org> wrote: > > > > > > Raise a warning if there is a possibility of data from multiple origins. > > > > > > This commit raises a warning message for a combination of options > > > ('copy_data = true' and 'origin = none') during CREATE/ALTER subscription > > > operations if the publication tables were also replicated from other > > > publishers. > > > > > > During replication, we can skip the data from other origins as we have that > > > information in WAL but that is not possible during initial sync so we raise > > > a warning if there is such a possibility. > > > > > > Author: Vignesh C > > > Reviewed-By: Peter Smith, Amit Kapila, Jonathan Katz, Shi yu, Wang wei > > > Discussion: https://www.postgresql.org/message-id/CALDaNm0gwjY_4HFxvvty01BOT01q_fJLKQ3pWP9=9orqubhjcQ@mail.gmail.com > > > > > > Branch > > > ------ > > > master > > > > > > Details > > > ------- > > > https://git.postgresql.org/pg/commitdiff/875693019053b8897ec3983e292acbb439b088c3 > > > > > > Modified Files > > > -------------- > > > doc/src/sgml/ref/alter_subscription.sgml | 5 ++ > > > doc/src/sgml/ref/create_subscription.sgml | 35 ++++++++ > > > src/backend/commands/subscriptioncmds.c | 133 ++++++++++++++++++++++++++++-- > > > src/test/subscription/t/030_origin.pl | 114 +++++++++++++++++++------ > > > 4 files changed, 258 insertions(+), 29 deletions(-) > > > > + appendStringInfoString(&cmd, > > + "SELECT DISTINCT > > P.pubname AS pubname\n" > > + "FROM pg_publication P,\n" > > + " LATERAL > > pg_get_publication_tables(P.pubname) GPT\n" > > + " JOIN > > pg_subscription_rel PS ON (GPT.relid = PS.srrelid),\n" > > + " pg_class C > > JOIN pg_namespace N ON (N.oid = C.relnamespace)\n" > > + "WHERE C.oid = > > GPT.relid AND P.pubname IN ("); > > > > Should the tables and the function in this query be schema-qualified? > > Looking at other code in subscriptioncmd.c, we use schema-qualified > > names. It works even without the schema names but it may be better to > > make them consistent. > > > > FYI, looking at tablesync.c, there are both styles; it seems that > > recently added codes use schema-unqualified names. > > > > Yeah, it is better to be consistent in all places and add a schema > name before accessing an object rather than for one or two places. Do > we need similar handling for pg_dump code as well, see > getPublications()? We can fix at least getPublications() and getSubscriptions() as well. I'll make a patch for that. Or do you want to fix all of them, including non-logical-replication-related ones? Regards, -- Masahiko Sawada
On Thu, Sep 8, 2022 at 2:38 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Thu, Sep 8, 2022 at 4:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Thu, Sep 8, 2022 at 12:22 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > > > > Should the tables and the function in this query be schema-qualified? > > > Looking at other code in subscriptioncmd.c, we use schema-qualified > > > names. It works even without the schema names but it may be better to > > > make them consistent. > > > > > > FYI, looking at tablesync.c, there are both styles; it seems that > > > recently added codes use schema-unqualified names. > > > > > > > Yeah, it is better to be consistent in all places and add a schema > > name before accessing an object rather than for one or two places. Do > > we need similar handling for pg_dump code as well, see > > getPublications()? > > We can fix at least getPublications() and getSubscriptions() as well. > I'll make a patch for that. Thanks. > Or do you want to fix all of them, > including non-logical-replication-related ones? > I feel it is better to be consistent across the entire code base unless there is a reason for doing it differently. Does anyone else have any thoughts on this matter? -- With Regards, Amit Kapila.
Amit Kapila <amit.kapila16@gmail.com> writes: > I feel it is better to be consistent across the entire code base > unless there is a reason for doing it differently. Does anyone else > have any thoughts on this matter? That sounds like it would be a pretty massive and unnecessary patch. regards, tom lane
On Thu, Sep 8, 2022 at 7:28 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Amit Kapila <amit.kapila16@gmail.com> writes: > > I feel it is better to be consistent across the entire code base > > unless there is a reason for doing it differently. Does anyone else > > have any thoughts on this matter? > > That sounds like it would be a pretty massive and unnecessary patch. > Fair enough. Do you mind being consistent in this regard for logical replication-related code? BTW, is there a reason we prefer to write in one or another way (with or without appending schema_name)? -- With Regards, Amit Kapila.
Amit Kapila <amit.kapila16@gmail.com> writes: > Fair enough. Do you mind being consistent in this regard for logical > replication-related code? As long as not too many individual changes are involved, sure. But consistency of this sort doesn't seem worth creating a lot of back-patching land mines, IMO. > BTW, is there a reason we prefer to write in > one or another way (with or without appending schema_name)? In places where we can trust the search_path to be just pg_catalog, there's no real strong reason to write a schema name, I think. regards, tom lane