Thread: pgsql: Raise a warning if there is a possibility of data from multiple

pgsql: Raise a warning if there is a possibility of data from multiple

From

Amit Kapila

Date:

08 September 2022, 01:39:38

Raise a warning if there is a possibility of data from multiple origins.

This commit raises a warning message for a combination of options
('copy_data = true' and 'origin = none') during CREATE/ALTER subscription
operations if the publication tables were also replicated from other
publishers.

During replication, we can skip the data from other origins as we have that
information in WAL but that is not possible during initial sync so we raise
a warning if there is such a possibility.

Author: Vignesh C
Reviewed-By: Peter Smith, Amit Kapila, Jonathan Katz, Shi yu, Wang wei
Discussion: https://www.postgresql.org/message-id/CALDaNm0gwjY_4HFxvvty01BOT01q_fJLKQ3pWP9=9orqubhjcQ@mail.gmail.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/875693019053b8897ec3983e292acbb439b088c3

Modified Files
--------------
doc/src/sgml/ref/alter_subscription.sgml  |   5 ++
doc/src/sgml/ref/create_subscription.sgml |  35 ++++++++
src/backend/commands/subscriptioncmds.c   | 133 ++++++++++++++++++++++++++++--
src/test/subscription/t/030_origin.pl     | 114 +++++++++++++++++++------
4 files changed, 258 insertions(+), 29 deletions(-)

Re: pgsql: Raise a warning if there is a possibility of data from multiple

From

Masahiko Sawada

Date:

08 September 2022, 06:52:06

On Thu, Sep 8, 2022 at 10:39 AM Amit Kapila <akapila@postgresql.org> wrote:
>
> Raise a warning if there is a possibility of data from multiple origins.
>
> This commit raises a warning message for a combination of options
> ('copy_data = true' and 'origin = none') during CREATE/ALTER subscription
> operations if the publication tables were also replicated from other
> publishers.
>
> During replication, we can skip the data from other origins as we have that
> information in WAL but that is not possible during initial sync so we raise
> a warning if there is such a possibility.
>
> Author: Vignesh C
> Reviewed-By: Peter Smith, Amit Kapila, Jonathan Katz, Shi yu, Wang wei
> Discussion: https://www.postgresql.org/message-id/CALDaNm0gwjY_4HFxvvty01BOT01q_fJLKQ3pWP9=9orqubhjcQ@mail.gmail.com
>
> Branch
> ------
> master
>
> Details
> -------
> https://git.postgresql.org/pg/commitdiff/875693019053b8897ec3983e292acbb439b088c3
>
> Modified Files
> --------------
> doc/src/sgml/ref/alter_subscription.sgml  |   5 ++
> doc/src/sgml/ref/create_subscription.sgml |  35 ++++++++
> src/backend/commands/subscriptioncmds.c   | 133 ++++++++++++++++++++++++++++--
> src/test/subscription/t/030_origin.pl     | 114 +++++++++++++++++++------
> 4 files changed, 258 insertions(+), 29 deletions(-)

+       appendStringInfoString(&cmd,
+                                                  "SELECT DISTINCT
P.pubname AS pubname\n"
+                                                  "FROM pg_publication P,\n"
+                                                  "     LATERAL
pg_get_publication_tables(P.pubname) GPT\n"
+                                                  "     JOIN
pg_subscription_rel PS ON (GPT.relid = PS.srrelid),\n"
+                                                  "     pg_class C
JOIN pg_namespace N ON (N.oid = C.relnamespace)\n"
+                                                  "WHERE C.oid =
GPT.relid AND P.pubname IN (");

Should the tables and the function in this query be schema-qualified?
Looking at other code in subscriptioncmd.c, we use schema-qualified
names. It works even without the schema names but it may be better to
make them consistent.

FYI, looking at tablesync.c, there are both styles; it seems that
recently added codes use schema-unqualified names.

Regards,

--
Masahiko Sawada

Re: pgsql: Raise a warning if there is a possibility of data from multiple

From

Amit Kapila

Date:

08 September 2022, 07:22:57

On Thu, Sep 8, 2022 at 12:22 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Thu, Sep 8, 2022 at 10:39 AM Amit Kapila <akapila@postgresql.org> wrote:
> >
> > Raise a warning if there is a possibility of data from multiple origins.
> >
> > This commit raises a warning message for a combination of options
> > ('copy_data = true' and 'origin = none') during CREATE/ALTER subscription
> > operations if the publication tables were also replicated from other
> > publishers.
> >
> > During replication, we can skip the data from other origins as we have that
> > information in WAL but that is not possible during initial sync so we raise
> > a warning if there is such a possibility.
> >
> > Author: Vignesh C
> > Reviewed-By: Peter Smith, Amit Kapila, Jonathan Katz, Shi yu, Wang wei
> > Discussion:
https://www.postgresql.org/message-id/CALDaNm0gwjY_4HFxvvty01BOT01q_fJLKQ3pWP9=9orqubhjcQ@mail.gmail.com
> >
> > Branch
> > ------
> > master
> >
> > Details
> > -------
> > https://git.postgresql.org/pg/commitdiff/875693019053b8897ec3983e292acbb439b088c3
> >
> > Modified Files
> > --------------
> > doc/src/sgml/ref/alter_subscription.sgml  |   5 ++
> > doc/src/sgml/ref/create_subscription.sgml |  35 ++++++++
> > src/backend/commands/subscriptioncmds.c   | 133 ++++++++++++++++++++++++++++--
> > src/test/subscription/t/030_origin.pl     | 114 +++++++++++++++++++------
> > 4 files changed, 258 insertions(+), 29 deletions(-)
>
> +       appendStringInfoString(&cmd,
> +                                                  "SELECT DISTINCT
> P.pubname AS pubname\n"
> +                                                  "FROM pg_publication P,\n"
> +                                                  "     LATERAL
> pg_get_publication_tables(P.pubname) GPT\n"
> +                                                  "     JOIN
> pg_subscription_rel PS ON (GPT.relid = PS.srrelid),\n"
> +                                                  "     pg_class C
> JOIN pg_namespace N ON (N.oid = C.relnamespace)\n"
> +                                                  "WHERE C.oid =
> GPT.relid AND P.pubname IN (");
>
> Should the tables and the function in this query be schema-qualified?
> Looking at other code in subscriptioncmd.c, we use schema-qualified
> names. It works even without the schema names but it may be better to
> make them consistent.
>
> FYI, looking at tablesync.c, there are both styles; it seems that
> recently added codes use schema-unqualified names.
>

Yeah, it is better to be consistent in all places and add a schema
name before accessing an object rather than for one or two places. Do
we need similar handling for pg_dump code as well, see
getPublications()?

-- 
With Regards,
Amit Kapila.

Re: pgsql: Raise a warning if there is a possibility of data from multiple

From

vignesh C

Date:

08 September 2022, 07:59:07

On Thu, 8 Sept 2022 at 12:22, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Thu, Sep 8, 2022 at 10:39 AM Amit Kapila <akapila@postgresql.org> wrote:
> >
> > Raise a warning if there is a possibility of data from multiple origins.
> >
> > This commit raises a warning message for a combination of options
> > ('copy_data = true' and 'origin = none') during CREATE/ALTER subscription
> > operations if the publication tables were also replicated from other
> > publishers.
> >
> > During replication, we can skip the data from other origins as we have that
> > information in WAL but that is not possible during initial sync so we raise
> > a warning if there is such a possibility.
> >
> > Author: Vignesh C
> > Reviewed-By: Peter Smith, Amit Kapila, Jonathan Katz, Shi yu, Wang wei
> > Discussion:
https://www.postgresql.org/message-id/CALDaNm0gwjY_4HFxvvty01BOT01q_fJLKQ3pWP9=9orqubhjcQ@mail.gmail.com
> >
> > Branch
> > ------
> > master
> >
> > Details
> > -------
> > https://git.postgresql.org/pg/commitdiff/875693019053b8897ec3983e292acbb439b088c3
> >
> > Modified Files
> > --------------
> > doc/src/sgml/ref/alter_subscription.sgml  |   5 ++
> > doc/src/sgml/ref/create_subscription.sgml |  35 ++++++++
> > src/backend/commands/subscriptioncmds.c   | 133 ++++++++++++++++++++++++++++--
> > src/test/subscription/t/030_origin.pl     | 114 +++++++++++++++++++------
> > 4 files changed, 258 insertions(+), 29 deletions(-)
>
> +       appendStringInfoString(&cmd,
> +                                                  "SELECT DISTINCT
> P.pubname AS pubname\n"
> +                                                  "FROM pg_publication P,\n"
> +                                                  "     LATERAL
> pg_get_publication_tables(P.pubname) GPT\n"
> +                                                  "     JOIN
> pg_subscription_rel PS ON (GPT.relid = PS.srrelid),\n"
> +                                                  "     pg_class C
> JOIN pg_namespace N ON (N.oid = C.relnamespace)\n"
> +                                                  "WHERE C.oid =
> GPT.relid AND P.pubname IN (");
>
> Should the tables and the function in this query be schema-qualified?
> Looking at other code in subscriptioncmd.c, we use schema-qualified
> names. It works even without the schema names but it may be better to
> make them consistent.
>
> FYI, looking at tablesync.c, there are both styles; it seems that
> recently added codes use schema-unqualified names.

There will be no issues as such because we set search_path to
always-secure search path while creating connection
(libpqrcv_connect-> libpqrcv_PQexec(conn->streamConn,
ALWAYS_SECURE_SEARCH_PATH_SQL). But I agree it will be better to keep
it consistent across all places.

Regards,
Vignesh

Re: pgsql: Raise a warning if there is a possibility of data from multiple

From

Masahiko Sawada

Date:

08 September 2022, 09:07:32

On Thu, Sep 8, 2022 at 4:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Sep 8, 2022 at 12:22 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Thu, Sep 8, 2022 at 10:39 AM Amit Kapila <akapila@postgresql.org> wrote:
> > >
> > > Raise a warning if there is a possibility of data from multiple origins.
> > >
> > > This commit raises a warning message for a combination of options
> > > ('copy_data = true' and 'origin = none') during CREATE/ALTER subscription
> > > operations if the publication tables were also replicated from other
> > > publishers.
> > >
> > > During replication, we can skip the data from other origins as we have that
> > > information in WAL but that is not possible during initial sync so we raise
> > > a warning if there is such a possibility.
> > >
> > > Author: Vignesh C
> > > Reviewed-By: Peter Smith, Amit Kapila, Jonathan Katz, Shi yu, Wang wei
> > > Discussion:
https://www.postgresql.org/message-id/CALDaNm0gwjY_4HFxvvty01BOT01q_fJLKQ3pWP9=9orqubhjcQ@mail.gmail.com
> > >
> > > Branch
> > > ------
> > > master
> > >
> > > Details
> > > -------
> > > https://git.postgresql.org/pg/commitdiff/875693019053b8897ec3983e292acbb439b088c3
> > >
> > > Modified Files
> > > --------------
> > > doc/src/sgml/ref/alter_subscription.sgml  |   5 ++
> > > doc/src/sgml/ref/create_subscription.sgml |  35 ++++++++
> > > src/backend/commands/subscriptioncmds.c   | 133 ++++++++++++++++++++++++++++--
> > > src/test/subscription/t/030_origin.pl     | 114 +++++++++++++++++++------
> > > 4 files changed, 258 insertions(+), 29 deletions(-)
> >
> > +       appendStringInfoString(&cmd,
> > +                                                  "SELECT DISTINCT
> > P.pubname AS pubname\n"
> > +                                                  "FROM pg_publication P,\n"
> > +                                                  "     LATERAL
> > pg_get_publication_tables(P.pubname) GPT\n"
> > +                                                  "     JOIN
> > pg_subscription_rel PS ON (GPT.relid = PS.srrelid),\n"
> > +                                                  "     pg_class C
> > JOIN pg_namespace N ON (N.oid = C.relnamespace)\n"
> > +                                                  "WHERE C.oid =
> > GPT.relid AND P.pubname IN (");
> >
> > Should the tables and the function in this query be schema-qualified?
> > Looking at other code in subscriptioncmd.c, we use schema-qualified
> > names. It works even without the schema names but it may be better to
> > make them consistent.
> >
> > FYI, looking at tablesync.c, there are both styles; it seems that
> > recently added codes use schema-unqualified names.
> >
>
> Yeah, it is better to be consistent in all places and add a schema
> name before accessing an object rather than for one or two places. Do
> we need similar handling for pg_dump code as well, see
> getPublications()?

We can fix at least getPublications() and getSubscriptions() as well.
I'll make a patch for that. Or do you want to fix all of them,
including non-logical-replication-related ones?

Regards,

-- 
Masahiko Sawada

Re: pgsql: Raise a warning if there is a possibility of data from multiple

From

Amit Kapila

Date:

08 September 2022, 09:21:14

On Thu, Sep 8, 2022 at 2:38 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Thu, Sep 8, 2022 at 4:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Sep 8, 2022 at 12:22 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > >
> > > Should the tables and the function in this query be schema-qualified?
> > > Looking at other code in subscriptioncmd.c, we use schema-qualified
> > > names. It works even without the schema names but it may be better to
> > > make them consistent.
> > >
> > > FYI, looking at tablesync.c, there are both styles; it seems that
> > > recently added codes use schema-unqualified names.
> > >
> >
> > Yeah, it is better to be consistent in all places and add a schema
> > name before accessing an object rather than for one or two places. Do
> > we need similar handling for pg_dump code as well, see
> > getPublications()?
>
> We can fix at least getPublications() and getSubscriptions() as well.
> I'll make a patch for that.

Thanks.

> Or do you want to fix all of them,
> including non-logical-replication-related ones?
>

I feel it is better to be consistent across the entire code base
unless there is a reason for doing it differently. Does anyone else
have any thoughts on this matter?


-- 
With Regards,
Amit Kapila.

Re: pgsql: Raise a warning if there is a possibility of data from multiple

From

Tom Lane

Date:

08 September 2022, 13:58:00

Amit Kapila <amit.kapila16@gmail.com> writes:
> I feel it is better to be consistent across the entire code base
> unless there is a reason for doing it differently. Does anyone else
> have any thoughts on this matter?

That sounds like it would be a pretty massive and unnecessary patch.

            regards, tom lane

Re: pgsql: Raise a warning if there is a possibility of data from multiple

From

Amit Kapila

Date:

09 September 2022, 01:03:04

On Thu, Sep 8, 2022 at 7:28 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Amit Kapila <amit.kapila16@gmail.com> writes:
> > I feel it is better to be consistent across the entire code base
> > unless there is a reason for doing it differently. Does anyone else
> > have any thoughts on this matter?
>
> That sounds like it would be a pretty massive and unnecessary patch.
>

Fair enough. Do you mind being consistent in this regard for logical
replication-related code? BTW, is there a reason we prefer to write in
one or another way (with or without appending schema_name)?

-- 
With Regards,
Amit Kapila.

Re: pgsql: Raise a warning if there is a possibility of data from multiple

From

Tom Lane

Date:

09 September 2022, 02:34:32

Amit Kapila <amit.kapila16@gmail.com> writes:
> Fair enough. Do you mind being consistent in this regard for logical
> replication-related code?

As long as not too many individual changes are involved, sure.
But consistency of this sort doesn't seem worth creating a lot
of back-patching land mines, IMO.

> BTW, is there a reason we prefer to write in
> one or another way (with or without appending schema_name)?

In places where we can trust the search_path to be just pg_catalog,
there's no real strong reason to write a schema name, I think.

            regards, tom lane