Re: Disabled logical replication origin session causes primary key errors - Mailing list pgsql-bugs

From Masahiko Sawada
Subject Re: Disabled logical replication origin session causes primary key errors
Date
Msg-id CAD21AoDhBDJcWsmsS4Du8E6Vzk6krxu65sCbpGggBxLHj3-8eA@mail.gmail.com
Whole thread Raw
In response to RE: Disabled logical replication origin session causes primary key errors  ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>)
Responses RE: Disabled logical replication origin session causes primary key errors
List pgsql-bugs
On Tue, Apr 22, 2025 at 4:01 AM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear Vignesh,
>
> > Couple of minor suggestions in the test:
> > 1) I felt this is not required for this test as it has been verified many times:
> > +
> > +# Insert tuples and confirms replication works well
> > +$node_publisher->safe_psql('postgres', "INSERT INTO t1 VALUES (1);");
> > +
> > +$node_publisher->wait_for_catchup('regress_sub');
>
> I added it to ensure remote_lsn to the valid value, but yes it is not mandatory.
>
> > 2) How about we change:
> > +# Confirms the origin can be advanced
> > +ok( $node_subscriber->poll_query_until(
> > +               'postgres',
> > +               "SELECT remote_lsn > '$remote_lsn' FROM
> > pg_catalog.pg_replication_origin_status os, pg_catalog.pg_subscription
> > s WHERE os.external_id = 'pg_' || s.oid AND s.subname =
> > 'regress_sub'",
> > +               't')
> > +         or die "Timed out while waiting for replication origin to be
> > updated");
> >
> > to:
> > $node_publisher->wait_for_catchup('regress_sub');
> >
> > # Confirms the origin can be advanced
> > $result = $node_subscriber->safe_psql('postgres',
> > "SELECT remote_lsn > '$remote_lsn' FROM
> > pg_catalog.pg_replication_origin_status os, pg_catalog.pg_subscription
> > s WHERE os.external_id = 'pg_' || s.oid AND s.subname = 'regress_sub'"
> > );
> > is($result, 't',
> > 'remote_lsn has advanced for apply worker raising an exception');
> >
> > In that way, we need not wait on poll_query_until.
>
> I intentionally used poll_query_until(), but I have no strong opinions.
> Modified.
>
> PSA new patches.

Thank you for updating the patch! Here are some comments:

+# The bug was that the replication origin wasn’t updated whe
+# apply_error_callback() was called with elevel >= ERROR, and the apply worker
+# continued running afterward.

I think it would be better to mention the fact that the problem
happened when an error was caught for instance by a plpgsql function.
How about rewriting it as follows?

# The bug was that when an ERROR was caught, for instance by a
PL/pgSQL function,
# the apply worker reset the replication origin but continued processing
# subsequent changes. This behavior resulted in a failure to update
the replication
# origin during further apply operations.

---
+# Define an after-trigger function for the table insert. It can be fired even
+# by the apply worker and always raises an exception. This situation allows
+# worker continue after apply_error_callback() is called with elevel = ERROR.
+$node_subscriber->safe_psql(
+        'postgres', q{
+CREATE FUNCTION handle_exception_trigger()
+RETURNS TRIGGER AS $$
+BEGIN
+        BEGIN
+                -- Raise an exception
+                RAISE EXCEPTION 'This is a test exception';
+        EXCEPTION
+                WHEN OTHERS THEN
+                        RETURN NEW;
+        END;
+
+        RETURN NEW;
+END;
+$$ LANGUAGE plpgsql;
+});
+
+$node_subscriber->safe_psql(
+        'postgres', q{
+CREATE TRIGGER silent_exception_trigger
+AFTER INSERT OR UPDATE ON t1
+FOR EACH ROW
+EXECUTE FUNCTION handle_exception_trigger();
+
+ALTER TABLE t1 ENABLE ALWAYS TRIGGER silent_exception_trigger;
+});

How about rewriting the comment as follows?

# Create an AFTER INSERT trigger on the table that raises and subsequently
# handles an exception. Subsequent insertions will trigger this exception,
# causing the apply worker to invoke its error callback with an ERROR. However,
# since the error is caught within the trigger, the apply worker will continue
# processing changes.

And can we execute these queries in one safe_psql() call?

---
+# Obtain current remote_lsn value to check its advancement later
+my $remote_lsn = $node_subscriber->safe_psql('postgres',
+        "SELECT remote_lsn FROM
pg_catalog.pg_replication_origin_status os, pg_catalog.pg_subscription
s WHERE os.external_id = 'pg_' || s.oid AND s.subname = 'regress_sub'"
+);

It seems to make sense to me to get the remote_lsn value just before
executing INSERT after creating the trigger.

Is it a conventional way to always use schema-qualified catalogs names
in regression tests? Looking at other tests in src/test/subscription,
there are only three cases:

% git grep pg_catalog src/test/subscription/t
src/test/subscription/t/001_rep_changes.pl:       FROM pg_catalog.pg_stat_io
src/test/subscription/t/020_messages.pl:    "SELECT COUNT(*) FROM
pg_catalog.pg_replication_slots WHERE slot_name = 'tap_sub' AND
active='f'",
src/test/subscription/t/029_on_error.pl:    "SELECT subenabled = false
FROM pg_catalog.pg_subscription WHERE subname = 'sub'"

ISTM that we don't necessarily need to make the catalog name schema-qualified.

---
We might want to stop both the publisher and the subscriber at the end
of the tests.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



pgsql-bugs by date:

Previous
From: Tom Lane
Date:
Subject: Re: BUG #18901: Bug in Perl-Script "src/backend/nodes/gen_node_support.pl"
Next
From: Alvaro Herrera
Date:
Subject: Re: Command order bug in pg_dump