Re: Disabled logical replication origin session causes primary key errors - Mailing list pgsql-bugs
| From | Masahiko Sawada |
|---|---|
| Subject | Re: Disabled logical replication origin session causes primary key errors |
| Date | |
| Msg-id | CAD21AoDhBDJcWsmsS4Du8E6Vzk6krxu65sCbpGggBxLHj3-8eA@mail.gmail.com Whole thread |
| In response to | RE: Disabled logical replication origin session causes primary key errors ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>) |
| Responses |
RE: Disabled logical replication origin session causes primary key errors
|
| List | pgsql-bugs |
On Tue, Apr 22, 2025 at 4:01 AM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear Vignesh,
>
> > Couple of minor suggestions in the test:
> > 1) I felt this is not required for this test as it has been verified many times:
> > +
> > +# Insert tuples and confirms replication works well
> > +$node_publisher->safe_psql('postgres', "INSERT INTO t1 VALUES (1);");
> > +
> > +$node_publisher->wait_for_catchup('regress_sub');
>
> I added it to ensure remote_lsn to the valid value, but yes it is not mandatory.
>
> > 2) How about we change:
> > +# Confirms the origin can be advanced
> > +ok( $node_subscriber->poll_query_until(
> > + 'postgres',
> > + "SELECT remote_lsn > '$remote_lsn' FROM
> > pg_catalog.pg_replication_origin_status os, pg_catalog.pg_subscription
> > s WHERE os.external_id = 'pg_' || s.oid AND s.subname =
> > 'regress_sub'",
> > + 't')
> > + or die "Timed out while waiting for replication origin to be
> > updated");
> >
> > to:
> > $node_publisher->wait_for_catchup('regress_sub');
> >
> > # Confirms the origin can be advanced
> > $result = $node_subscriber->safe_psql('postgres',
> > "SELECT remote_lsn > '$remote_lsn' FROM
> > pg_catalog.pg_replication_origin_status os, pg_catalog.pg_subscription
> > s WHERE os.external_id = 'pg_' || s.oid AND s.subname = 'regress_sub'"
> > );
> > is($result, 't',
> > 'remote_lsn has advanced for apply worker raising an exception');
> >
> > In that way, we need not wait on poll_query_until.
>
> I intentionally used poll_query_until(), but I have no strong opinions.
> Modified.
>
> PSA new patches.
Thank you for updating the patch! Here are some comments:
+# The bug was that the replication origin wasn’t updated whe
+# apply_error_callback() was called with elevel >= ERROR, and the apply worker
+# continued running afterward.
I think it would be better to mention the fact that the problem
happened when an error was caught for instance by a plpgsql function.
How about rewriting it as follows?
# The bug was that when an ERROR was caught, for instance by a
PL/pgSQL function,
# the apply worker reset the replication origin but continued processing
# subsequent changes. This behavior resulted in a failure to update
the replication
# origin during further apply operations.
---
+# Define an after-trigger function for the table insert. It can be fired even
+# by the apply worker and always raises an exception. This situation allows
+# worker continue after apply_error_callback() is called with elevel = ERROR.
+$node_subscriber->safe_psql(
+ 'postgres', q{
+CREATE FUNCTION handle_exception_trigger()
+RETURNS TRIGGER AS $$
+BEGIN
+ BEGIN
+ -- Raise an exception
+ RAISE EXCEPTION 'This is a test exception';
+ EXCEPTION
+ WHEN OTHERS THEN
+ RETURN NEW;
+ END;
+
+ RETURN NEW;
+END;
+$$ LANGUAGE plpgsql;
+});
+
+$node_subscriber->safe_psql(
+ 'postgres', q{
+CREATE TRIGGER silent_exception_trigger
+AFTER INSERT OR UPDATE ON t1
+FOR EACH ROW
+EXECUTE FUNCTION handle_exception_trigger();
+
+ALTER TABLE t1 ENABLE ALWAYS TRIGGER silent_exception_trigger;
+});
How about rewriting the comment as follows?
# Create an AFTER INSERT trigger on the table that raises and subsequently
# handles an exception. Subsequent insertions will trigger this exception,
# causing the apply worker to invoke its error callback with an ERROR. However,
# since the error is caught within the trigger, the apply worker will continue
# processing changes.
And can we execute these queries in one safe_psql() call?
---
+# Obtain current remote_lsn value to check its advancement later
+my $remote_lsn = $node_subscriber->safe_psql('postgres',
+ "SELECT remote_lsn FROM
pg_catalog.pg_replication_origin_status os, pg_catalog.pg_subscription
s WHERE os.external_id = 'pg_' || s.oid AND s.subname = 'regress_sub'"
+);
It seems to make sense to me to get the remote_lsn value just before
executing INSERT after creating the trigger.
Is it a conventional way to always use schema-qualified catalogs names
in regression tests? Looking at other tests in src/test/subscription,
there are only three cases:
% git grep pg_catalog src/test/subscription/t
src/test/subscription/t/001_rep_changes.pl: FROM pg_catalog.pg_stat_io
src/test/subscription/t/020_messages.pl: "SELECT COUNT(*) FROM
pg_catalog.pg_replication_slots WHERE slot_name = 'tap_sub' AND
active='f'",
src/test/subscription/t/029_on_error.pl: "SELECT subenabled = false
FROM pg_catalog.pg_subscription WHERE subname = 'sub'"
ISTM that we don't necessarily need to make the catalog name schema-qualified.
---
We might want to stop both the publisher and the subscriber at the end
of the tests.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
pgsql-bugs by date: