On Thu, Oct 09, 2025 at 03:37:25PM +0300, Грем Снорт wrote:
> I've found a simple problem in one of subscription tests
> (`src/test/subscription/t/009_matviews.pl`).
(Added a couple of folks in CC.)
Hmm, something else is going on here, and I am not sure what yet (a
bisect is annoying as the test depends on a timeout for failure
detection, see below for more ranting).
The backend change coupled with this test comes from bc1adc651b8e,
first introduced in v11. At the top of REL_11_STABLE, which is the
first branch where the test has been introduced, if I update
pgoutput.c and remove the is_publishable_relation() call in
pgoutput_change() to undo the fix, then the test is able to hang as it
is designed.
Now, if I do the same thing on HEAD, removing the check, then the test
passes! Something else is going on here: the test is not checking
what it has been written for. Applying your patch does not change
this state.
As far as I can see, the test has been broken since v17. Up to v16, the
test would hang once the fix in pgoutput.c is reverted. In v17 and
newer versions, it does not.
While something specific to v17 is to blame here, I am also going to
complain about the way this test is written and designed to fail: a
failing scenario should be deterministic, and should check some state
in the cluster to validate something, be it a lookup at some relation,
some catalogs or some server logs. 009_matviews.pl does nothing like
that: a failure is a test hanging with the failure detected by a
timeout. From my perspective, this is a poor design choice, and one
reason why nobody had noticed the regression in v17 that I am only
finding now, after looking more closely because of your patch.
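To illustrate what a deterministic check could look like, here is a
minimal sketch in Python (not the Perl TAP code the test actually uses).
The function name and the offset parameter are hypothetical; the error
text is the one seen in the subscriber logs up to v16. The idea is to
scan the subscriber log for the apply worker's complaint after a known
position, rather than waiting for a timeout:

```python
import re

# Error text as reported by the subscriber's apply worker up to v16
# (copied from the logs quoted in this thread). \s+ tolerates any
# whitespace or line wrapping between the words.
MATVIEW_ERROR = re.compile(
    r'ERROR:\s+logical replication target\s+relation '
    r'"public\.testmv1" does not exist'
)


def matview_error_seen(log_text: str, offset: int = 0) -> bool:
    """Return True if the matview error appears in log_text at or after offset.

    offset is a hypothetical "log position before the test step" so that
    only messages generated by the step under test are considered.
    """
    return MATVIEW_ERROR.search(log_text, offset) is not None
```

A test built this way would wait for the subscriber to catch up and
then assert that the error is absent with the fix in place; reverting
the fix should make the same assertion fail immediately instead of
hanging.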
Amit, Kurada-san or Sawada-san, does something ring a bell? There
have been many changes in the logical replication code since v17, and
it sounds like an issue introduced by one of these recent changes, but
I have to admit that I am not seeing anything obvious (that's not
dcd4454590e7, checked it).
Up to v16, the test loops with the following failure popping in the
subscriber logs:
2025-10-10 11:24:15.884 JST [25148] ERROR: logical replication target
relation "public.testmv1" does not exist
2025-10-10 11:24:15.884 JST [25148] CONTEXT: processing remote data
for replication origin "pg_16391" during message type "INSERT" in
transaction 733, finished at 0/14BBE08
From v17, the subscriber just accepts things, without the worker
complaining about a matview:
2025-10-10 11:27:10.020 JST [32467] LOG: logical replication table
synchronization worker for subscription "mysub", table "test1" has
started
2025-10-10 11:27:10.041 JST [32467] LOG: logical replication table
synchronization worker for subscription "mysub", table "test1" has
finished
2025-10-10 11:27:10.120 JST [32443] LOG: received fast shutdown request
I am attempting a bisect, as well, perhaps I'll be able to catch
something...
--
Michael