Re: Failure of subscription tests with topminnow - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Failure of subscription tests with topminnow
Date
Msg-id CAA4eK1LT-zf5cU5paubxo1=Us2WiWdTMjFU4utLjNuTwgjh65w@mail.gmail.com
Whole thread Raw
In response to Failure of subscription tests with topminnow  (Michael Paquier <michael@paquier.xyz>)
Responses Re: Failure of subscription tests with topminnow
List pgsql-hackers
On Mon, Aug 16, 2021 at 9:24 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> Hi all,
>
> topminnow has just failed in a weird fashion:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=topminnow&dt=2021-08-15%2013%3A24%3A58
> # SELECT pid !=  FROM pg_stat_replication WHERE application_name = 'tap_sub';
> # expecting this output:
> # t
> # last actual query output:
> #
> # with stderr:
> # ERROR:  syntax error at or near "FROM"
> # LINE 1: SELECT pid !=  FROM pg_stat_replication WHERE application_na...
>
> Looking at the logs, it seems like the problem boils down to an active
> slot when using ALTER SUBSCRIPTION tap_sub CONNECTION:
> 2021-08-15 18:44:38.027 CEST [16473:2] ERROR:  could not start WAL
> streaming: ERROR:  replication slot "tap_sub" is active for PID 16336
>

The "ALTER SUBSCRIPTION tap_sub CONNECTION" would lead to restart the
walsender process. Now, here the problem seems to be that the previous
walsender process (16336) didn't exit and the new process started to
use the slot which leads to the error shown in the log. This is
evident from the below part of log where we can see that 16336 has
exited after new process started to use the slot.

2021-08-15 18:44:38.027 CEST [16475:6] tap_sub LOG:  received
replication command: START_REPLICATION SLOT "tap_sub" LOGICAL
0/16BEEB0 (proto_version '1', publication_names
'"tap_pub","tap_pub_ins_only"')
2021-08-15 18:44:38.027 CEST [16475:7] tap_sub STATEMENT:
START_REPLICATION SLOT "tap_sub" LOGICAL 0/16BEEB0 (proto_version '1',
publication_names '"tap_pub","tap_pub_ins_only"')
2021-08-15 18:44:38.027 CEST [16475:8] tap_sub ERROR:  replication
slot "tap_sub" is active for PID 16336
2021-08-15 18:44:38.027 CEST [16475:9] tap_sub STATEMENT:
START_REPLICATION SLOT "tap_sub" LOGICAL 0/16BEEB0 (proto_version '1',
publication_names '"tap_pub","tap_pub_ins_only"')
2021-08-15 18:44:38.041 CEST [16475:10] tap_sub LOG:  disconnection:
session time: 0:00:00.036 user=nm database=postgres host=[local]
2021-08-15 18:44:38.043 CEST [16336:14] tap_sub LOG:  disconnection:
session time: 0:00:06.367 user=nm database=postgres host=[local]

One idea to solve this is to first disable the subscription, wait for
the walsender process to exit, and then change the connection string
and re-enable the subscription.

-- 
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Greg Nancarrow
Date:
Subject: Re: Skipping logical replication transactions on subscriber side
Next
From: Wenjing
Date:
Subject: Is it worth pushing conditions to sublink/subplan?