Re: POC: enable logical decoding when wal_level = 'replica' without a server restart - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart |
Date | |
Msg-id | CAD21AoATKbc=tLKBKQ46hKYWXW7+CvW9U3EYMjabVh=uNrr18Q@mail.gmail.com |
In response to | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart (Amit Kapila <amit.kapila16@gmail.com>) |
List | pgsql-hackers |
On Mon, Sep 8, 2025 at 11:22 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Sep 8, 2025 at 11:22 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Fri, Sep 5, 2025 at 9:12 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Sat, Sep 6, 2025 at 3:58 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > >
> > > > On Tue, Sep 2, 2025 at 5:12 AM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
> > > > >
> > > > >
> > > > > I tested the behaviour with HEAD and with Patch. And I confirmed the
> > > > > change in behaviour between HEAD and Patch
> > > > >
> > > > > Suppose we have a primary and a standby with wal_level = logical and
> > > > > guc parameters to enable slot sync worker are set accordingly. A slot
> > > > > sync worker will be running.
> > > > > Now we change the value of wal_level for primary to replica. And
> > > > > restart the primary server
> > > > >
> > > > > With HEAD, during restart the existing sync_slot_worker will exit with:
> > > > > 2025-09-02 11:49:08.846 IST [3877882] ERROR: synchronization worker
> > > > > "" could not connect to the primary server: connection to server at
> > > > > "localhost" (127.0.0.1), port 5432 failed: Connection refused
> > > > > Is the server running on that host and accepting TCP/IP connections?
> > > > > 2025-09-02 11:49:11.380 IST [3877885] FATAL: streaming replication
> > > > > receiver "walreceiver" could not connect to the primary server:
> > > > > connection to server at "localhost" (127.0.0.1), port 5432 failed:
> > > > > Connection refused
> > > > > Is the server running on that host and accepting TCP/IP connections?
> > > > >
> > > > > and after the restart of the primary server, slot sync worker will
> > > > > restart and it is able to connect to the primary.
> > > > >
> > > > > With Patch, during restart the existing sync_slot_worker will exit.
> > > > > But after the restart of the primary server, slot sync worker cannot
> > > > > start and we can see following log:
> > > > > 2025-09-02 12:44:51.497 IST [3947520] LOG: replication slot
> > > > > synchronization worker is shutting down on receiving SIGINT
> > > > > 2025-09-02 12:44:51.498 IST [3943504] LOG: replication slot
> > > > > synchronization requires logical decoding to be enabled
> > > > > 2025-09-02 12:44:51.498 IST [3943504] HINT: To enable logical
> > > > > decoding on primary, set "wal_level" >= "logical" or create at least
> > > > > one logical slot when "wal_level" = "replica".
> > > > > 2025-09-02 12:45:51.537 IST [3943504] LOG: replication slot
> > > > > synchronization requires logical decoding to be enabled
> > > > > 2025-09-02 12:45:51.537 IST [3943504] HINT: To enable logical
> > > > > decoding on primary, set "wal_level" >= "logical" or create at least
> > > > > one logical slot when "wal_level" = "replica".
> > > > >
> > > > > So, with HEAD, after we restart the primary server with 'wal_level =
> > > > > replica', the slot sync worker can restart and connect to the primary
> > > > > but with patch it cannot start after restart due to the check in
> > > > > ValidateSlotSyncParams.
> > > > >
> > > > But the slotsync worker is launched again once logical decoding is
> > > > enabled, no? I'm not sure that we want to launch the slotsync worker
> > > > also when we know logical decoding is not enabled.
> > > >
> > > Why in the first place the logical_decoding enabled check has failed
> > > because IIUC, the wal_level on standby is still 'logical'?
> > >
> > This is because logical decoding on standbys can be used only when the
> > standby's effective_wal_level is 'logical', which also means the
> > primary's effective_wal_level is 'logical' too. This behavior is
> > mostly the same as today; logical decoding on standbys can be used
> > only when both the primary and the standbys set wal_level to
> > 'logical'. Even if standby's wal_level is set to logical, it doesn't
> > mean that incoming WAL records are generated on the primary with the
> > information required by logical decoding.
> >
>
> This is true but IIUC Shlok's report says that we are able to restart
> server before patch and not after patch. Am, I missing something? If
> not, then shouldn't this be fixed separately first?

I've reread his report. IIUC what happened in his test scenario was
this: while he was restarting the primary server (to make
wal_level = 'replica' take effect), the slotsync worker exited due to a
connection error. Then, after the primary started up, the slotsync
worker was not launched again with the patch, whereas it was launched
again without the patch.

This is because, with the patch, the standby disables logical decoding
when replaying the STATUS_CHANGE record. If the primary enables logical
decoding again, a STATUS_CHANGE record with logical_decoding=true is
replicated to the standby, and the standby launches the slotsync worker
again. That is, the slotsync worker is launched based on the standby's
effective_wal_level. Before the patch, on the other hand, the slotsync
worker is launched solely based on the standby's wal_level. Therefore,
it launches but doesn't do anything in this case (as the primary should
not have any logical slots).

I thought it made sense not to launch the slotsync worker when
effective_wal_level is 'replica', but is your suggestion that the
slotsync worker should be launched whenever the standby's wal_level is
'logical', regardless of effective_wal_level?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
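
For readers following the thread, the difference in launch conditions described in the message above can be sketched as a small standalone C model. This is only an illustration under stated assumptions, not code from the patch or from PostgreSQL: the type and function names (StandbyState, replay_status_change, launch_before_patch, launch_with_patch) are hypothetical, and the model assumes that the standby's effective_wal_level simply follows the last logical decoding status replayed from the primary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model; none of these names come from the patch. */
typedef enum { LEVEL_REPLICA, LEVEL_LOGICAL } WalLevel;

typedef struct {
    WalLevel configured_wal_level; /* the standby's own wal_level setting */
    bool     logical_decoding_on;  /* last status replayed from the primary */
} StandbyState;

/* Assumption: replaying a status-change record just stores the new status. */
void replay_status_change(StandbyState *s, bool logical_decoding)
{
    assert(s != NULL);
    s->logical_decoding_on = logical_decoding;
}

/* Assumption: the effective level follows the replayed status only. */
WalLevel effective_wal_level(const StandbyState *s)
{
    return s->logical_decoding_on ? LEVEL_LOGICAL : LEVEL_REPLICA;
}

/* Before the patch: only the standby's configured wal_level matters. */
bool launch_before_patch(const StandbyState *s)
{
    return s->configured_wal_level == LEVEL_LOGICAL;
}

/* With the patch: the standby's effective_wal_level decides. */
bool launch_with_patch(const StandbyState *s)
{
    return effective_wal_level(s) == LEVEL_LOGICAL;
}
```

In this model, a standby configured with wal_level = logical that replays a status change with logical decoding disabled would still pass the pre-patch check but fail the with-patch check, which matches the behaviour difference reported in the thread.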