Re: POC: enable logical decoding when wal_level = 'replica' without a server restart - Mailing list pgsql-hackers

From shveta malik
Subject Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
Date
Msg-id CAJpy0uDA_5untUaMiSoWEfpFEzhwNMoepAp8+3u=-yDq2wnkwg@mail.gmail.com
Whole thread Raw
In response to Re: POC: enable logical decoding when wal_level = 'replica' without a server restart  (shveta malik <shveta.malik@gmail.com>)
List pgsql-hackers
On Fri, Oct 10, 2025 at 9:32 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Thu, Oct 9, 2025 at 4:14 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > Thank you for the comments.
> >
> > I've attached the updated patch. As we discussed, it now uses the lazy
> > deactivation in all cases and has the discussed points in the comments
> > atop logicalctl.c.
> >
>
> Thank You for making the changes. I am in the process of verifying the
> patch. Please find one concern.
>
> When I promote a standby, there's a brief period during
> recovery-completion when status_change_allowed is set to true and also
> RecoveryInProgress() is still true. If I try to create a logical
> replication slot on the standby during this window, it fails with the
> following error:
>
> postgres=# SELECT pg_create_logical_replication_slot('slot1',
> 'pgoutput', false, false, false);
> ERROR:  logical decoding on standby requires "effective_wal_level" >=
> "logical" on the primary
> HINT:  To enable logical decoding, set "wal_level" >= "logical" or
> create at least one logical slot when "wal_level" = "replica".
>
> Note that the primary already has logical slots. This issue occurs
> because logical decoding is disabled by
> UpdateLogicalDecodingStatusEndOfRecovery() due to no existing slots on
> standby while RecoveryInProgress() is still true. Should
> CheckLogicalDecodingRequirements() also consider
> 'status_change_allowed' on standby to handle this transition more
> gracefully?
>

Please find a few more comments:

1)
I am creating a subscription for publication present on standby, I get
this error:

postgres=# CREATE subscription sub1 connection 'dbname=postgres
host=localhost user=shveta port=5434' publication pub1;
ERROR:  could not create replication slot "sub1": ERROR:  logical
decoding on standby requires "effective_wal_level" >= "logical" on the
primary
HINT:  To enable logical decoding, set "wal_level" >= "logical" or
create at least one logical slot when "wal_level" = "replica".

Should Hint also mention 'on primary' i.e. 'To enable logical decoding
on primary, set..' , otherwise it could be slightly confusing to
follow HINT.

2)
CheckLogicalDecodingRequirements():

Is there a reason for moving 'logical decoding on standby requires..'
error before 'logical decoding requires a database connection' error?
Shall we keep the same order as earlier?

3)
GetActiveWalLevelOnStandby() is not being used anywhere now.

4)
+ if (!old_pending_disable)
+ {
+ volatile PROC_HDR *procglobal = ProcGlobal;
+ ProcNumber checkpointerProc = procglobal->checkpointerProc;
+
+ /* Wake up the checkpointer */
+ if (checkpointerProc != INVALID_PROC_NUMBER)
+ SetLatch(&GetPGProcByNumber(checkpointerProc)->procLatch);
+ }

Why are we waking the checkpointer only when pending_disable changes
from false to true? When this function is called and
old_pending_disable is still true, wouldn’t that be an even stronger
reason to wake the checkpointer?

5)
+ /*
+ * Update shmem flags. We don't need to care about the order of setting
+ * global flag and writing the WAL record writes are not allowed yet.
+ */

Comment is somewhat problematic.

6)

ReportSlotInvalidation():
+ appendStringInfoString(&err_detail, _("Logical decoding on standby
requires \"wal_level\" >= \"logical\" or at least one logical slot on
the primary server."));

This message could be a little ambiguous. It is not completely clear
that that wal_level=logical is needed on standby or on primary. How
about below msg:

Logical decoding on standby requires the primary server to have either
wal_level >= logical or at least one logical replication slot created.

7)
+ "effective_wal_level got decreased to 'replica' on the primary to
invalidate stnadby's slots"

stnadby --> standby

8)
+ # Trigger promotion with no wait for the startup process to reach the
+ # injection point.
+ $standby5->safe_psql('postgres', qq[select pg_promote(false)]);
+ note('promote the standby and waiting for injection_point');
+ $standby5->wait_for_event('startup',
+ 'startup-logical-decoding-status-change-end-of-recovery');
+ note(
+ "injection_point
'startup-logical-decoding-status-change-end-of-recovery' is reached"
+ );

Is the comment correct here? We are waiting to reach the injection
point? But the comment says 'no wait for'?

9)
+ # Drop the logical slot, requesting to disable logical decoding to
the checkpointer.
+ # It  has to wait for the recovery to complete before disabling
logical decoding.

Double space in 'It  has'.

10)
+ # Drop the logical slot, requesting to disable logical decoding to
the checkpointer.
+ # It  has to wait for the recovery to complete before disabling
logical decoding.
+ $standby5->safe_psql('postgres',
+ qq[select pg_drop_replication_slot('standby5_slot');]);
+
+ # Resume the startup process to complete the recovery.
+ $standby5->safe_psql('postgres',
+ qq[select injection_points_wakeup('startup-logical-decoding-status-change-end-of-recovery')]
+ );

I think between these 2 steps, we should check logs for 'waiting for
recovery completion to change logical decoding status'.
The test will become more clear then.


thanks
Shveta



pgsql-hackers by date:

Previous
From: Chao Li
Date:
Subject: Re: URLs in rbtree.c are broken
Next
From: "Aya Iwata (Fujitsu)"
Date:
Subject: RE: [PROPOSAL] Termination of Background Workers for ALTER/DROP DATABASE