Re: POC: enable logical decoding when wal_level = 'replica' without a server restart - Mailing list pgsql-hackers
From | shveta malik |
---|---|
Subject | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart |
Date | |
Msg-id | CAJpy0uDA_5untUaMiSoWEfpFEzhwNMoepAp8+3u=-yDq2wnkwg@mail.gmail.com Whole thread Raw |
In response to | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart (shveta malik <shveta.malik@gmail.com>) |
List | pgsql-hackers |
On Fri, Oct 10, 2025 at 9:32 AM shveta malik <shveta.malik@gmail.com> wrote: > > On Thu, Oct 9, 2025 at 4:14 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > Thank you for the comments. > > > > I've attached the updated patch. As we discussed, it now uses the lazy > > deactivation in all cases and has the discussed points in the comments > > atop logicalctl.c. > > > > Thank You for making the changes. I am in the process of verifying the > patch. Please find one concern. > > When I promote a standby, there's a brief period during > recovery-completion when status_change_allowed is set to true and also > RecoveryInProgress() is still true. If I try to create a logical > replication slot on the standby during this window, it fails with the > following error: > > postgres=# SELECT pg_create_logical_replication_slot('slot1', > 'pgoutput', false, false, false); > ERROR: logical decoding on standby requires "effective_wal_level" >= > "logical" on the primary > HINT: To enable logical decoding, set "wal_level" >= "logical" or > create at least one logical slot when "wal_level" = "replica". > > Note that the primary already has logical slots. This issue occurs > because logical decoding is disabled by > UpdateLogicalDecodingStatusEndOfRecovery() due to no existing slots on > standby while RecoveryInProgress() is still true. Should > CheckLogicalDecodingRequirements() also consider > 'status_change_allowed' on standby to handle this transition more > gracefully? > Please find a few more comments: 1) I am creating a subscription for publication present on standby, I get this error: postgres=# CREATE subscription sub1 connection 'dbname=postgres host=localhost user=shveta port=5434' publication pub1; ERROR: could not create replication slot "sub1": ERROR: logical decoding on standby requires "effective_wal_level" >= "logical" on the primary HINT: To enable logical decoding, set "wal_level" >= "logical" or create at least one logical slot when "wal_level" = "replica". Should Hint also mention 'on primary' i.e. 'To enable logical decoding on primary, set..' , otherwise it could be slightly confusing to follow HINT. 2) CheckLogicalDecodingRequirements(): Is there a reason for moving 'logical decoding on standby requires..' error before 'logical decoding requires a database connection' error? Shall we keep the same order as earlier? 3) GetActiveWalLevelOnStandby() is not being used anywhere now. 4) + if (!old_pending_disable) + { + volatile PROC_HDR *procglobal = ProcGlobal; + ProcNumber checkpointerProc = procglobal->checkpointerProc; + + /* Wake up the checkpointer */ + if (checkpointerProc != INVALID_PROC_NUMBER) + SetLatch(&GetPGProcByNumber(checkpointerProc)->procLatch); + } Why are we waking the checkpointer only when pending_disable changes from false to true? When this function is called and old_pending_disable is still true, wouldn’t that be an even stronger reason to wake the checkpointer? 5) + /* + * Update shmem flags. We don't need to care about the order of setting + * global flag and writing the WAL record writes are not allowed yet. + */ Comment is somewhat problematic. 6) ReportSlotInvalidation(): + appendStringInfoString(&err_detail, _("Logical decoding on standby requires \"wal_level\" >= \"logical\" or at least one logical slot on the primary server.")); This message could be a little ambiguous. It is not completely clear that that wal_level=logical is needed on standby or on primary. How about below msg: Logical decoding on standby requires the primary server to have either wal_level >= logical or at least one logical replication slot created. 7) + "effective_wal_level got decreased to 'replica' on the primary to invalidate stnadby's slots" stnadby --> standby 8) + # Trigger promotion with no wait for the startup process to reach the + # injection point. + $standby5->safe_psql('postgres', qq[select pg_promote(false)]); + note('promote the standby and waiting for injection_point'); + $standby5->wait_for_event('startup', + 'startup-logical-decoding-status-change-end-of-recovery'); + note( + "injection_point 'startup-logical-decoding-status-change-end-of-recovery' is reached" + ); Is the comment correct here? We are waiting to reach the injection point? But the comment says 'no wait for'? 9) + # Drop the logical slot, requesting to disable logical decoding to the checkpointer. + # It has to wait for the recovery to complete before disabling logical decoding. Double space in 'It has'. 10) + # Drop the logical slot, requesting to disable logical decoding to the checkpointer. + # It has to wait for the recovery to complete before disabling logical decoding. + $standby5->safe_psql('postgres', + qq[select pg_drop_replication_slot('standby5_slot');]); + + # Resume the startup process to complete the recovery. + $standby5->safe_psql('postgres', + qq[select injection_points_wakeup('startup-logical-decoding-status-change-end-of-recovery')] + ); I think between these 2 steps, we should check logs for 'waiting for recovery completion to change logical decoding status'. The test will become more clear then. thanks Shveta
pgsql-hackers by date: