Re: POC: enable logical decoding when wal_level = 'replica' without a server restart - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
Date
Msg-id CAD21AoArqU-_iULpzLNxF5sjY=rcfT7cN5T1x-REjZAm9T2BvA@mail.gmail.com
Whole thread Raw
In response to Re: POC: enable logical decoding when wal_level = 'replica' without a server restart  (Peter Smith <smithpb2250@gmail.com>)
List pgsql-hackers
On Tue, Nov 25, 2025 at 3:24 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi Sawada-Sa,
>
> Some review comments for the patch v29-0001.

Thank you for reviewing the patch!

>
> ======
> src/backend/access/transam/xact.c
>
> 1.
> +int effective_wal_level = WAL_LEVEL_REPLICA;
>
> Should that be declared as WalLevel instead of int?

I think no. I got the following compiler error:

error: initialization of ‘int *’ from incompatible pointer type
‘WalLevel *’ [-Wincompatible-pointer-types]

I noticed that effective_wal_level doesn't necessarily need to be
exposed since it's a dummy variable. I moved it to guc_tables.c.

>
> ======
> src/backend/replication/logical/decode.c
>
> xlog_decode:
>
> 2.
> +
> + /*
> + * Even if wal_level on the primary got decreased to 'replica', as
> + * long as there is at least one valid logical slot, the logical
> + * decoding remains enabled. So we don't check the logical
> + * decoding availability here but do so in
> + * XLOG_LOGICAL_DECODING_STATUS_CHANGE case. It covers the case
> + * where wal_level on the primary got decreased to 'minimal' too.
> + */
> + break;
>
> This comment seems to be written more like a comparison with what the
> comment used to say -- e.g. to me this reads like "We don't need to do
> the check we used to do here anymore because...". Maybe it can be
> worded slightly differently?
>
> ~~
>
> 3.
> + bool    *logical_decoding = (bool *) XLogRecGetData(buf->record);
>
> - /*
> - * If wal_level on the primary is reduced to less than
> - * logical, we want to prevent existing logical slots from
> - * being used.  Existing logical slots on the standby get
> - * invalidated when this WAL record is replayed; and further,
> - * slot creation fails when wal_level is not sufficient; but
> - * all these operations are not synchronized, so a logical
> - * slot may creep in while the wal_level is being reduced.
> - * Hence this extra check.
> - */
> - if (xlrec->wal_level < WAL_LEVEL_LOGICAL)
> + if (!(*logical_decoding))
>
> Would it be simpler just to declare 'logical_decoding' as bool and
> deal with all the pointers during the assignment?
>
> e.g.
> BEFORE
> bool *logical_decoding = (bool *) XLogRecGetData(buf->record);
> if (!(*logical_decoding))
>
> AFTER
> bool logical_decoding = *((bool *) XLogRecGetData(buf->record));
> if (!logical_decoding)

Fixed the above points.

>
> ~~
>
> 4.
>   }
> +
>   break;
>
> Was the added blank line before 'break;' deliberate?

Yes.

>
> ======
> src/backend/replication/logical/logical.c
>
> CheckLogicalDecodingRequirements:
>
> 5.
> + /* CheckSlotRequirements() has already checked if wal_level >= 'replica' */
> +
> + /* Check if logical decoding is available on standby */
> + if (RecoveryInProgress() && !IsLogicalDecodingEnabled())
>
> Should both those single-line comments be combined into one larger comment?

I don't think so. The first comment isn't relevant with the following
'if' block.

>
> ======
> src/backend/replication/logical/logicalctl.c
>
> 6.
> + * In the future, we could extend support to include automatic transitions
> + * of effective_wal_level between 'minimal' and 'logical' WAL levels. However,
> + * this enhancement would require additional coordination mechanisms and
> + * careful implementation of operations such as terminating walsenders and
> + * archiver processes while carefully considering the sequence of operations
> + * to ensure system stability during these transitions.
>
> By convention, I think a comment that says "In the future" should also
> have an "XXX" marker.

Hmm, I've seen we use an "XXX" marker like "TODO" or "FIXME", but what
the comment explains is not actually a todo but possible future
enhancements.

>
> ~~~
>
> 7.
> +/*
> + * Writes XLOG_LOGICAL_DECODING_STATUS_CHANGE WAL record with the given status.
> + */
> +static void
> +write_logical_decoding_status_update_record(bool status)
>
> /Writes/Writes an/
>
> ~~~
>
> DisableLogicalDecodingIfNecessary:
>
> 8.
> + ereport(LOG,
> + errmsg("logical decoding is disabled because there is no valid
> logical replication slot"));
>
> SUGGESTION
> logical decoding is disabled because there are no valid logical
> replication slots
>
> ~~
>
> UpdateLogicalDecodingStatusEndOfRecovery:
>
> 9.
> + /*
> + * With 'minimal' WAL level, there have not been logical slots during
> + * recovery. Logical decoding is always disabled, and no need to
> + * synchronize XLogLogicalInfo.
> + */
>
> The wording of the comment didn't seem right.
>
> SUGGESTION
> With 'minimal' WAL level, there are no logical replication slots
> during recovery. Logical decoding is always disabled, so there is no
> need to synchronize XLogLogicalInfo.
>
> ======
> src/backend/replication/slot.c
>
> ReplicationSlotCleanup:
>
> 10.
> +
> + if (SlotIsLogical(s) && s->data.invalidated == RS_INVAL_NONE)
> + n_valid_logicalslots++;
> +
>
> Since you are not actually doing anything with this counter, other
> than later checking if it is 0, the counter could be replaced just by
> a boolean 'found_valid_logicalslot', so the code could be made
> slightly more performant if needed.
>
> ~~~
>
> ReplicationSlotsDropDBSlots:
>
> 11.
>
> + /*
> + * Count slots on other databases too so we can disable logical
> + * decoding only if no slots in the cluster.
> + */
> + SpinLockAcquire(&s->mutex);
> + if (s->data.invalidated == RS_INVAL_NONE)
> + n_valid_logicalslots++;
> + SpinLockRelease(&s->mutex);
>
> Same as the previous review comment #10. This counter could be
> replaced by a boooelan if you want.
>
> e.g.
> found_valid_logicalslot |= (s->data.invalidated == RS_INVAL_NONE);
>
> ~~~
>
> InvalidatePossiblyObsoleteSlot:
>
> 12.
>   /* Let caller know */
> - *invalidated = true;
> + invalidated = true;
>
> That comment is old from when 'invalidated' was a parameter by
> reference. It doesn't seem appropriate now.
>
> ~~~
>
> InvalidateObsoleteReplicationSlots:
>
> 13.
> + SpinLockAcquire(&s->mutex);
> + if (s->data.invalidated == RS_INVAL_NONE)
> + n_valid_logicalslots++;
> + SpinLockRelease(&s->mutex);
> +
>
> Yet another place (same as above review comments #10,#11) where the
> counter could be replaced by a simple boolean.
>
> found_valid_logicalslot |= (s->data.invalidated == RS_INVAL_NONE);
>
> ~~~
>
> 14.
> + /*
> + * Additionally, remember we have invalidated a logical slot too
> + * as we can request disabling logical decoding later.
> + */
>
> Don't say "Additionally" and "too" in the same sentence.

Fixed the above points.

I'll submit the latest patch soon.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: Guillaume Lelarge
Date:
Subject: Re: oid2name : add objects file path
Next
From: Bernice Southey
Date:
Subject: Re: Second RewriteQuery complains about first RewriteQuery in edge case