Re: POC: enable logical decoding when wal_level = 'replica' without a server restart - Mailing list pgsql-hackers
From | Ashutosh Bapat |
---|---|
Subject | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart |
Date | |
Msg-id | CAExHW5s6J-=pxUesRHOrFj6sN-1TjZ1uXMHx7+c2T-d-wpN4gQ@mail.gmail.com Whole thread Raw |
In response to | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart (Amit Kapila <amit.kapila16@gmail.com>) |
List | pgsql-hackers |
On Fri, Sep 12, 2025 at 9:38 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > One thing related to this which needs a discussion is after this > change, it is possible that part of the transaction contains > additional logical_wal_info. I couldn't think of a problem due to this > but users using pg_waldump or other WAL reading utilities could > question this. One possibility is that we always start including > logical_wal_info for the next new transaction but not sure if that is > required. It would be good if other people involved in the discussion > or otherwise could share their opinion on this point. > AFAIR, logical info is a separate section in a WAL record, and there is not marker which says "WAL will contain logical info henceforth". So the utilities should be checking for the existence of such info before reading it. So I guess it should be ok. Some extra sensitive utilities may expect that once a WAL record has logical info, all the succeeding WAL records will have it. They may find it troublesome that WAL records with and without logical info are interleaved. Generally, I would prefer that presence/absence of logical info changes at transaction boundaries, but we will still have interleaving WAL records. So I doubt how much that matters. Sorry for jumping late in the discussion. I have a few comments, mostly superficial ones. I am yet to take a deeper look at the synchronization logic. <sect2 id="logicaldecoding-replication-slots"> @@ -328,8 +362,7 @@ postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NU that could be needed by the logical decoding on the standby (as it does not know about the <literal>catalog_xmin</literal> on the standby). Existing logical slots on standby also get invalidated if - <varname>wal_level</varname> on the primary is reduced to less than - <literal>logical</literal>. + logical decoding becomes disabled on the primary. s/becomes disabled/is disabled/ or /gets disabled/. Given that logical decoding can be disabled in multiple ways, it's better to add a reference here to a section which explains what disabling logical decoding means. <listitem> <para> <literal>wal_level_insufficient</literal> means that the - primary doesn't have a <xref linkend="guc-wal-level"/> sufficient to - perform logical decoding. It is set only for logical slots. + logical decoding is disabled on primary due to insufficient + <xref linkend="guc-wal-level"/> or no logical slots. It is set only + for logical slots. It may not be apparent to the users that insufficient wal_level means 'minimal' here. It will be better if we just mention logical decoding is disabled on primary and refer to a section which explains what disabling logical decoding means. * * Skip this if we're taking a full-page image of the new page, as we * don't include the new tuple in the WAL record in that case. Also - * disable if wal_level='logical', as logical decoding needs to be able to - * read the new tuple in whole from the WAL record alone. + * disable if logical decoding is enabled, as logical decoding needs to be + * able to read the new tuple in whole from the WAL record alone. */ Not fault of this patch, but I find the comment to be slighly out of sync with the code. The code actually checks whether the logical decoding is enabled and whether the relation requires WAL to be logged for logical decoding. The difference is subtle, but it did cause me a bit of confusion when I read the code. Please consider rephrasing the comment while you are modifying it. if (oldbuf == newbuf && !need_tuple_data && !XLogCheckBufferNeedsBackup(newbuf)) @@ -9057,8 +9057,8 @@ log_heap_update(Relation reln, Buffer oldbuf, /* * Perform XLogInsert of an XLOG_HEAP2_NEW_CID record * - * This is only used in wal_level >= WAL_LEVEL_LOGICAL, and only for catalog - * tuples. + * This is only used when effective WAL level is logical, and only for Given that the earlier comment used GUC name, it's better to use the GUC name "effective_wal_level" here. case XLOG_PARAMETER_CHANGE: + + /* + * Even if wal_level on the primary got decreased to 'replica' it + * doesn't necessarily mean to disable the logical decoding as + * long as we have at least one logical slot. So we don't check + * the logical decoding availability here but do in + * XLOG_LOGICAL_DECODING_STATUS_CHANGE case. + */ The earlier code checked for wal_level < WAL_LEVEL_LOGICAL, which includes the case when wal_level is 'minimal'. I don't see the new code handling 'minimal' case here. Am I missing something? Do we need a comment here which specifically mentions "minimal" case grammar "Even if wal_level on the primary was lowered to 'replica', as long as there is at least one logical slot, the logical decoding remains enabled. ... " also ... do so in ... . + * The module maintains separate controls of two aspects: writing information + * required by logical decoding to WAL records and utilizing logical decoding + * itself, controlled by LogicalDecodingCtl->xlog_logical_info and + * ->logical_decoding_enabled fields respectively. The activation process of + * logical decoding involves several steps, beginning with maintaining logical + * decoding in a disabled state while incrementing the effective WAL level to + * its 'logical' equivalent. This change is reflected in the read-only + * effective_wal_level GUC parameter. The process includes necessary + * synchronization to ensure all processes adapt to the new effective WAL + * level before logical decoding is fully enabled. Deactivation follows a + * similarly careful, multi-step process in the reverse order. + * I was expecting that the step-by-step process would be described in a README. But given that we have this detailed comment here, it may be good to have the step-by-step process described here itself. I see some comments use "effective_wal_level is logical" and some mention "logical decoding is enabled" depending upon the context. I think, it's important to differentiate between these two given that the it's possible to find the system in a state where effective_wal_level is 'logical' but logical decoding is disabled. But as a first reader, this did cause me some confusion. Above paragraph is a good place to make that distinction clear. The description of step-by-step process will make things clearer. + /* cannot change while ReplicationSlotCtlLock is held */ + if (!s->in_use) + continue; + + /* NB: intentionally counting invalidated slots */ Explain the reason in the comment. + +# Cleanup all existing slots and start the concurrency test. +$primary->safe_psql('postgres', + qq[select pg_drop_replication_slot('test_slot')]); We should add effective wal level test. At a later point we have a test for this but it would be good to make sure that the effective wal level is replica at this stage in the test as well. test_wal_level($primary, "replica|replica", "effective_wal_level reset to 'replica' after dropping all logical slots"); + +# Wait for the logical slot 'test_slot' has been created. Should be "Wait for the logical slot 'test_slot' to be created" or "Check that the logical slot 'test_slot' has been created". +# Check if the standby's effective_wal_level should be 'logical' in spite +# of wal_level being 'replica'. +test_wal_level($standby1, "replica|logical", + "effective_wal_level='logical' on standby"); Do we have a test to verify that a logical replication slot can not be created on a standby whose primary does *not* have effective_wal_level 'logical'? + +# Create a logical slot on the standby, which should be succeeded grammar: ..., which should succeed OR better "Creating a logical slot on standby should succeed". +# as the primary enables it. as the primary has logical decoding enabled. +# Check if the logical decoding is not enabled on the standby4. +test_wal_level($standby4, "logical|replica", + "standby's effective_wal_level got decreased to 'replica'"); +$standby4->safe_psql('postgres', + qq[select pg_drop_replication_slot('standby4_slot')]); + Instead of dropping the slot, if we create another slot on the primary, what happens to the invalidated replication slot? Does it remain invalidated? Have we covered this scenario in tests? + +done_testing(); What happens if we create a logical slot when wal_level is 'minimal'? Do we have a test for that? -- Best Wishes, Ashutosh Bapat
pgsql-hackers by date: