Re: POC: enable logical decoding when wal_level = 'replica' without a server restart - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart |
Date | |
Msg-id | CAD21AoBEBAoH3CNkEDx3sQEXA3GM3h=p8gF8UK-VHWd70DgGGg@mail.gmail.com Whole thread Raw |
In response to | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>) |
List | pgsql-hackers |
On Fri, Sep 12, 2025 at 1:56 AM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote: > > On Fri, Sep 12, 2025 at 9:38 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > One thing related to this which needs a discussion is after this > > change, it is possible that part of the transaction contains > > additional logical_wal_info. I couldn't think of a problem due to this > > but users using pg_waldump or other WAL reading utilities could > > question this. One possibility is that we always start including > > logical_wal_info for the next new transaction but not sure if that is > > required. It would be good if other people involved in the discussion > > or otherwise could share their opinion on this point. > > > > AFAIR, logical info is a separate section in a WAL record, and there > is not marker which says "WAL will contain logical info henceforth". > So the utilities should be checking for the existence of such info > before reading it. So I guess it should be ok. Some extra sensitive > utilities may expect that once a WAL record has logical info, all the > succeeding WAL records will have it. They may find it troublesome that > WAL records with and without logical info are interleaved. Generally, > I would prefer that presence/absence of logical info changes at > transaction boundaries, but we will still have interleaving WAL > records. So I doubt how much that matters. > > Sorry for jumping late in the discussion. I have a few comments, > mostly superficial ones. I am yet to take a deeper look at the > synchronization logic. Thank you for reviewing the patch! > > <sect2 id="logicaldecoding-replication-slots"> > @@ -328,8 +362,7 @@ postgres=# select * from > pg_logical_slot_get_changes('regression_slot', NULL, NU > that could be needed by the logical decoding on the standby (as it does > not know about the <literal>catalog_xmin</literal> on the standby). > Existing logical slots on standby also get invalidated if > - <varname>wal_level</varname> on the primary is reduced to less than > - <literal>logical</literal>. > + logical decoding becomes disabled on the primary. > > s/becomes disabled/is disabled/ or /gets disabled/. Given that logical > decoding can be disabled in multiple ways, it's better to add a > reference here to a section which explains what disabling logical > decoding means. > > <listitem> > <para> > <literal>wal_level_insufficient</literal> means that the > - primary doesn't have a <xref linkend="guc-wal-level"/> sufficient to > - perform logical decoding. It is set only for logical slots. > + logical decoding is disabled on primary due to insufficient > + <xref linkend="guc-wal-level"/> or no logical slots. It is set only > + for logical slots. > > It may not be apparent to the users that insufficient wal_level means > 'minimal' here. It will be better if we just mention logical decoding > is disabled on primary and refer to a section which explains what > disabling logical decoding means. Agreed with the above points. > > * > * Skip this if we're taking a full-page image of the new page, as we > * don't include the new tuple in the WAL record in that case. Also > - * disable if wal_level='logical', as logical decoding needs to be able to > - * read the new tuple in whole from the WAL record alone. > + * disable if logical decoding is enabled, as logical decoding needs to be > + * able to read the new tuple in whole from the WAL record alone. > */ > > Not fault of this patch, but I find the comment to be slighly out of > sync with the code. The code actually checks whether the logical > decoding is enabled and whether the relation requires WAL to be logged > for logical decoding. The difference is subtle, but it did cause me a > bit of confusion when I read the code. Please consider rephrasing the > comment while you are modifying it. Okay, I'll try to rephrasing it. > if (oldbuf == newbuf && !need_tuple_data && > !XLogCheckBufferNeedsBackup(newbuf)) > @@ -9057,8 +9057,8 @@ log_heap_update(Relation reln, Buffer oldbuf, > /* > * Perform XLogInsert of an XLOG_HEAP2_NEW_CID record > * > - * This is only used in wal_level >= WAL_LEVEL_LOGICAL, and only for catalog > - * tuples. > + * This is only used when effective WAL level is logical, and only for > > Given that the earlier comment used GUC name, it's better to use the > GUC name "effective_wal_level" here. Will fix. > > > case XLOG_PARAMETER_CHANGE: > + > + /* > + * Even if wal_level on the primary got decreased to 'replica' it > + * doesn't necessarily mean to disable the logical decoding as > + * long as we have at least one logical slot. So we don't check > + * the logical decoding availability here but do in > + * XLOG_LOGICAL_DECODING_STATUS_CHANGE case. > + */ > > The earlier code checked for wal_level < WAL_LEVEL_LOGICAL, which > includes the case when wal_level is 'minimal'. I don't see the new > code handling 'minimal' case here. Am I missing something? Do we need > a comment here which specifically mentions "minimal" case Good point. I think it's not a problem as long as we write STATUS_CHANGE record with logical_decoding=false before PARAMETER_CHANGE record, but it would be more robust to handle the case where wal_level gets decreased to 'minimal' or 'replica' there. > grammar "Even if wal_level on the primary was lowered to 'replica', as > long as there is at least one logical slot, the logical decoding > remains enabled. ... " also ... do so in ... . Will fix. > > + * The module maintains separate controls of two aspects: writing information > + * required by logical decoding to WAL records and utilizing logical decoding > + * itself, controlled by LogicalDecodingCtl->xlog_logical_info and > + * ->logical_decoding_enabled fields respectively. The activation process of > + * logical decoding involves several steps, beginning with maintaining logical > + * decoding in a disabled state while incrementing the effective WAL level to > + * its 'logical' equivalent. This change is reflected in the read-only > + * effective_wal_level GUC parameter. The process includes necessary > + * synchronization to ensure all processes adapt to the new effective WAL > + * level before logical decoding is fully enabled. Deactivation follows a > + * similarly careful, multi-step process in the reverse order. > + * > > I was expecting that the step-by-step process would be described in a > README. But given that we have this detailed comment here, it may be > good to have the step-by-step process described here itself. > > I see some comments use "effective_wal_level is logical" and some > mention "logical decoding is enabled" depending upon the context. I > think, it's important to differentiate between these two given that > the it's possible to find the system in a state where > effective_wal_level is 'logical' but logical decoding is disabled. But > as a first reader, this did cause me some confusion. Above paragraph > is a good place to make that distinction clear. The description of > step-by-step process will make things clearer. Agreed. Will update the comments while considering these differences. > > + /* cannot change while ReplicationSlotCtlLock is held */ > + if (!s->in_use) > + continue; > + > + /* NB: intentionally counting invalidated slots */ > > Explain the reason in the comment. Will add some comments to the function header comment. > > + > +# Cleanup all existing slots and start the concurrency test. > +$primary->safe_psql('postgres', > + qq[select pg_drop_replication_slot('test_slot')]); > > We should add effective wal level test. At a later point we have a > test for this but it would be good to make sure that the effective wal > level is replica at this stage in the test as well. > > test_wal_level($primary, "replica|replica", "effective_wal_level reset > to 'replica' after dropping all logical slots"); > > + > +# Wait for the logical slot 'test_slot' has been created. > > Should be "Wait for the logical slot 'test_slot' to be created" or > "Check that the logical slot 'test_slot' has been created". Will fix. > > +# Check if the standby's effective_wal_level should be 'logical' in spite > +# of wal_level being 'replica'. > +test_wal_level($standby1, "replica|logical", > + "effective_wal_level='logical' on standby"); > > Do we have a test to verify that a logical replication slot can not be > created on a standby whose primary does *not* have effective_wal_level > 'logical'? > I think no, so will add that test. > + > +# Create a logical slot on the standby, which should be succeeded > > grammar: ..., which should succeed OR better "Creating a logical slot > on standby should succeed". > > +# as the primary enables it. > > as the primary has logical decoding enabled. Will fix. > > +# Check if the logical decoding is not enabled on the standby4. > +test_wal_level($standby4, "logical|replica", > + "standby's effective_wal_level got decreased to 'replica'"); > +$standby4->safe_psql('postgres', > + qq[select pg_drop_replication_slot('standby4_slot')]); > + > > Instead of dropping the slot, if we create another slot on the > primary, what happens to the invalidated replication slot? Does it > remain invalidated? Have we covered this scenario in tests? I believe that the slot remains invalidated. Adding such a test case sounds good to me. > + > +done_testing(); > > What happens if we create a logical slot when wal_level is 'minimal'? > Do we have a test for that? I'll consider more tests involving wal_level='minimal'. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
pgsql-hackers by date: