Re: POC: enable logical decoding when wal_level = 'replica' without a server restart - Mailing list pgsql-hackers
| From | Masahiko Sawada |
|---|---|
| Subject | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart |
| Date | |
| Msg-id | CAD21AoBfMrE++TgsOmNCq1Lo7wpNr36cm0CoTb-OhLVv52mxdQ@mail.gmail.com Whole thread Raw |
| In response to | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart (Amit Kapila <amit.kapila16@gmail.com>) |
| Responses |
Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
Re: POC: enable logical decoding when wal_level = 'replica' without a server restart Re: POC: enable logical decoding when wal_level = 'replica' without a server restart |
| List | pgsql-hackers |
On Wed, Nov 26, 2025 at 3:47 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Nov 26, 2025 at 10:31 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Nov 26, 2025 at 5:59 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > >
> > > After thinking of this case, I'm concerned that we would hit the
> > > existing similar assertion failures or assertions that future changes
> > > could introduce because the value returned by XLogLogicalInfoActive()
> > > could be changed even within the same transaction whereas the result
> > > from XLogStandbyInfoActive() doesn't. One possible change would be to
> > > ensure that XLogLogicalInfoActive() returns the same result within the
> > > same transaction. It would be more straightforward and closer to the
> > > current GUC-based behavior.
> > >
> >
> > Agreed that this is worth considering. So, the only downside it could
> > have is the performance impact where it is not used like when
> > wal_level is replica and the user didn't create any logical slot. So,
> > short transactions where we do call XLogLogicalInfoActive() directly
> > or indirectly only once or twice, we always need to invoke
> > CheckXLogLogicalInfoSlow(). Can we try some workloads to see if there
> > is any noticeable impact?
> >
>
> One simple test could be: Begin; Select pg_current_xact_id(); Commit;
> This will let newly added function twice in each transaction.
I've done a simple performance benchmark with and without this patch.
There seems to be no noticeable performance regression.
w/o patch:
transaction type: /tmp/test.sql
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
maximum number of tries: 1
duration: 180 s
number of transactions actually processed: 3913028
number of failed transactions: 0 (0.000%)
latency average = 0.046 ms
latency stddev = 0.012 ms
initial connection time = 2.077 ms
tps = 21739.289857 (without initial connection time)
w/ patch:
transaction type: /tmp/test.sql
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
maximum number of tries: 1
duration: 180 s
number of transactions actually processed: 3922125
number of failed transactions: 0 (0.000%)
latency average = 0.046 ms
latency stddev = 0.012 ms
initial connection time = 2.049 ms
tps = 21789.823021 (without initial connection time)
>
> Few comments on 0001:
> =====================
> 1.
> <literal>wal_level_insufficient</literal> means that the
> - primary doesn't have a <xref linkend="guc-wal-level"/> sufficient to
> - perform logical decoding. It is set only for logical slots.
> + logical decoding is disabled on the primary (See
> + <xref linkend="logicaldecoding-explanation-log-dec"/>). It is set
> + only for logical slots.
>
> Can't we use the existing description and use effective-wal-level
> instead of wal-level? We can provide the link you mentioned.
Agreed.
>
> 2.
> * Skip this if we're taking a full-page image of the new page, as we
> * don't include the new tuple in the WAL record in that case. Also
> - * disable if wal_level='logical', as logical decoding needs to be able to
> - * read the new tuple in whole from the WAL record alone.
> + * disable if logical decoding is enabled and the relation requires WAL to
> + * be logged for logical decoding, as it needs to be able to read the new
> + * tuple in whole from the WAL record alone.
> */
> if (oldbuf == newbuf && !need_tuple_data &&
> !XLogCheckBufferNeedsBackup(newbuf))
> @@ -9064,8 +9065,8 @@ log_heap_update(Relation reln, Buffer oldbuf,
> /*
> * Perform XLogInsert of an XLOG_HEAP2_NEW_CID record
> *
> - * This is only used in wal_level >= WAL_LEVEL_LOGICAL, and only for catalog
> - * tuples.
> + * This is only used when effective_wal_level is logical, and only for
> + * catalog tuples.
>
> In the first comment, you have used "logical decoding is enabled," and
> in the second one, you have used "effective_wal_level is logical." Can
> we effective_wal_level is logical" for consistency sake, and it sounds
> a bit easier to follow?
Agreed.
>
> 3.
> Check the shared
> + * status instead of the local XLogLogicalInfo because XLogLogicalInfo is
> + * not updated synchronously during recovery.
> + */
> + if (RecoveryInProgress())
> + return IsXLogLogicalInfoEnabled() ? "logical" : "replica";
>
> Is it true only during recovery or in general also? Say a checkpointer
> disables it, then won't backends see it asynchronously?
I meant that we don't update the process-local cache, XLogLogicalInfo,
upon replaying the STATUS_CHANGE record during recovery, so we need to
check the shared flag instead. During non-recovery, I think that
processes should check its local cache because they're working based
on their cache.
>
> 4.
> + errhint("Before creating subscriptions, set \"wal_level\" >= \"replica\"")));
>
> errhint should always end with a full stop. Also, how about a slightly
> modified hint message like: "Before creating subscriptions, ensure
> that wal_level is set to replica or higher.”?
>
> 5.
> This design choice exists because deactivation requires waiting
> + * for concurrent attempts to update the logical decoding status, which can be
> + * problematic when the process is holding interrupts.
>
> The part of the comment that says "concurrent attempts to update the
> logical decoding status" is not clear to me. Which concurrent attempts
> are you referring to here?
Fixed the above points.
I've squashed all fixup patches and attached the updated patch.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Attachment
pgsql-hackers by date: