Re: POC: enable logical decoding when wal_level = 'replica' without a server restart - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart |
Date | |
Msg-id | CAA4eK1K0QUo-_cT__pdPUbwwab54Tnyg_3oOqOdudnwyhLyJmg@mail.gmail.com Whole thread Raw |
In response to | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart (Masahiko Sawada <sawada.mshk@gmail.com>) |
List | pgsql-hackers |
On Thu, Sep 11, 2025 at 11:16 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Wed, Sep 10, 2025 at 11:32 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Sat, Sep 6, 2025 at 3:46 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > I've attached the updated patch that incorporated all comments I got so far. > > > > > > > * > > + /* > > + * While all processes are using the new status, there could be some > > + * transactions that might have started with the old status. So wait > > + * for the running transactions to complete so that logical decoding > > + * doesn't include transactions that wrote WAL with insufficient > > + * information. > > + */ > > + running = GetRunningTransactionData(); > > + LWLockRelease(ProcArrayLock); > > + LWLockRelease(XidGenLock); > > + > > + elog(DEBUG1, "waiting for %d transactions to complete", running->xcnt); > > + > > + for (int i = 0; i < running->xcnt; i++) > > + { > > + TransactionId xid = running->xids[i]; > > + > > + if (TransactionIdIsCurrentTransactionId(xid)) > > + continue; > > + > > + XactLockTableWait(xid, NULL, NULL, XLTW_None); > > + } > > > > When building a snapshot during the start of logical decoding, we > > anyway wait for running transactions to finish via the snapbuild > > machinery. So, why do we need it here? And if it is needed, can we > > update the comments to explain why it is required in spite of > > snapbuild machinery doing similar thing? > > Fair point. I don't see any reason we need to wait here. Will remove this step. > We can add a comment there explaining why we don't wait for in-progress transactions. This will also be important if we miss anything and later need to handle it similarly. One thing related to this which needs a discussion is after this change, it is possible that part of the transaction contains additional logical_wal_info. I couldn't think of a problem due to this but users using pg_waldump or other WAL reading utilities could question this. One possibility is that we always start including logical_wal_info for the next new transaction but not sure if that is required. It would be good if other people involved in the discussion or otherwise could share their opinion on this point. > > * Is it a good idea to enable/disable decoding for temporary logical > > slots? The temporary slots are released during ERROR or at session > > end, is that a good time to do the disable processing that even > > requires WAL writing. > > I think the same is true for slots with RS_EPEMERAL state. Since it > could confuse users if automatic effective_wal_level change is > supported only for non-temporary slots, I personally would like not to > push aside temporary slots. I agree that it might not be a good time > to disable processing during process shutdown time; in addition to > requiring WAL record, it also requires waits for concurrent state > change processings while it holds all interrupts, which could easily > involve dead-locks. > Yes, all such processing during ERROR and shutdown sounds scary and a source for problems. > It might be worth considering doing the disable > process in a lazy way. For example, other processes (like > checkpointer) periodically checks the logical decoding status and > disables it if necessary. > Yeah, doing lazily sounds reasonable to me. We need to do lazily only for ERROR cases, otherwise, during a normal drop_slot, it may be okay. But OTOH, while dropping the slot as a part of subscription drop, it could be risky because if due to any reason, the disabling took more time, the subscription drop operation would look like hang or in worse case, the connection can time out. For the shutdown sequence, can't we think of resetting effective_wal after a restart? -- With Regards, Amit Kapila.
pgsql-hackers by date: