Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns - Mailing list pgsql-hackers
From | Kyotaro Horiguchi |
---|---|
Subject | Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns |
Date | |
Msg-id | 20211008.165055.1621145185927268721.horikyota.ntt@gmail.com Whole thread Raw |
In response to | Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns (Masahiko Sawada <sawada.mshk@gmail.com>) |
Responses |
Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
|
List | pgsql-hackers |
At Thu, 7 Oct 2021 13:20:14 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in > Another idea to fix this problem would be that before calling > SnapBuildCommitTxn() we create transaction entries in ReorderBuffer > for (sub)transactions whose COMMIT record has XACT_XINFO_HAS_INVALS, > and then mark all of them as catalog-changed by calling > ReorderBufferXidSetCatalogChanges(). I've attached a PoC patch for > this idea. What the patch does is essentially the same as what the > proposed patch does. But the patch doesn't modify the > SnapBuildCommitTxn(). And we remember the list of last running > transactions in reorder buffer and the list is periodically purged > during decoding RUNNING_XACTS records, eventually making it empty. I came up with the third way. SnapBuildCommitTxn already properly handles the case where a ReorderBufferTXN with RBTXN_HAS_CATALOG_CHANGES. So this issue can be resolved by create such ReorderBufferTXNs in SnapBuildProcessRunningXacts. One problem with this is that change creates the case where multiple ReorderBufferTXNs share the same first_lsn. I haven't come up with a clean idea to avoid relaxing the restriction of AssertTXNLsnOrder.. regards. -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c index 46e66608cf..503116764f 100644 --- a/src/backend/replication/logical/reorderbuffer.c +++ b/src/backend/replication/logical/reorderbuffer.c @@ -887,9 +887,14 @@ AssertTXNLsnOrder(ReorderBuffer *rb) if (cur_txn->end_lsn != InvalidXLogRecPtr) Assert(cur_txn->first_lsn <= cur_txn->end_lsn); - /* Current initial LSN must be strictly higher than previous */ + /* + * Current initial LSN must be strictly higher than previous. except + * this transaction is created by XLOG_RUNNING_XACTS. If one + * XLOG_RUNNING_XACTS creates multiple transactions, they share the + * same LSN. See SnapBuildProcessRunningXacts. + */ if (prev_first_lsn != InvalidXLogRecPtr) - Assert(prev_first_lsn < cur_txn->first_lsn); + Assert(prev_first_lsn <= cur_txn->first_lsn); /* known-as-subtxn txns must not be listed */ Assert(!rbtxn_is_known_subxact(cur_txn)); diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c index a5333349a8..58859112dc 100644 --- a/src/backend/replication/logical/snapbuild.c +++ b/src/backend/replication/logical/snapbuild.c @@ -1097,6 +1097,20 @@ SnapBuildProcessRunningXacts(SnapBuild *builder, XLogRecPtr lsn, xl_running_xact */ if (builder->state < SNAPBUILD_CONSISTENT) { + /* + * At the time we passed the first XLOG_RUNNING_XACTS record, the + * transactions notified by the record may have updated + * catalogs. Register the transactions with marking them as having + * caused catalog changes. The worst misbehavior here is some spurious + * invalidation at decoding start. + */ + if (builder->state == SNAPBUILD_START) + { + for (int i = 0 ; i < running->xcnt + running->subxcnt ; i++) + ReorderBufferXidSetCatalogChanges(builder->reorder, + running->xids[i], lsn); + } + /* returns false if there's no point in performing cleanup just yet */ if (!SnapBuildFindSnapshot(builder, lsn, running)) return;
pgsql-hackers by date: