Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns - Mailing list pgsql-hackers
| From | Kyotaro Horiguchi |
|---|---|
| Subject | Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns |
| Date | |
| Msg-id | 20211008.165055.1621145185927268721.horikyota.ntt@gmail.com |
| In response to | Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns (Masahiko Sawada <sawada.mshk@gmail.com>) |
| Responses | Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns |
| List | pgsql-hackers |
At Thu, 7 Oct 2021 13:20:14 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
> Another idea to fix this problem would be that before calling
> SnapBuildCommitTxn() we create transaction entries in ReorderBuffer
> for (sub)transactions whose COMMIT record has XACT_XINFO_HAS_INVALS,
> and then mark all of them as catalog-changed by calling
> ReorderBufferXidSetCatalogChanges(). I've attached a PoC patch for
> this idea. What the patch does is essentially the same as what the
> proposed patch does. But the patch doesn't modify the
> SnapBuildCommitTxn(). And we remember the list of last running
> transactions in reorder buffer and the list is periodically purged
> during decoding RUNNING_XACTS records, eventually making it empty.
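A toy C sketch of the quoted "remember the last running transactions, then purge them while decoding RUNNING_XACTS" idea follows. It is not taken from the PoC patch. `RememberedXids`, `xid_precedes` and `purge_remembered_xids` are hypothetical names. `xid_precedes` only imitates the wraparound-aware comparison of `TransactionIdPrecedes()` for normal XIDs. The purge rule (drop every XID older than the record's oldest running XID) is an assumption about how such a list would eventually become empty.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;   /* toy stand-in for PostgreSQL's type */

#define TOY_MAX_REMEMBERED 64

/* Wraparound-aware "a precedes b" for normal XIDs (hypothetical helper,
 * modelled on TransactionIdPrecedes(); ignores permanent XIDs). */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
    int32_t diff = (int32_t) (a - b);

    return diff < 0;
}

/* Hypothetical list of XIDs remembered from the initial RUNNING_XACTS. */
typedef struct
{
    TransactionId xids[TOY_MAX_REMEMBERED];
    size_t        count;
} RememberedXids;

/*
 * Called (hypothetically) for each decoded RUNNING_XACTS record: any
 * remembered XID older than oldest_running has finished, so forget it.
 * Returns the number of XIDs still remembered.
 */
static size_t
purge_remembered_xids(RememberedXids *list, TransactionId oldest_running)
{
    size_t kept = 0;

    for (size_t i = 0; i < list->count; i++)
    {
        if (!xid_precedes(list->xids[i], oldest_running))
            list->xids[kept++] = list->xids[i];
    }
    list->count = kept;
    return kept;
}
```

Because each purge only keeps XIDs that are still running, the list shrinks monotonically once no new entries are added, which matches the "eventually making it empty" property described in the quote.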
I came up with a third way. SnapBuildCommitTxn already properly
handles the case where a ReorderBufferTXN has
RBTXN_HAS_CATALOG_CHANGES set. So this issue can be resolved by creating
such ReorderBufferTXNs in SnapBuildProcessRunningXacts.
One problem with this change is that it creates the case where multiple
ReorderBufferTXNs share the same first_lsn. I haven't come up with a
clean way to avoid relaxing the restriction in AssertTXNLsnOrder.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 46e66608cf..503116764f 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -887,9 +887,14 @@ AssertTXNLsnOrder(ReorderBuffer *rb)
if (cur_txn->end_lsn != InvalidXLogRecPtr)
Assert(cur_txn->first_lsn <= cur_txn->end_lsn);
- /* Current initial LSN must be strictly higher than previous */
+ /*
+ * Current initial LSN must be strictly higher than previous, except
+ * when the transaction was created by XLOG_RUNNING_XACTS. If one
+ * XLOG_RUNNING_XACTS creates multiple transactions, they share the
+ * same LSN. See SnapBuildProcessRunningXacts.
+ */
if (prev_first_lsn != InvalidXLogRecPtr)
- Assert(prev_first_lsn < cur_txn->first_lsn);
+ Assert(prev_first_lsn <= cur_txn->first_lsn);
/* known-as-subtxn txns must not be listed */
Assert(!rbtxn_is_known_subxact(cur_txn));
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index a5333349a8..58859112dc 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -1097,6 +1097,20 @@ SnapBuildProcessRunningXacts(SnapBuild *builder, XLogRecPtr lsn, xl_running_xact
*/
if (builder->state < SNAPBUILD_CONSISTENT)
{
+ /*
+ * At the time we passed the first XLOG_RUNNING_XACTS record, the
+ * transactions notified by the record may have updated
+ * catalogs. Register the transactions, marking them as having
+ * caused catalog changes. The worst misbehavior here is some spurious
+ * invalidation at decoding start.
+ */
+ if (builder->state == SNAPBUILD_START)
+ {
+ for (int i = 0; i < running->xcnt + running->subxcnt; i++)
+ ReorderBufferXidSetCatalogChanges(builder->reorder,
+ running->xids[i], lsn);
+ }
+
/* returns false if there's no point in performing cleanup just yet */
if (!SnapBuildFindSnapshot(builder, lsn, running))
return;
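The registration loop in the snapbuild.c hunk relies on xl_running_xacts storing `xcnt` top-level XIDs followed by `subxcnt` subtransaction XIDs in a single `xids[]` array. Below is a self-contained toy model of that loop. `ToyRunningXacts`, `ToyTxn`, `ToyReorderBuffer`, `toy_set_catalog_changes` and `toy_register_running` are hypothetical. `toy_set_catalog_changes` is only loosely modelled on `ReorderBufferXidSetCatalogChanges(rb, xid, lsn)`: it finds or creates an entry at `lsn` and sets a flag. The model shows that every entry created from one record ends up with the same first_lsn.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;   /* toy stand-ins */
typedef uint64_t XLogRecPtr;

#define TOY_MAX_XIDS 16
#define TOY_MAX_TXNS 32

/* xcnt top-level XIDs followed by subxcnt subxact XIDs, one flat array. */
typedef struct
{
    int           xcnt;
    int           subxcnt;
    TransactionId xids[TOY_MAX_XIDS];
} ToyRunningXacts;

typedef struct
{
    TransactionId xid;
    XLogRecPtr    first_lsn;
    bool          has_catalog_changes;
} ToyTxn;

typedef struct
{
    ToyTxn txns[TOY_MAX_TXNS];
    size_t ntxns;
} ToyReorderBuffer;

/* Find or create the entry for xid (created at lsn), then flag it. */
static ToyTxn *
toy_set_catalog_changes(ToyReorderBuffer *rb, TransactionId xid, XLogRecPtr lsn)
{
    for (size_t i = 0; i < rb->ntxns; i++)
    {
        if (rb->txns[i].xid == xid)
        {
            rb->txns[i].has_catalog_changes = true;
            return &rb->txns[i];
        }
    }
    if (rb->ntxns >= TOY_MAX_TXNS)
        return NULL;            /* toy capacity limit reached */

    ToyTxn *t = &rb->txns[rb->ntxns++];

    t->xid = xid;
    t->first_lsn = lsn;
    t->has_catalog_changes = true;
    return t;
}

/* Mark every XID in the record, top-level and subxacts alike. */
static void
toy_register_running(ToyReorderBuffer *rb, const ToyRunningXacts *running,
                     XLogRecPtr lsn)
{
    for (int i = 0; i < running->xcnt + running->subxcnt; i++)
        toy_set_catalog_changes(rb, running->xids[i], lsn);
}
```

Registering a record with two top-level XIDs and one subxact XID yields three flagged entries, all sharing the record's LSN as first_lsn.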