Re: IPC/MultixactCreation on the Standby server - Mailing list pgsql-hackers

From Chao Li
Subject Re: IPC/MultixactCreation on the Standby server
Date
Msg-id 5F9FB408-847B-457D-B01E-0F634767D881@gmail.com
Whole thread Raw
In response to Re: IPC/MultixactCreation on the Standby server  (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses Re: IPC/MultixactCreation on the Standby server
List pgsql-hackers
Hi Heikki,

I just reviewed this patch. As offset[x+1] anyway equals offset[x]+nmembers, pre-write offset[x+1] seems a very clever
solution.

I got a few questions/comments as below:

> On Nov 27, 2025, at 04:59, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>
> Here's a new version of this. Notable changes:
>
> - I reverted the changes to ExtendMultiXactOffset(), so that it deals with wraparound and FirstMultiXactId the same
wayas before. The caller never passes FirstMultiXactId, but the changed comments and the assertion were confusing, so I
feltit's best to just leave it alone 
>
> - bunch of comment changes & other cosmetic changes
>
> - I modified TrimMultiXact() to initialize the page corresponding to 'nextMulti', because if you just swapped the
binaryto the new one, and nextMulti was at a page boundary, it would not be initialized yet. 
>
> If we want to backpatch this, and I think we need to because this fixes real bugs, we need to think through all the
upgradescenarios. I made the above-mentioned changes to TrimMultiXact(), but it doesn't fix all the problems. 
>
> What happens if you replay the WAL generated with old binary, without this patch, with new binary? It's not good:
>
> LOG:  database system was not properly shut down; automatic recovery in progress
> LOG:  redo starts at 0/01766A68
> FATAL:  could not access status of transaction 2048
> DETAIL:  Could not read from file "pg_multixact/offsets/0000" at offset 8192: read too few bytes.
> CONTEXT:  WAL redo at 0/05561030 for MultiXact/CREATE_ID: 2047 offset 4093 nmembers 2: 2830 (keysh) 2831 (keysh)
> LOG:  startup process (PID 3130184) exited with exit code 1
>
> This is because the WAL, created with old version, contains records like this:
>
> lsn: 0/05561030, prev 0/05561008, desc: CREATE_ID 2047 offset 4093 nmembers 2: 2830 (keysh) 2831 (keysh)
> lsn: 0/055611A8, prev 0/05561180, desc: ZERO_OFF_PAGE 1
> lsn: 0/055611D0, prev 0/055611A8, desc: CREATE_ID 2048 offset 4095 nmembers 2: 2831 (keysh) 2832 (keysh)
>
>
> When replaying that with the new version, replay of the CREATE_ID 2047 record tries to set the next multixid's
offset,but the page hasn't been initialized yet. With the new version, the ZERO_OFF_PAGE 1 record would appear before
theCREATE_ID 2047 record, but we can't change the WAL that already exists. 
>
> - Heikki
> <v11-0001-Avoid-multixact-edge-case-2-by-writing-the-next-.patch>

1
```
+    if (*offptr != offset)
+    {
+        /* should already be set to the correct value, or not at all */
+        Assert(*offptr == 0);
+        *offptr = offset;
+        MultiXactOffsetCtl->shared->page_dirty[slotno] = true;
+    }
```

This is a more like a question. Since pre-write should always happen, in theory *offptr != offset should never be true,
whydo we still need to handle the case instead of just assert(false)? 

2
```
+    next = multi + 1;
+    if (next < FirstMultiXactId)
+        next = FirstMultiXactId;
```

next < FirstMultiXactId will only be true when next wraps around to 0, maybe deserve one-line comment to explain that.

3
```
+    if (*next_offptr != offset + nmembers)
+    {
+        /* should already be set to the correct value, or not at all */
+        Assert(*next_offptr == 0);
+        *next_offptr = offset + nmembers;
+        MultiXactMemberCtl->shared->page_dirty[slotno] = true;
+    }
```

Should MultiXactMemberCtl be MultiXactOffsetCtl? As we are writing to the offset SLRU.

4
```
+# Another multixact test: loosing some multixact must not affect reading near
```

I guess “loosing” should be “losing”

5
```
+# multitransaction (to avaoid corener case 1).
```

Typo: avoid corner case

6
```
+# Verify thet recorded multi is readble, this call must not hang.
```

Typo: readable

7
```
+# One more multitransaction to effectivelt emit WAL record about next
```

Typo: effectivelt -> effectively

8
```
+    plan skip_all => 'Kill9 works unpredicatably on Windows’;
```

Typo: unpredicatably, no “a” after “c"


Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/







pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Consistently use palloc_object() and palloc_array()
Next
From: Pavel Stehule
Date:
Subject: Re: [PATCH] psql: add size-based sorting options (O/o) for tables and indexes