Here's a new version of this. Notable changes:
- I reverted the changes to ExtendMultiXactOffset(), so that it deals
with wraparound and FirstMultiXactId the same way as before. The caller
never passes FirstMultiXactId, but the changed comments and the
assertion were confusing, so I felt it's best to just leave it alone.
- A bunch of comment changes and other cosmetic changes.
- I modified TrimMultiXact() to initialize the page corresponding to
'nextMulti', because if you just swapped the binary to the new one, and
nextMulti was at a page boundary, that page would not be initialized yet.
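
To make the page-boundary case concrete, here is a minimal standalone sketch of the arithmetic. It is not the multixact.c code: the helper names are hypothetical, and it assumes the default 8192-byte block size and 4-byte offsets, which gives 2048 offsets per page. Wraparound and FirstMultiXactId are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumptions: default 8192-byte blocks, 4-byte multixact offsets. */
#define BLCKSZ 8192
#define OFFSETS_PER_PAGE (BLCKSZ / sizeof(uint32_t))

static_assert(OFFSETS_PER_PAGE == 2048, "expected 2048 offsets per page");

/* Hypothetical helper: which offsets page holds the entry for 'multi'. */
static uint32_t offset_page(uint32_t multi)
{
    return (uint32_t) (multi / OFFSETS_PER_PAGE);
}

/* Hypothetical helper: index of the entry within its page. */
static uint32_t offset_entry(uint32_t multi)
{
    return (uint32_t) (multi % OFFSETS_PER_PAGE);
}

/*
 * True if 'multi' is the first entry on its page. If nextMulti is in this
 * position and no multixid on that page has been assigned yet, nothing
 * has zeroed the page so far.
 */
static bool at_page_boundary(uint32_t multi)
{
    return offset_entry(multi) == 0;
}
```

Under these assumptions, multixid 2047 is the last entry on page 0, and multixid 2048 is the first entry on page 1, so a nextMulti of 2048 is exactly the boundary case.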

If we want to backpatch this, and I think we need to because this fixes
real bugs, we need to think through all the upgrade scenarios. I made
the above-mentioned changes to TrimMultiXact(), but they don't fix all
the problems.

What happens if you replay WAL that was generated by the old binary,
without this patch, using the new binary? It's not good:

LOG: database system was not properly shut down; automatic recovery in progress
LOG: redo starts at 0/01766A68
FATAL: could not access status of transaction 2048
DETAIL: Could not read from file "pg_multixact/offsets/0000" at offset 8192: read too few bytes.
CONTEXT: WAL redo at 0/05561030 for MultiXact/CREATE_ID: 2047 offset 4093 nmembers 2: 2830 (keysh) 2831 (keysh)
LOG: startup process (PID 3130184) exited with exit code 1

This is because the WAL, created with the old version, contains records
like this:

lsn: 0/05561030, prev 0/05561008, desc: CREATE_ID 2047 offset 4093 nmembers 2: 2830 (keysh) 2831 (keysh)
lsn: 0/055611A8, prev 0/05561180, desc: ZERO_OFF_PAGE 1
lsn: 0/055611D0, prev 0/055611A8, desc: CREATE_ID 2048 offset 4095 nmembers 2: 2831 (keysh) 2832 (keysh)

When replaying that with the new version, replay of the CREATE_ID 2047
record tries to set the next multixid's offset, but the page hasn't been
initialized yet. With the new version, the ZERO_OFF_PAGE 1 record would
appear before the CREATE_ID 2047 record, but we can't change the WAL
that already exists.
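
The ordering problem can be reproduced with a toy model. This is only a sketch under stated assumptions, not the real redo code: the record and function names are hypothetical, pages are modelled as a boolean array, and replay of CREATE_ID is assumed to touch both the multixid's own page and the page of the next multixid, as described above. It also assumes 2048 offsets per 8192-byte page and 32 pages per segment file.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BLCKSZ            8192u  /* assumed default block size */
#define OFFSETS_PER_PAGE  2048u  /* assumed: BLCKSZ / 4-byte offset */
#define PAGES_PER_SEGMENT 32u    /* assumed SLRU segment size */
#define NPAGES            4u     /* toy model only tracks a few pages */

/* Hypothetical stand-ins for the two WAL record types involved. */
typedef enum { CREATE_ID, ZERO_OFF_PAGE } RecType;
typedef struct { RecType type; uint32_t arg; } Rec;

/* Byte position of a page inside its segment file. */
static uint32_t page_byte_offset(uint32_t page)
{
    return (page % PAGES_PER_SEGMENT) * BLCKSZ;
}

/*
 * Replay records in order. Returns the index of the first record that
 * needs a page which has not been zeroed yet, or -1 if replay succeeds.
 */
static int replay(const Rec *recs, int n)
{
    bool page_ready[NPAGES] = { true };  /* page 0 exists at start */

    for (int i = 0; i < n; i++)
    {
        if (recs[i].type == ZERO_OFF_PAGE)
        {
            assert(recs[i].arg < NPAGES);
            page_ready[recs[i].arg] = true;
        }
        else
        {
            uint32_t own = recs[i].arg / OFFSETS_PER_PAGE;
            uint32_t next = (recs[i].arg + 1) / OFFSETS_PER_PAGE;

            assert(next < NPAGES);
            /* Patched replay also sets the next multixid's offset. */
            if (!page_ready[own] || !page_ready[next])
                return i;
        }
    }
    return -1;
}

/* Record order as written by the old binary. */
static const Rec old_wal[] = {
    { CREATE_ID, 2047 }, { ZERO_OFF_PAGE, 1 }, { CREATE_ID, 2048 }
};

/* Record order the new binary would write. */
static const Rec new_wal[] = {
    { ZERO_OFF_PAGE, 1 }, { CREATE_ID, 2047 }, { CREATE_ID, 2048 }
};
```

In this model, replay(old_wal, 3) stops at the very first record, because CREATE_ID 2047 needs page 1 before ZERO_OFF_PAGE 1 has been replayed, while replay(new_wal, 3) runs to completion. page_byte_offset(1) is 8192, which is the file offset reported in the DETAIL line of the error.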
- Heikki