Re: POC: make mxidoff 64 bits - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: POC: make mxidoff 64 bits
Date
Msg-id bc7a77a9-4158-418e-8aa7-788e926c96d9@iki.fi
Whole thread Raw
In response to Re: POC: make mxidoff 64 bits  (Maxim Orlov <orlovmg@gmail.com>)
Responses Re: POC: make mxidoff 64 bits
List pgsql-hackers
On 30/10/2025 18:17, Maxim Orlov wrote:
> PFA v20 returns to using actual 64-bit offsets for on-disk SLRU
> segments.
>
> Fortunately, now that I've separated reading and writing offsets into
> different functions, switching from one implementation to another is
> easy to do.
>
> Here's a quick overview of the current state of the patch:
> 1) Access to the offset is placed to separate calls:
>     MXOffsetWrite/MXOffsetRead.
> 2) I abandoned byte juggling in pg_upgrade and moved to using logic that
>     replicates the work with offsets im multixact.c
> 3) As a result, the update issue came down to the correct implementation
>     of functions MXOffsetWrite/MXOffsetRead.
> 4) The only question that remains is the question of disk representation
>     of 64-bit offsets in SLRU segments.

Here's another round of review and cleanup of this. I made a bunch of
small changes, but haven't found any major problems. Looking pretty good.

Notable changes since v20:

- Changed MULTIXACT_MEMBER_AUTOVAC_THRESHOLD to 4000000000 instead of
0xFFFFFFFF. The 2^32 mark doesn't have any particular meaning
significance and using a round number makes that more clear. We should
possibly expose that as a separate GUC, but that can be done in a
followup patch.

- Removed the MXOffsetRead/Write functions again. They certainly make
sense if we make them more complicated with some kind of a compression
scheme, but I preferred to keep the code closer to 'master' for now.

- Removed more remnants of offset wraparound handling. There were still
a few places that checked for wraparound and tried to deal with it,
while other places just assumed that it doesn't happen. I added a check
in GetNewMultiXactId() to error out if it would wrap around. It really
should not happen in the real world, one reason being that we would run
out of WAL before running out of 64-bit mxoffsets, but a sanity check
won't hurt just in case someone e.g. abuses pg_resetwal to get into that
situation.

- Removed MaybeExtendOffsetSlru(). It was only used to deal with binary
upgrade from version 9.2 and below. Now that pg_upgrade rewrites the
files, it's not needed anymore.

- Modified PerformMembersTruncation() to use SimpleLruTruncate()

Changes in pg_upgrade:

- Removed the nextMXact/nextOffset fields from MultiXactWriter. They
were redundant with the next_multi and next_offset local variables in
the caller.

Remaining issues:

- There's one more refactoring I'd like to do before merging this: Move
the definitions that are now duplicated between
src/bin/pg_upgrade/multixact_new.c and
src/backend/access/transam/multixact.c into a new header file,
multixact_internal.h. One complication with that is that it needs
SLRU_PAGES_PER_SEGMENT from access/slru.h, but slru.h cannot currently
be included in FRONTEND code. Perhaps we should move
SLRU_PAGES_PER_SEGMENT to pg_config_manual.h, or if that feels too
global, to a separate slru_config.h file.

- I saw Alexander's proposal for a new compression scheme but didn't
incorporate that here. It might be a good idea, but I think we can do
that as a followup patch before the release, if it seems worth it. I
don't feel too bad about just making pg_multixact/offsets 2x larger either.

- Have you done any performance testing of the pg_upgrade code? How long
does the conversion take if you have e.g. 1 billion multixids?

- Is the !oldestOffsetKnown case in the code still reachable? I left one
FIXME comment about that. Needs a comment update at least.

- The new pg_upgrade test fails on my system with this error in the log:

> # Running: pg_dump --no-sync --restrict-key test -d port=22462 host=/tmp/5KdMvth1jk dbname='postgres' -f
/home/heikki/git-sandbox/postgresql/build/testrun/pg_upgrade/007_mxoff/data/tmp_test_CINS/newnode_1_dump.sql
> pg_dump: error: aborting because of server version mismatch
> pg_dump: detail: server version: 19devel; pg_dump version: 17.6
> could not read
"/home/heikki/git-sandbox/postgresql/build/testrun/pg_upgrade/007_mxoff/data/tmp_test_CINS/newnode_1_dump.sql":No such
fileor directory at /home/heikki/git-sandbox/postgresql/src/bin/pg_upgrade/t/007_mxoff.pl line 242. 

This turns out to be an issue with IPC::Run. Setting the
IPCRUNDEBUG=basic env variable reveals that it has a built-in command cache:

> IPC::Run 0004 [#19(109223)]: ****** harnessing *****
> IPC::Run 0004 [#19(109223)]: parsing [ pg_dump --no-sync --restrict-key test -d 'port=20999 host=/tmp/NsJKldN1Ie
dbname='postgres''-f
'/home/heikki/git-sandbox/postgresql/build/testrun/pg_upgrade/007_mxoff/data/tmp_test_urgw/newnode_1_dump.sql'] 
> IPC::Run 0004 [#19(109223)]: ** starting
> IPC::Run 0004 [#19(109223)]: 'pg_dump' found in cache: '/home/heikki/pgsql.17stable/bin/pg_dump'
> IPC::Run 0004 [#19(111432) pg_dump]: execing /home/heikki/pgsql.17stable/bin/pg_dump --no-sync --restrict-key test -d
'port=20999host=/tmp/NsJKldN1Ie dbname='postgres'' -f
/home/heikki/git-sandbox/postgresql/build/testrun/pg_upgrade/007_mxoff/data/tmp_test_urgw/newnode_1_dump.sql
> IPC::Run 0004 [#19(109223)]: ** finishing
> pg_dump: error: aborting because of server version mismatch
> pg_dump: detail: server version: 19devel; pg_dump version: 17.6

The test calls pg_dump twice: first with the old version, then with the
new version. But thanks to IPC::Run's command cache, the invocation of
the new pg_dump version actually also calls the old version. I'm not
sure how to fix that, but I was able to work around it by reversing the
pg_dump calls so that thew new version is called first. That way we use
the new pg_dump against both server versions which works.

- The new pg_ugprade test is very slow. I would love to include that
test permanently in the test suite, but it's too slow for that currently.

- Heikki

Attachment

pgsql-hackers by date:

Previous
From: Fujii Masao
Date:
Subject: Re: Suggestion to add --continue-client-on-abort option to pgbench
Next
From: Peter Eisentraut
Date:
Subject: Re: SQL:2011 Application Time Update & Delete