
From Robert Haas
Subject Re: [BUG] [PATCH] pg_basebackup produces wrong incremental files after relation truncation in segmented tables
Msg-id CA+TgmobgoOHjjHzK1WXnBSR7p2o7VxC1S6cfdEY2cFSKaekfxA@mail.gmail.com
In response to Re: [BUG] [PATCH] pg_basebackup produces wrong incremental files after relation truncation in segmented tables  (Oleg Tkachenko <oatkachenko@gmail.com>)
List pgsql-hackers
On Wed, Jan 7, 2026 at 9:50 AM Oleg Tkachenko <oatkachenko@gmail.com> wrote:
> Both forks have the same limit, which looks wrong.
> So I checked the WAL files to see what really happened with the VM fork.
> I did not find any “truncate” records for the VM file.
> I only found this record for the main fork
> (actually, the fork isn’t mentioned at all):
>
> rmgr: Storage  len (rec/tot): 46/46, tx: 759, lsn: 0/4600D318,
>  prev 0/4600B2C8, desc: TRUNCATE base/5/16384 to 131073 blocks flags 7

Flags 7 for Storage/TRUNCATE means all forks:

#define SMGR_TRUNCATE_HEAP              0x0001
#define SMGR_TRUNCATE_VM                0x0002
#define SMGR_TRUNCATE_FSM               0x0004
#define SMGR_TRUNCATE_ALL               \
        (SMGR_TRUNCATE_HEAP|SMGR_TRUNCATE_VM|SMGR_TRUNCATE_FSM)
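
(Spelled out: 0x0001 | 0x0002 | 0x0004 == 0x0007, so a flags value of
7 is exactly SMGR_TRUNCATE_ALL.)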

I think this comes from RelationTruncate(), which does indeed set
xlrec.flags = SMGR_TRUNCATE_ALL. It seems bananas to me to use the
same block count for every fork, but that is evidently how the code
treats it: RelationTruncate() goes on to call
smgrtruncate(RelationGetSmgr(rel), forks, nforks, old_blocks, blocks),
which iterates over all the forks and uses the same block number for
each of them; smgr_redo() does the same thing; and
SummarizeSmgrRecord() likewise calls BlockRefTableSetLimitBlock() for
each relevant fork with that same block number. This really makes no
sense to me unless the block count happens to be zero, but AFAICT all
the code agrees that this is how it's supposed to work.
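
If memory serves, the summarizer side is shaped roughly like this
(paraphrased from memory, not the exact code in walsummarizer.c):

    if ((xlrec->flags & SMGR_TRUNCATE_HEAP) != 0)
        BlockRefTableSetLimitBlock(brtab, &xlrec->rlocator,
                                   MAIN_FORKNUM, xlrec->blkno);
    if ((xlrec->flags & SMGR_TRUNCATE_FSM) != 0)
        BlockRefTableSetLimitBlock(brtab, &xlrec->rlocator,
                                   FSM_FORKNUM, xlrec->blkno);
    if ((xlrec->flags & SMGR_TRUNCATE_VM) != 0)
        BlockRefTableSetLimitBlock(brtab, &xlrec->rlocator,
                                   VISIBILITYMAP_FORKNUM, xlrec->blkno);

Note the single xlrec->blkno applied to every fork.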

I think the problem here is that the incremental backup code makes the
apparently-naive assumption that the purpose of truncation is to make
things shorter. In this case, all forks were truncated to a random
length that was well in excess of the length of the VM fork, and in
pg_combinebackup, find_reconstructed_block_length() interprets that to
mean that the output file should be at least as long as the truncation
length. I am at present uncertain whether that can be safely changed
without breaking anything else. I don't think that what we're doing is
unsafe in the sense of producing corrupted data, because a bunch of
trailing blocks of zeroes are harmless, but it's clearly a problem if
it causes a huge disk space blowup, as it did here. So I think
something should be done about this, though the original issue you
reported is more urgent.
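
(Back-of-envelope, assuming the default 8kB block size: the visibility
map uses two bits per heap page, so covering ~131073 heap blocks takes
only a handful of VM pages, a few tens of kB, whereas padding the VM
fork out to 131073 blocks writes roughly 1GB of zeroes.)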

So my suggestion is to change the test so that it produces a file that
is the same small size on every platform. On most platforms, this will
be 1 segment. On the CI platform where we set the segment size to 6
blocks, it will be multiple segments, and on that platform only it
will effectively test for this bug. If you do that, then we can commit
the fix for the original problem. We (or someone else) can then look
into what's needed to address the excessive zero-padding as a separate
issue.
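
Just to make the arithmetic concrete (the 20-block figure here is only
an example): a 20-block relation is a single file under the default
1GB segment size, but with a 6-block segment size it spans four
segments (6 + 6 + 6 + 2), so the multi-segment path still gets
exercised there.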

--
Robert Haas
EDB: http://www.enterprisedb.com


