Re: [Patch] Windows relation extension failure at 2GB and 4GB - Mailing list pgsql-hackers

From Bryan Green
Subject Re: [Patch] Windows relation extension failure at 2GB and 4GB
Date
Msg-id 67f0dc0f-0fa2-4edc-a8a0-ab69a85acb34@gmail.com
Whole thread Raw
In response to Re: [Patch] Windows relation extension failure at 2GB and 4GB  (Michael Paquier <michael@paquier.xyz>)
List pgsql-hackers
On 11/5/2025 11:05 PM, Michael Paquier wrote:
> On Tue, Oct 28, 2025 at 09:42:11AM -0500, Bryan Green wrote:
>> I found two related bugs in PostgreSQL's Windows port that prevent files
>> from exceeding 2GB. While unlikely to affect most installations (default 1GB
>> segments), the code is objectively wrong and worth fixing.
>>
>> The first bug is a pervasive use of off_t where pgoff_t should be used. On
>> Windows, off_t is only 32-bit, causing signed integer overflow at exactly
>> 2GB (2^31 bytes). PostgreSQL already defined pgoff_t as __int64 for this
>> purpose and some function declarations in headers already use it, but the
>> implementations weren't updated to match.
> 
> Ugh.  That's the same problem as "long", which is 8 bytes wide
> everywhere except WIN32.  Removing traces of "long" from the code has
> been a continuous effort over the years because of these silent
> overflow issues.
> 

Exactly - these silent overflows are particularly nasty since they only
manifest under specific conditions (large files on Windows) and can
cause data corruption rather than immediate crashes.

>> After fixing all those off_t issues, there's a second bug at 4GB in the
>> Windows implementations of pg_pwrite()/pg_pread() in win32pwrite.c and
>> win32pread.c. The current implementation uses an OVERLAPPED structure for
>> positioned I/O, but only sets the Offset field (low 32 bits), leaving
>> OffsetHigh at zero. This works up to 4GB by accident, but beyond that,
>> offsets wrap around.
>>
>> I can reproduce both bugs reliably with --with-segsize=8. The file grows to
>> exactly 2GB and fails with "could not extend file: Invalid argument" despite
>> having 300GB free. After fixing the off_t issues, it grows to exactly 4GB
>> and hits the OVERLAPPED bug. Both are independently verifiable.
> 
> The most popular option in terms of installation on Windows is the EDB
> installer, where I bet that a file segment size of 1GB is what's
> embedded in the code compiled.  This argument would not hold with WAL

Right, which is why this has gone unnoticed. The 1GB default masks both
bugs completely. It's only when someone uses --with-segsize > 2 that the
issues appear.

> segment sizes if we begin to support even higher sizes than 1GB at
> some point, and we use pwrite() in the WAL insert code.  That should
> not be a problem even in the near future.
> 
>> It's safe for all platforms since pgoff_t equals off_t on Unix where off_t
>> is already 64-bit. Only Windows behavior changes.
> 
> win32_port.h and port.h say so, yeah.
> 
>> Not urgent since few people hit this in practice, but it's clearly wrong
>> code.
>> Someone building with larger segments would see failures at 2GB and
>> potential corruption at 4GB. Windows supports files up to 16 exabytes - no
>> good reason to limit PostgreSQL to 2GB.
> 
> The same kind of limitations with 4GB files existed with stat() and
> fstat(), but it was much more complicated than what you are doing
> here, where COPY was not able to work with files larger than 4GB on
> WIN32.  See the saga from bed90759fcbc.
> 
>> I have attached the patch to fix the relation extension problems for Windows
>> to this email.
>>
>> Can provide the other patches that changes off_t for pgoff_t in the rest of
>> the code if there's interest in fixing this.
> 
> Yeah, I think that we should rip out these issues, and move to the
> more portable pgoff_t across the board.  I doubt that any of this
> could be backpatched due to the potential ABI breakages these
> signatures changes would cause.  Implementing things in incremental 
> steps is more sensible when it comes to such changes, as a revert
> blast can be reduced if a portion is incorrectly handled.
> 
> I'm seeing as well the things you are pointing in buffile.c.  These
> should be fixed as well.  The WAL code is less annoying due to the 1GB
> WAL segment size limit, still consistency across the board makes the
> code easier to reason about, at least.
> 
> Among the files you have mentioned, there is also copydir.c.
> 
> pg_rewind seems also broken with files larger than 4GB, from what I
> can see in libpq_source.c and filemap.c..  Worse.  Oops.
> 
> -    /* Note that this changes the file position, despite not using it. */
> -    overlapped.Offset = offset;
> +    overlapped.Offset = (DWORD) offset;
> +    overlapped.OffsetHigh = (DWORD) (offset >> 32);
> 
> Based on the docs at [1], yes, this change makes sense.
> 
> It seems to me that a couple of extra code paths should be handled in
> the first patch, and I have spotted three of them.  None of them are
> critical as they are related to WAL segments, just become masked and
> inconsistent:
> - xlogrecovery.c, pg_pread() called with a cast to off_t.  WAL
> segments have a max size of 1GB, meaning that we're OK.
> - xlogreader.c, pg_pread() with a cast to off_t.
> - walreceiver.c, pg_pwrite().
> 

I'll include these in the first patch for consistency even though
they're not currently problematic. Better to fix all the function call
sites together rather than leaving known inconsistencies.

> Except for these three spots, the first patch looks like a cut good
> enough on its own.
> 
> Glad to see someone who takes time to spend time on this kind of
> stuff with portability in mind, by the way.
> 

Windows portability issues tend to hide in corners like this. I'll
prepare the updated patch series addressing your feedback and post v2
shortly.

> [1]: https://learn.microsoft.com/en-us/windows/win32/api/minwinbase/ns-minwinbase-overlapped
> --
> Michael

Thanks for taking the time to look over this patch.

-- 
Bryan Green
EDB: https://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Bertrand Drouvot
Date:
Subject: Re: Consistently use the XLogRecPtrIsInvalid() macro
Next
From: "Zhijie Hou (Fujitsu)"
Date:
Subject: RE: Assertion failure in SnapBuildInitialSnapshot()