Re: [Patch] Windows relation extension failure at 2GB and 4GB - Mailing list pgsql-hackers

From Bryan Green
Subject Re: [Patch] Windows relation extension failure at 2GB and 4GB
Date
Msg-id 246ff974-8599-4c06-b5d9-7763f12ad9ca@gmail.com
Whole thread Raw
In response to Re: [Patch] Windows relation extension failure at 2GB and 4GB  (Michael Paquier <michael@paquier.xyz>)
Responses Re: [Patch] Windows relation extension failure at 2GB and 4GB
Re: [Patch] Windows relation extension failure at 2GB and 4GB
List pgsql-hackers
On 11/5/2025 11:05 PM, Michael Paquier wrote:
> On Tue, Oct 28, 2025 at 09:42:11AM -0500, Bryan Green wrote:
>> I found two related bugs in PostgreSQL's Windows port that prevent files
>> from exceeding 2GB. While unlikely to affect most installations (default 1GB
>> segments), the code is objectively wrong and worth fixing.
>>
>> The first bug is a pervasive use of off_t where pgoff_t should be used. On
>> Windows, off_t is only 32-bit, causing signed integer overflow at exactly
>> 2GB (2^31 bytes). PostgreSQL already defined pgoff_t as __int64 for this
>> purpose and some function declarations in headers already use it, but the
>> implementations weren't updated to match.
> 
> Ugh.  That's the same problem as "long", which is 8 bytes wide
> everywhere except WIN32.  Removing traces of "long" from the code has
> been a continuous effort over the years because of these silent
> overflow issues.
> 
>> After fixing all those off_t issues, there's a second bug at 4GB in the
>> Windows implementations of pg_pwrite()/pg_pread() in win32pwrite.c and
>> win32pread.c. The current implementation uses an OVERLAPPED structure for
>> positioned I/O, but only sets the Offset field (low 32 bits), leaving
>> OffsetHigh at zero. This works up to 4GB by accident, but beyond that,
>> offsets wrap around.
>>
>> I can reproduce both bugs reliably with --with-segsize=8. The file grows to
>> exactly 2GB and fails with "could not extend file: Invalid argument" despite
>> having 300GB free. After fixing the off_t issues, it grows to exactly 4GB
>> and hits the OVERLAPPED bug. Both are independently verifiable.
> 
> The most popular option in terms of installation on Windows is the EDB
> installer, where I bet that a file segment size of 1GB is what's
> embedded in the code compiled.  This argument would not hold with WAL
> segment sizes if we begin to support even higher sizes than 1GB at
> some point, and we use pwrite() in the WAL insert code.  That should
> not be a problem even in the near future.
> 
>> It's safe for all platforms since pgoff_t equals off_t on Unix where off_t
>> is already 64-bit. Only Windows behavior changes.
> 
> win32_port.h and port.h say so, yeah.
> 
>> Not urgent since few people hit this in practice, but it's clearly wrong
>> code.
>> Someone building with larger segments would see failures at 2GB and
>> potential corruption at 4GB. Windows supports files up to 16 exabytes - no
>> good reason to limit PostgreSQL to 2GB.
> 
> The same kind of limitations with 4GB files existed with stat() and
> fstat(), but it was much more complicated than what you are doing
> here, where COPY was not able to work with files larger than 4GB on
> WIN32.  See the saga from bed90759fcbc.
> 
>> I have attached the patch to fix the relation extension problems for Windows
>> to this email.
>>
>> Can provide the other patches that changes off_t for pgoff_t in the rest of
>> the code if there's interest in fixing this.
> 
> Yeah, I think that we should rip out these issues, and move to the
> more portable pgoff_t across the board.  I doubt that any of this
> could be backpatched due to the potential ABI breakages these
> signatures changes would cause.  Implementing things in incremental 
> steps is more sensible when it comes to such changes, as a revert
> blast can be reduced if a portion is incorrectly handled.
> 
> I'm seeing as well the things you are pointing in buffile.c.  These
> should be fixed as well.  The WAL code is less annoying due to the 1GB
> WAL segment size limit, still consistency across the board makes the
> code easier to reason about, at least.
> 
> Among the files you have mentioned, there is also copydir.c.
> 
> pg_rewind seems also broken with files larger than 4GB, from what I
> can see in libpq_source.c and filemap.c..  Worse.  Oops.
> 
> -    /* Note that this changes the file position, despite not using it. */
> -    overlapped.Offset = offset;
> +    overlapped.Offset = (DWORD) offset;
> +    overlapped.OffsetHigh = (DWORD) (offset >> 32);
> 
> Based on the docs at [1], yes, this change makes sense.
> 
> It seems to me that a couple of extra code paths should be handled in
> the first patch, and I have spotted three of them.  None of them are
> critical as they are related to WAL segments, just become masked and
> inconsistent:
> - xlogrecovery.c, pg_pread() called with a cast to off_t.  WAL
> segments have a max size of 1GB, meaning that we're OK.
> - xlogreader.c, pg_pread() with a cast to off_t.
> - walreceiver.c, pg_pwrite().
> 
> Except for these three spots, the first patch looks like a cut good
> enough on its own.
> 

Latest patch attached that includes these code paths.

> Glad to see someone who takes time to spend time on this kind of
> stuff with portability in mind, by the way.
> 
> [1]: https://learn.microsoft.com/en-us/windows/win32/api/minwinbase/ns-minwinbase-overlapped
> --
> Michael

Thanks for the quick reviewing.

-- 
Bryan Green
EDB: https://www.enterprisedb.com
Attachment

pgsql-hackers by date:

Previous
From: Álvaro Herrera
Date:
Subject: Re: warning on the current head
Next
From: Andres Freund
Date:
Subject: Re: [Patch] Windows relation extension failure at 2GB and 4GB