Re: [Patch] Windows relation extension failure at 2GB and 4GB - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: [Patch] Windows relation extension failure at 2GB and 4GB
Date
Msg-id aQwsi2IfG0imIwe-@paquier.xyz
Whole thread Raw
In response to [Patch] Windows relation extension failure at 2GB and 4GB  (Bryan Green <dbryan.green@gmail.com>)
Responses Re: [Patch] Windows relation extension failure at 2GB and 4GB
List pgsql-hackers
On Tue, Oct 28, 2025 at 09:42:11AM -0500, Bryan Green wrote:
> I found two related bugs in PostgreSQL's Windows port that prevent files
> from exceeding 2GB. While unlikely to affect most installations (default 1GB
> segments), the code is objectively wrong and worth fixing.
>
> The first bug is a pervasive use of off_t where pgoff_t should be used. On
> Windows, off_t is only 32-bit, causing signed integer overflow at exactly
> 2GB (2^31 bytes). PostgreSQL already defined pgoff_t as __int64 for this
> purpose and some function declarations in headers already use it, but the
> implementations weren't updated to match.

Ugh.  That's the same problem as "long", which is 8 bytes wide
everywhere except WIN32.  Removing traces of "long" from the code has
been a continuous effort over the years because of these silent
overflow issues.

> After fixing all those off_t issues, there's a second bug at 4GB in the
> Windows implementations of pg_pwrite()/pg_pread() in win32pwrite.c and
> win32pread.c. The current implementation uses an OVERLAPPED structure for
> positioned I/O, but only sets the Offset field (low 32 bits), leaving
> OffsetHigh at zero. This works up to 4GB by accident, but beyond that,
> offsets wrap around.
>
> I can reproduce both bugs reliably with --with-segsize=8. The file grows to
> exactly 2GB and fails with "could not extend file: Invalid argument" despite
> having 300GB free. After fixing the off_t issues, it grows to exactly 4GB
> and hits the OVERLAPPED bug. Both are independently verifiable.

The most popular option in terms of installation on Windows is the EDB
installer, where I bet that a file segment size of 1GB is what's
embedded in the code compiled.  This argument would not hold with WAL
segment sizes if we begin to support even higher sizes than 1GB at
some point, and we use pwrite() in the WAL insert code.  That should
not be a problem even in the near future.

> It's safe for all platforms since pgoff_t equals off_t on Unix where off_t
> is already 64-bit. Only Windows behavior changes.

win32_port.h and port.h say so, yeah.

> Not urgent since few people hit this in practice, but it's clearly wrong
> code.
> Someone building with larger segments would see failures at 2GB and
> potential corruption at 4GB. Windows supports files up to 16 exabytes - no
> good reason to limit PostgreSQL to 2GB.

The same kind of limitations with 4GB files existed with stat() and
fstat(), but it was much more complicated than what you are doing
here, where COPY was not able to work with files larger than 4GB on
WIN32.  See the saga from bed90759fcbc.

> I have attached the patch to fix the relation extension problems for Windows
> to this email.
>
> Can provide the other patches that changes off_t for pgoff_t in the rest of
> the code if there's interest in fixing this.

Yeah, I think that we should rip out these issues, and move to the
more portable pgoff_t across the board.  I doubt that any of this
could be backpatched due to the potential ABI breakages these
signatures changes would cause.  Implementing things in incremental
steps is more sensible when it comes to such changes, as a revert
blast can be reduced if a portion is incorrectly handled.

I'm seeing as well the things you are pointing in buffile.c.  These
should be fixed as well.  The WAL code is less annoying due to the 1GB
WAL segment size limit, still consistency across the board makes the
code easier to reason about, at least.

Among the files you have mentioned, there is also copydir.c.

pg_rewind seems also broken with files larger than 4GB, from what I
can see in libpq_source.c and filemap.c..  Worse.  Oops.

-    /* Note that this changes the file position, despite not using it. */
-    overlapped.Offset = offset;
+    overlapped.Offset = (DWORD) offset;
+    overlapped.OffsetHigh = (DWORD) (offset >> 32);

Based on the docs at [1], yes, this change makes sense.

It seems to me that a couple of extra code paths should be handled in
the first patch, and I have spotted three of them.  None of them are
critical as they are related to WAL segments, just become masked and
inconsistent:
- xlogrecovery.c, pg_pread() called with a cast to off_t.  WAL
segments have a max size of 1GB, meaning that we're OK.
- xlogreader.c, pg_pread() with a cast to off_t.
- walreceiver.c, pg_pwrite().

Except for these three spots, the first patch looks like a cut good
enough on its own.

Glad to see someone who takes time to spend time on this kind of
stuff with portability in mind, by the way.

[1]: https://learn.microsoft.com/en-us/windows/win32/api/minwinbase/ns-minwinbase-overlapped
--
Michael

Attachment

pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: index prefetching
Next
From: Bryan Green
Date:
Subject: [PATCH] Fix socket handle inheritance on Windows