Re: [Patch] Windows relation extension failure at 2GB and 4GB - Mailing list pgsql-hackers
| From | Bryan Green |
|---|---|
| Subject | Re: [Patch] Windows relation extension failure at 2GB and 4GB |
| Date | |
| Msg-id | 246ff974-8599-4c06-b5d9-7763f12ad9ca@gmail.com Whole thread Raw |
| In response to | Re: [Patch] Windows relation extension failure at 2GB and 4GB (Michael Paquier <michael@paquier.xyz>) |
| Responses |
Re: [Patch] Windows relation extension failure at 2GB and 4GB
Re: [Patch] Windows relation extension failure at 2GB and 4GB |
| List | pgsql-hackers |
On 11/5/2025 11:05 PM, Michael Paquier wrote: > On Tue, Oct 28, 2025 at 09:42:11AM -0500, Bryan Green wrote: >> I found two related bugs in PostgreSQL's Windows port that prevent files >> from exceeding 2GB. While unlikely to affect most installations (default 1GB >> segments), the code is objectively wrong and worth fixing. >> >> The first bug is a pervasive use of off_t where pgoff_t should be used. On >> Windows, off_t is only 32-bit, causing signed integer overflow at exactly >> 2GB (2^31 bytes). PostgreSQL already defined pgoff_t as __int64 for this >> purpose and some function declarations in headers already use it, but the >> implementations weren't updated to match. > > Ugh. That's the same problem as "long", which is 8 bytes wide > everywhere except WIN32. Removing traces of "long" from the code has > been a continuous effort over the years because of these silent > overflow issues. > >> After fixing all those off_t issues, there's a second bug at 4GB in the >> Windows implementations of pg_pwrite()/pg_pread() in win32pwrite.c and >> win32pread.c. The current implementation uses an OVERLAPPED structure for >> positioned I/O, but only sets the Offset field (low 32 bits), leaving >> OffsetHigh at zero. This works up to 4GB by accident, but beyond that, >> offsets wrap around. >> >> I can reproduce both bugs reliably with --with-segsize=8. The file grows to >> exactly 2GB and fails with "could not extend file: Invalid argument" despite >> having 300GB free. After fixing the off_t issues, it grows to exactly 4GB >> and hits the OVERLAPPED bug. Both are independently verifiable. > > The most popular option in terms of installation on Windows is the EDB > installer, where I bet that a file segment size of 1GB is what's > embedded in the code compiled. This argument would not hold with WAL > segment sizes if we begin to support even higher sizes than 1GB at > some point, and we use pwrite() in the WAL insert code. That should > not be a problem even in the near future. > >> It's safe for all platforms since pgoff_t equals off_t on Unix where off_t >> is already 64-bit. Only Windows behavior changes. > > win32_port.h and port.h say so, yeah. > >> Not urgent since few people hit this in practice, but it's clearly wrong >> code. >> Someone building with larger segments would see failures at 2GB and >> potential corruption at 4GB. Windows supports files up to 16 exabytes - no >> good reason to limit PostgreSQL to 2GB. > > The same kind of limitations with 4GB files existed with stat() and > fstat(), but it was much more complicated than what you are doing > here, where COPY was not able to work with files larger than 4GB on > WIN32. See the saga from bed90759fcbc. > >> I have attached the patch to fix the relation extension problems for Windows >> to this email. >> >> Can provide the other patches that changes off_t for pgoff_t in the rest of >> the code if there's interest in fixing this. > > Yeah, I think that we should rip out these issues, and move to the > more portable pgoff_t across the board. I doubt that any of this > could be backpatched due to the potential ABI breakages these > signatures changes would cause. Implementing things in incremental > steps is more sensible when it comes to such changes, as a revert > blast can be reduced if a portion is incorrectly handled. > > I'm seeing as well the things you are pointing in buffile.c. These > should be fixed as well. The WAL code is less annoying due to the 1GB > WAL segment size limit, still consistency across the board makes the > code easier to reason about, at least. > > Among the files you have mentioned, there is also copydir.c. > > pg_rewind seems also broken with files larger than 4GB, from what I > can see in libpq_source.c and filemap.c.. Worse. Oops. > > - /* Note that this changes the file position, despite not using it. */ > - overlapped.Offset = offset; > + overlapped.Offset = (DWORD) offset; > + overlapped.OffsetHigh = (DWORD) (offset >> 32); > > Based on the docs at [1], yes, this change makes sense. > > It seems to me that a couple of extra code paths should be handled in > the first patch, and I have spotted three of them. None of them are > critical as they are related to WAL segments, just become masked and > inconsistent: > - xlogrecovery.c, pg_pread() called with a cast to off_t. WAL > segments have a max size of 1GB, meaning that we're OK. > - xlogreader.c, pg_pread() with a cast to off_t. > - walreceiver.c, pg_pwrite(). > > Except for these three spots, the first patch looks like a cut good > enough on its own. > Latest patch attached that includes these code paths. > Glad to see someone who takes time to spend time on this kind of > stuff with portability in mind, by the way. > > [1]: https://learn.microsoft.com/en-us/windows/win32/api/minwinbase/ns-minwinbase-overlapped > -- > Michael Thanks for the quick reviewing. -- Bryan Green EDB: https://www.enterprisedb.com
Attachment
pgsql-hackers by date: