On Tue, Sep 30, 2025 at 05:21:05PM +0530, Kevin K Biju wrote:
> We have encountered a few instances where logical replication errors out
> during SaveSlotToPath() after creating the state.tmp file, but before it
> was renamed (due to ENOSPC, for example). In these cases, since state.tmp
> is not cleaned up and is created with the O_EXCL flag, further invocations
> of SaveSlotToPath() for this slot will error out on OpenTransientFile()
> with EEXIST, completely blocking slot metadata persistence. The only
> explicit cleanup for state.tmp occurs during server startup as part of
> RestoreSlotFromDisk().
Ah, you are referring to the window between a CloseTransientFile()
completing and the rename().
This is not the first time this problem has been reported. I have
found two earlier reports of the same error as yours, one of which
refers to a discussion about O_EXCL vs O_TRUNC:
https://www.postgresql.org/message-id/08bbfab1-a61d-3750-fc18-4ab2c1aa7f09@postgrespro.ru
https://www.postgresql.org/message-id/3559061693910326@qy4q4a6esb2lebnz.sas.yp-c.yandex.net
> It doesn't seem that this function relies on data written to state.tmp
> previously, so O_EXCL is unnecessary. Attaching a patch that swaps O_EXCL
> for O_TRUNC, ensuring a fresh state.tmp is available for writing.
Using O_TRUNC has been discussed and discarded because O_EXCL is more
protective in this specific code path, see the argument here:
https://www.postgresql.org/message-id/20191202161222.sazl2omhhk5pl3nl@alap3.anarazel.de
An alternative fix that we can do here instead is to unlink() the
temporary file when reaching these error code paths, so that later
calls can create it again under O_EXCL. This was suggested as a
second solution, besides the O_TRUNC change that was objected to.
One thing to be careful about is that the unlinks must be done while
holding the lwlock for the IO in progress. So, something like the
attached should also solve your problem.
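To illustrate the idea outside of the tree, here is a minimal
standalone sketch of the pattern in plain POSIX C. It is not the
attached patch and not PostgreSQL code: save_state(), the
inject_failure flag and the "state"/"state.tmp" names under an
arbitrary directory are hypothetical stand-ins, and there is no
locking because the sketch is single-threaded. It only shows that
keeping O_EXCL and unlinking the temporary file on each error path
lets the next attempt succeed.

```c
/* Illustrative sketch only, not PostgreSQL source. All names are hypothetical. */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Write "data" to <dir>/state.tmp, then rename it to <dir>/state.
 * Returns 0 on success, or -1 with errno set on failure.
 * If inject_failure is true, pretend that the write failed with ENOSPC.
 */
static int
save_state(const char *dir, const char *data, bool inject_failure)
{
    char        tmppath[1024];
    char        path[1024];
    int         fd;
    int         save_errno;
    size_t      len;

    assert(dir != NULL && data != NULL);
    snprintf(tmppath, sizeof(tmppath), "%s/state.tmp", dir);
    snprintf(path, sizeof(path), "%s/state", dir);
    len = strlen(data);

    /* O_EXCL: fail with EEXIST if a temporary file is already there. */
    fd = open(tmppath, O_CREAT | O_EXCL | O_WRONLY, 0600);
    if (fd < 0)
        return -1;              /* nothing was created, nothing to clean up */

    if (inject_failure)
    {
        errno = ENOSPC;
        goto fail;
    }

    errno = 0;
    if (write(fd, data, len) != (ssize_t) len)
    {
        /* A short write does not set errno; assume out of disk space. */
        if (errno == 0)
            errno = ENOSPC;
        goto fail;
    }
    if (fsync(fd) != 0)
        goto fail;
    if (close(fd) != 0)
    {
        fd = -1;                /* descriptor is gone even if close() failed */
        goto fail;
    }
    fd = -1;
    if (rename(tmppath, path) != 0)
        goto fail;
    return 0;

fail:
    /* Remove the temporary file so that a later O_EXCL open can succeed. */
    save_errno = errno;
    if (fd >= 0)
        close(fd);
    unlink(tmppath);
    errno = save_errno;
    return -1;
}
```

Calling save_state(dir, "v1", true) and then save_state(dir, "v2",
false) on the same directory should fail the first time and succeed
the second time. Without the unlink() under the fail label, the
second call would fail at open() with EEXIST, which is the behavior
reported upthread.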
Any thoughts or comments from the others? I'd like to backpatch that
all the way down, 6 years too late. But later is better than never,
right?
--
Michael