Re: Replace O_EXCL with O_TRUNC for creation of state.tmp in SaveSlotToPath - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: Replace O_EXCL with O_TRUNC for creation of state.tmp in SaveSlotToPath
Msg-id aOhY1O7deWq1Fs2T@paquier.xyz
In response to Re: Replace O_EXCL with O_TRUNC for creation of state.tmp in SaveSlotToPath  (Michael Paquier <michael@paquier.xyz>)
List pgsql-hackers
On Thu, Oct 09, 2025 at 05:02:12PM +0900, Michael Paquier wrote:
> An alternative fix that we can do here instead is to unlink() the
> temporary file when reaching these error code paths, allowing
> future accesses to work correctly.  This was suggested as a second
> solution, other than the O_TRUNC approach that was objected to.  One
> thing is to make sure that the unlinks are done while holding the
> lwlock for the IO in progress.  So, something like the attached should
> also solve your problem.
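
A minimal sketch of the idea in plain POSIX C, not PostgreSQL code: save_state(), fail_and_unlink() and the pthread mutex standing in for the I/O LWLock are hypothetical names, and the real SaveSlotToPath() writes a binary slot struct with its own error reporting. It assumes the same write-to-state.tmp, fsync, rename() sequence, and shows why an O_EXCL open fails with EEXIST once a stale state.tmp exists, and how unlinking the temporary file on each error path, before the lock is released, keeps later saves working.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for the slot's I/O-in-progress LWLock. */
static pthread_mutex_t io_in_progress_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Error path: close the file if it is still open, unlink() the partial
 * temporary file while the lock is still held, then release the lock
 * and report err through errno.
 */
static int
fail_and_unlink(int fd, const char *tmppath, int err)
{
    if (fd >= 0)
        close(fd);
    unlink(tmppath);
    pthread_mutex_unlock(&io_in_progress_lock);
    errno = err;
    return -1;
}

/*
 * Hypothetical simplified save routine: create tmppath with O_EXCL,
 * write and fsync the new contents, then rename() it over path.
 * If fail_write is nonzero, a write failure (ENOSPC) is simulated
 * after the temporary file has been created.
 */
static int
save_state(const char *tmppath, const char *path, const char *data,
           int fail_write)
{
    size_t len = strlen(data);
    ssize_t written;
    int fd;
    int err;

    pthread_mutex_lock(&io_in_progress_lock);

    fd = open(tmppath, O_CREAT | O_EXCL | O_WRONLY, 0600);
    if (fd < 0)
    {
        /* e.g. EEXIST if an earlier failure left tmppath behind;
         * we created nothing, so there is nothing to unlink here */
        err = errno;
        pthread_mutex_unlock(&io_in_progress_lock);
        errno = err;
        return -1;
    }

    if (fail_write)
        return fail_and_unlink(fd, tmppath, ENOSPC);

    errno = 0;
    written = write(fd, data, len);
    if (written != (ssize_t) len)
        /* a short write sets no errno; assume we ran out of space */
        return fail_and_unlink(fd, tmppath, errno ? errno : ENOSPC);
    if (fsync(fd) != 0)
        return fail_and_unlink(fd, tmppath, errno);
    if (close(fd) != 0)
        return fail_and_unlink(-1, tmppath, errno);
    if (rename(tmppath, path) != 0)
        return fail_and_unlink(-1, tmppath, errno);

    pthread_mutex_unlock(&io_in_progress_lock);
    return 0;
}
```

Without the unlink() in fail_and_unlink(), the first failed save would leave state.tmp on disk and every following save_state() call would fail with EEXIST, which is the behavior the thread set out to fix.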

I have been playing a bit more with that, and applied 912af1c7e9c9,
which adds the unlink() calls, with a backpatch down to v13.  While at
it, I have also tested hard crashes timed while we are in the middle of
SaveSlotToPath(), with state.tmp still around at restart (for example,
using an injection point wait just before the rename()), and
double-checked that recovery is able to do a correct cleanup job.  I
did not fully recall how this last part worked, as it has been a couple
of years since the last report.

There may be an argument for adding an automated test along these lines:
- Physical slot creation.
- Use the slot a bit.
- Injection point wait before the rename() of SaveSlotToPath().
- Checkpoint, which would not be able to finish.
- Hard crash.
- Restart, and check that the slot state is correct after recovery.
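
The crash window from the scenario above can be roughly mimicked outside PostgreSQL with a standalone C program. This is only a sketch under stated assumptions: a child process that calls _exit() right after writing state.tmp stands in for the injection point wait plus hard crash, and restart_and_read_state(), crash_before_rename() and join_path() are hypothetical helpers. restart_and_read_state() is a simplified stand-in for the cleanup done at startup, not PostgreSQL's actual recovery code, and an in-tree test would drive a real server instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Build "<dir>/<name>" into buf; returns 0 on success, -1 if truncated. */
static int
join_path(char *buf, size_t size, const char *dir, const char *name)
{
    int n = snprintf(buf, size, "%s/%s", dir, name);

    return (n < 0 || (size_t) n >= size) ? -1 : 0;
}

/*
 * "Injection point wait before rename() + hard crash": a child process
 * creates state.tmp with O_EXCL, writes the new state into it, then
 * dies via _exit() without calling rename() or unlink().
 * Returns 0 if the child "crashed" at the intended point.
 */
static int
crash_before_rename(const char *slotdir, const char *newdata)
{
    char tmppath[4096];
    pid_t pid;
    int status;

    if (join_path(tmppath, sizeof tmppath, slotdir, "state.tmp") != 0)
        return -1;
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
    {
        int fd = open(tmppath, O_CREAT | O_EXCL | O_WRONLY, 0600);

        if (fd < 0 || write(fd, newdata, strlen(newdata)) < 0)
            _exit(1);
        _exit(42);              /* simulated crash: state.tmp is left behind */
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 42) ? 0 : -1;
}

/*
 * "Restart": discard any leftover state.tmp (it never reached the
 * rename(), so it is not the durable state), then read "state" into buf.
 * Returns the number of bytes read, or -1 on error.
 */
static ssize_t
restart_and_read_state(const char *slotdir, char *buf, size_t size)
{
    char path[4096];
    FILE *f;
    size_t n;

    if (size == 0 || join_path(path, sizeof path, slotdir, "state.tmp") != 0)
        return -1;
    if (unlink(path) != 0 && errno != ENOENT)
        return -1;
    if (join_path(path, sizeof path, slotdir, "state") != 0)
        return -1;
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    n = fread(buf, 1, size - 1, f);
    buf[n] = '\0';
    fclose(f);
    return (ssize_t) n;
}
```

The point the sketch checks is the last step of the list: after the simulated crash, the previously durable "state" is what comes back, and the half-finished state.tmp is removed rather than trusted.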

However, I am not sure that this would be worth the cycles spent on it.
There are many more interesting write-failure scenarios in the tree
than this one.
--
Michael

