On Tue, Nov 4, 2025 at 4:12 AM Bryan Green <dbryan.green@gmail.com> wrote:
> Current approaches (inherited event handles, shared memory flags) depend
> on the postmaster running code during exit. A segfault or kill bypasses
> all of that.
Huh. I thought PostmasterHandle should be signalled by the kernel,
not by any code run by the postmaster itself, when taskkill /f
calls something like TerminateProcess(). Is that not the case? Are
you sure we haven't broken something on our side?
> My proposed solution is to use Windows Job Objects with KILL_ON_JOB_CLOSE.
I'm not a Windows guy but I had been researching this independently,
and it does seem to be the standard approach, so +1 generally even
though I'm still very curious to know *why* the existing coding
doesn't work in such a simple case.
By coincidence, I'm getting close to posting a stack of patches for
better postmaster death handling on Unix too, along with better
subprocess and interrupt multiplexing and cleanup. That does more
stuff, with connections to multithreading, interrupts, critical
sections, a bunch of existing bugs we have with subprocess management
on Unix. I will gladly delete my own attempt at Windows job objects
from that effort and rebase on top of your patch, and see what review
feedback ideas come up in that process.
In nearby threads that triggered my work on that, I was a bit worried
about the change in behaviour on PM death in the syncrep and
syslogger, but I'm beginning to suspect that the vast majority of
Linux/systemd deployments probably just nuke the whole cgroup from
orbit in this case, so it seems like exceptional behaviour is really
just a recipe for fragility on rarer systems, not to mention that it
probably has to be like that in a potential multithreaded mode anyway.
We should probably just rip all such specialness out, which I'll show
in my patch set with some explanations soon.
On the other hand, I was thinking of this as a v19 feature, where one
can contemplate such changes, but you said:
> Job creation can fail if postgres runs under an existing job (service
> managers, debuggers). Windows 7 disallows nested jobs. We detect this
> with IsProcessInJob(), and if AssignProcessToJobObject() returns
> ERROR_ACCESS_DENIED, we log and continue without orphan protection.
We currently require Windows 10 (itself recently EOL'd), but
PostgreSQL 13-15 sort of claim to work on Windows all the way back to
7, so I'm guessing you're imagining back-patching this?
> This patch does not include automated tests because the core
> functionality (orphan prevention on crash) requires simulating process
> termination, which is difficult to test reliably in CI.
Ah yes I ran into problems with that part too...