Re: Using WaitEventSet in the postmaster - Mailing list pgsql-hackers
From | Thomas Munro |
---|---|
Subject | Re: Using WaitEventSet in the postmaster |
Date | |
Msg-id | CA+hUKGKt58vOw8Y32Cy_Lnj8K2FGrty6-XO+bmkRnyuz5w-wJw@mail.gmail.com Whole thread Raw |
In response to | Re: Using WaitEventSet in the postmaster (Thomas Munro <thomas.munro@gmail.com>) |
Responses |
Re: Using WaitEventSet in the postmaster
|
List | pgsql-hackers |
On Fri, Dec 23, 2022 at 8:46 PM Thomas Munro <thomas.munro@gmail.com> wrote: > I pushed the small preliminary patches. Here's a rebase of the main patch. Here are some questions I have considered. Anyone got an opinion on point 3, in particular? 1. Is it OK that we are now using APIs that might throw, in places where we weren't? I think so: we don't really expect WaitEventSet APIs to throw unless something is pretty seriously wrong, and if you hack things to inject failures there then you get a FATAL error and the postmaster exits and the children detect that. I think that is appropriate. 2. Is it really OK to delete the pqsignal_pm() infrastructure? I think so. The need for sa_mask to block all signals is gone: all signal handlers should now be re-entrant (ie safe in case of interruption while already in a signal handler), safe against stack overflow (pqsignal() still blocks re-entry for the *same* signal number, because we use sigaction() without SA_NODEFER, so a handler can only be interrupted by a different signal, and the number of actions installed is finite and small), and safe to run at any time (ie safe to interrupt the user context because we just do known-good sigatomic_t/syscall stuff and save/restore errno). The concern about SA_RESTART is gone, because we no longer use the underspecified select() interface; the replacement implementation syscalls, even poll(), return with EINTR for handlers installed with SA_RESTART, but that's now moot anyway because we have a latch that guarantees they return with a different event anyway. (FTR select() is nearly extinct in BE code, I found one other user and I plan to remove it, see RADIUS thread, CF #4103.) 3. Is it OK to clobber the shared pending flag for SIGQUIT, SIGTERM, SIGINT? If you send all of these extremely rapidly, it's indeterminate which one will be seen by handle_shutdown_request(). I think that's probably acceptable? To be strict about processing only the first one that is delivered, I think you'd need an sa_mask to block all three signals, and then you wouldn't change pending_shutdown_request if it's already set, which I'm willing to code up if someone thinks that's important. (<vapourware>Ideally I would invent WL_SIGNAL to consume signal events serially without handlers or global variables</vapourware>.) 4. Is anything new leaking into child processes due to this new infrastructure? I don't think so; the postmaster's MemoryContext is destroyed, and before that I'm releasing kernel resources on OSes that need it (namely Linux, where the epoll fd and signalfd need to be closed). 5. Is the signal mask being correctly handled during forking? I think so: I decided to push the masking logic directly into the routine that forks, to make it easy to verify that all paths set the mask the way we want. (While thinking about that I noticed that signals don't seem to be initially masked on Windows; I think that's a pre-existing condition, and I assume we get away with it because nothing reaches the fake signal dispatch code soon enough to break anything? Not changing that in this patch.) 6. Is the naming and style OK? Better ideas welcome, but basically I tried to avoid all unnecessary refactoring and changes, so no real logic moves around, and the changes are pretty close to "mechanical". One bikeshed decision was what to call the {handle,process}_XXX functions and associated flags. Maybe "action" isn't the best name; but it could be a request from pg_ctl or a request from a child process. I went with newly invented names for these handlers rather than "handle_SIGUSR1" etc because (1) the 3 different shutdown request signals point to a common handler and (2) I hope to switch to latches instead of SIGUSR1 for "action" in later work. But I could switch to got_SIGUSR1 style variables if someone thinks it's better. Here's a new version, with small changes: * remove a stray reference to select() in a pqcomm.c comment * move PG_SETMASK(&UnBlockSig) below the bit that sets up SIGTTIN etc * pgindent
Attachment
pgsql-hackers by date: