Re: [HACKERS] Unportable implementation of background worker start - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: [HACKERS] Unportable implementation of background worker start |
Date | |
Msg-id | 20170421172044.zcyslu7kcu7tfrsv@alap3.anarazel.de Whole thread Raw |
In response to | Re: [HACKERS] Unportable implementation of background worker start (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: [HACKERS] Unportable implementation of background worker start
|
List | pgsql-hackers |
Hi, On 2017-04-21 12:50:04 -0400, Tom Lane wrote: > Attached is a lightly-tested draft patch that converts the postmaster to > use a WaitEventSet for waiting in ServerLoop. I've got mixed emotions > about whether this is the direction to proceed, though. It adds at least > a couple of kernel calls per postmaster signal delivery, and probably to > every postmaster connection acceptance (ServerLoop iteration), to fix > problems that are so corner-casey that we've never even known we had them > till now. I'm not concerned much about the signal delivery paths, and I can't quite imagine that another syscall in the accept path is going to be measurable - worth ensuring though. I do agree that it's a bit of a big stick for the back-branches... > A different line of thought is to try to provide a bulletproof solution, > but push the implementation problems down into latch.c --- that is, the > goal would be to provide a pselect-like variant of WaitEventSetWait that > is guaranteed to return if interrupted, as opposed to the current behavior > where it's guaranteed not to. But that seems like quite a bit of work. Seems like a sane idea to me. The use of latches has already grown due to parallelism and it'll likely grow further - some of them seem likely to also have to deal with such concerns. I'd much rather centralize things down to a common place. On the other hand most types of our processes do SetLatch() in just nearly all the relevant signal handlers anyway, so they're pretty close to behaviour already. > Whether or not we decide to change over the postmaster.c code, I think > it'd likely be a good idea to apply most or all of the attached changes > in latch.c. Setting CLOEXEC on the relevant FDs is clearly a good thing, > and the other changes will provide some safety if some preloaded extension > decides to create a latch in the postmaster process. On the principle, I agree. Reading through the changes now. > @@ -1667,76 +1656,64 @@ ServerLoop(void) > * do nontrivial work. > * > * If we are in PM_WAIT_DEAD_END state, then we don't want to accept > - * any new connections, so we don't call select(), and just sleep. > + * any new connections, so we don't call WaitEventSetWait(), and just > + * sleep. XXX not ideal > */ Couldn't we just deactive the sockets in the set instead? > /* > - * Initialise the masks for select() for the ports we are listening on. > - * Return the number of sockets to listen on. > + * Create a WaitEventSet for ServerLoop() to wait on. This includes the > + * sockets we are listening on, plus the PostmasterLatch. > */ > -static int > -initMasks(fd_set *rmask) > +static void > +initServerWaitSet(void) > { > - int maxsock = -1; > int i; > > - FD_ZERO(rmask); > + ServerWaitSet = CreateWaitEventSet(PostmasterContext, MAXLISTEN + 1); Why are we using MAXLISTEN, rather than the actual number of things to listen to? The only benefit of this seems to be that we could theoretically allow dynamic reconfiguration of the sockets a bit more easily in the future, but that could just as well be done by recreating the set. Random note: Do we actually have any code that errors out if too many sockets are being listened to? > @@ -2553,6 +2543,9 @@ SIGHUP_handler(SIGNAL_ARGS) > #endif > } > > + /* Force ServerLoop to iterate */ > + SetLatch(&PostmasterLatch); > + > PG_SETMASK(&UnBlockSig); > > errno = save_errno; > @@ -2724,6 +2717,9 @@ pmdie(SIGNAL_ARGS) > break; > } > > + /* Force ServerLoop to iterate */ > + SetLatch(&PostmasterLatch); > + > PG_SETMASK(&UnBlockSig); > > errno = save_errno; > @@ -3037,6 +3033,9 @@ reaper(SIGNAL_ARGS) > */ > PostmasterStateMachine(); > > + /* Force ServerLoop to iterate */ > + SetLatch(&PostmasterLatch); > + > /* Done with signal handler */ > PG_SETMASK(&UnBlockSig); > > @@ -5078,6 +5077,9 @@ sigusr1_handler(SIGNAL_ARGS) > signal_child(StartupPID, SIGUSR2); > } > > + /* Force ServerLoop to iterate */ > + SetLatch(&PostmasterLatch); > + > PG_SETMASK(&UnBlockSig); > > errno = save_errno; I kind of would like, in master, take a chance of replace all the work done in signal handlers, by just a SetLatch(), and do it outside of signal handlers instead. Forking from signal handlers is just plain weird. > diff --git a/src/backend/storage/ipc/latchindex 4798370..92d2ff0 100644 > --- a/src/backend/storage/ipc/latch.c > +++ b/src/backend/storage/ipc/latch.c > @@ -62,6 +62,10 @@ > #include "storage/pmsignal.h" > #include "storage/shmem.h" > > +#ifndef FD_CLOEXEC > +#define FD_CLOEXEC 1 > +#endif Hm? Are we sure this is portable? Is there really cases that have F_SETFD, but not CLOEXEC? Greetings, Andres Freund
pgsql-hackers by date: