Re: Order of operations in SubPostmasterMain() - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: Order of operations in SubPostmasterMain() |
Date | |
Msg-id | 20160929203515.hftq6tslmduosnl2@alap3.anarazel.de Whole thread Raw |
In response to | Order of operations in SubPostmasterMain() (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
On 2016-09-29 15:46:00 -0400, Tom Lane wrote: > I noticed that buildfarm member culicidae, which is running an > EXEC_BACKEND build on Linux, occasionally falls over like this: > > FATAL: could not reattach to shared memory (key=6280001, addr=0x7fa9df845000): Invalid argument > > That's probably because Andres failed to disable ASLR on that machine, Intentionally so, FWIW. Want to enable PIE as well soon-ish. Newer versions of windows / msvc want to enable ASLR by default, and I think that'll break parallel query bgworkers, because they pass the entry point around as a pointer. So I don't think we'll be able to duck this issue for much longer. > but the exact cause isn't too important right now. What I'm finding > distressing is that that message doesn't show up on the client side, > making it look like a postmaster crash: Indeed. > The reason we don't see anything is that backend libpq hasn't been > initialized yet, so it won't send the ereport message to the client. > > The original design of SubPostmasterMain() intended that such cases > would be reported, because it runs BackendInitialize (which calls > pq_init()) before CreateSharedMemoryAndSemaphores (which is where > the reattach used to happen --- the comments at postmaster.c 4730ff > and 4747ff reflect this expectation). We broke it when we added > the earlier reattach call at lines 4669ff. > > We could probably refactor things enough so that we do pq_init() > before PGSharedMemoryReAttach(). It would be a little bit ugly, > and it would fractionally increase the chance of a reattach failure > because pq_init() palloc's a few KB worth of buffers. I'm not quite > sure if it's worth it; thoughts? In any case the mentioned comments > are obsolete and need to be moved/rewritten. Hm. It doesn't really seem necessary to directly allocate that much. Seems pretty harmless to essentially let it start at 0 bytes (so we can repalloc, but don't use a lot of memory)? > Another thing that I'm finding distressing while I look at this is > the placement of ClosePostmasterPorts(). That needs to happen *early*, > because as long as the child process is holding open the postmaster side > of the postmaster death pipe, we are risking unhappiness. Not only is > it not particularly early, it's after process_shared_preload_libraries() > which could invoke arbitrarily stupid stuff. Whether or not we do > anything about moving pq_init(), I'm thinking we'd better move the > ClosePostmasterPorts() call so that it's done as soon as possible, > probably right after read_backend_variables() which loads the data > it needs. That seems like a good idea. Besides the benefit you mention, it looks like that'll also reduce duplication. Greetings, Andres Freund
pgsql-hackers by date: