Re: [HACKERS] Instability in select_parallel regression test - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: [HACKERS] Instability in select_parallel regression test |
Date | |
Msg-id | CAA4eK1JzXHzEzWwH811wPZH39+EegepJ=tui7Baw4aSvhW57Og@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] Instability in select_parallel regression test (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: [HACKERS] Instability in select_parallel regression test
|
List | pgsql-hackers |
On Fri, Feb 17, 2017 at 9:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Amit Kapila <amit.kapila16@gmail.com> writes: >> On Fri, Feb 17, 2017 at 11:22 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> In short, it looks to me like ExecShutdownGatherWorkers doesn't actually >>> wait for parallel workers to finish (as its comment suggests is >>> necessary), so that on not-too-speedy machines the worker slots may all >>> still be in use when the next command wants some. > >> ExecShutdownGatherWorkers() do wait for workers to exit/finish, but it >> doesn't wait for the postmaster to free the used slots and that is how >> that API is supposed to work. There is good chance that on slow >> machines the slots get freed up much later by postmaster after the >> workers have exited. > > That seems like a seriously broken design to me, first because it can make > for a significant delay in the slots becoming available (which is what's > evidently causing these regression failures), and second because it's > simply bad design to load extra responsibilities onto the postmaster. > Especially ones that involve touching shared memory. > > I think this needs to be changed, and promptly. Why in the world don't > you simply have the workers clearing their slots when they exit? > There seem to be many reasons why exit of background workers is done by postmaster like when they have to restart after a crash or if someone terminates them (TerminateBackgroundWorker()) or if the notification has to be sent at exit (bgw_notify_pid). Moreover, there can be background workers without shared memory access as well which can't clear state from shared memory. Another point to think is that background workers contain user-supplied code, so not sure how good it is to give them control of tinkering with shared data structures of the backend. Now, one can argue that for some of the cases where it is possible background worker should cleanup shared memory state at the exit, but I think it is not directly related to the parallel query. As far as the parallel query is concerned, it just needs to wait for workers exit to ensure that no more operations can be performed in workers after that point so that it can accumulate stats and retrieve some fixed parallel state information. > We don't have an expectation that regular backends are incompetent to > clean up after themselves. (Obviously, a crash exit is a different > case.) > >> I think what we need to do >> here is to move the test that needs workers to execute before other >> parallel query tests where there is no such requirement. > > That's not fixing the problem, it's merely averting your eyes from > the symptom. > I am not sure why you think so. Parallel query is capable of running without workers even if the number of planned workers are not available and there are many reasons for same. In general, I think the tests should not rely on the availability of background workers and if there is a test like that then it should be the responsibility of the test to ensure the same. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
pgsql-hackers by date: