Thread: es_query_dsa is broken
Hi hackers,

While reviewing commit c6755e23 I realised that es_query_dsa is
broken.  It might have made some kind of sense as a name and a concept
in an earlier version of the proposal to add a DSA area for parallel
query's use, when the DSA area was going to be independent of parallel
query DSM segments and could have been used for the whole query.  But
as it stands, each Gather [Merge] node has its own DSA area in its own
DSM segment, and that needs to be the one available to the nodes of
the child plan when executed in the leader process, and it needs to be
not available to any other nodes in the tree.  It's OK for the workers
to have just one es_query_dsa set up at the beginning and used for the
whole lifetime of the executor, but it's not OK for the leader to
install it and forget about it as it does now.

The attached draft patch (for discussion only) shows one way to fix
that: only install it for the duration of Gather[Merge]'s
ExecProcNode(child), instead of doing it in ExecInitParallelPlan().
Likewise for the [Re]InitializeDSM invocations.  This type of
install-call-clear coding isn't ideal, but unfortunately the area
doesn't exist until the first time through ExecGather runs, and we
don't seem to have another suitable scope (ie some object easily
accessible to all children of a given Gather) to put it in right now.

Better ideas?

--
Thomas Munro
http://www.enterprisedb.com
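For readers without the attachment, the install-call-clear shape being
described looks roughly like this.  This is a sketch only: the wrapper
function name and surrounding structure are illustrative, while the
es_query_dsa install/clear lines correspond to the hunk quoted later in
the thread.

/*
 * Illustrative sketch (not the attached patch): expose this Gather
 * node's own DSA area, which lives in its own DSM segment and is
 * reachable via gatherstate->pei->area, only while the child plan is
 * actually being executed in the leader.
 */
static TupleTableSlot *
gather_getnext_from_local_plan(GatherState *gatherstate)
{
	PlanState  *outerPlan = outerPlanState(gatherstate);
	EState	   *estate = gatherstate->ps.state;
	TupleTableSlot *outerTupleSlot;

	/* Install our DSA area while executing the child plan... */
	estate->es_query_dsa = gatherstate->pei->area;
	outerTupleSlot = ExecProcNode(outerPlan);
	/* ...and clear it so no other node in the tree can see it. */
	estate->es_query_dsa = NULL;

	return outerTupleSlot;
}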
On Wed, Nov 29, 2017 at 7:30 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> While reviewing commit c6755e23 I realised that es_query_dsa is
> broken.  It might have made some kind of sense as a name and a concept
> in an earlier version of the proposal to add a DSA area for parallel
> query's use, when the DSA area was going to be independent of parallel
> query DSM segments and could have been used for the whole query.  But
> as it stands, each Gather [Merge] node has its own DSA area in its own
> DSM segment, and that needs to be the one available to the nodes of
> the child plan when executed in the leader process and it needs to be
> not available to any other nodes in the tree.  It's OK for the workers
> to have just one es_query_dsa set up at the beginning and used for the
> whole lifetime of the executor, but it's not OK for the leader to
> install it and forget about it as it does now.

Ugh.

> Better ideas?

How about this:

1. Remove es_query_dsa altogether.
2. Add a dsa_area * to ExecParallelInitializeDSMContext.
3. In ExecParallelInitializeDSM, pass the dsa_area * as a separate
   argument to the per-node-type function.
4. If the per-node-type function cares about the dsa_area *, it is
   responsible for saving a pointer to it in the PlanState node.
5. Also pass it down via the ParallelWorkerContext.

In v10 we might need to go with a solution like what you've sketched
here, because Tom will complain about breaking binary compatibility
with EState (and maybe other PlanState nodes) in a released branch.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
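To make the shape of steps 2-4 concrete, a hedged sketch follows.  The
extra dsa_area parameter, the "Foo" node type, and its fields are
hypothetical illustrations of this proposal, not existing PostgreSQL
APIs, and existing fields of ExecParallelInitializeDSMContext are
elided.

#include "postgres.h"

#include "access/parallel.h"
#include "nodes/execnodes.h"
#include "utils/dsa.h"

/* Step 2: the initialization context would carry the area. */
typedef struct ExecParallelInitializeDSMContext
{
	ParallelContext *pcxt;
	dsa_area   *area;			/* new: handed to per-node callbacks */
	int			nnodes;
	/* other existing fields elided */
} ExecParallelInitializeDSMContext;

/*
 * Steps 3-4: each per-node-type function receives the dsa_area *
 * explicitly and, if it cares, keeps its own pointer in its PlanState
 * node instead of reaching for estate->es_query_dsa.  "Foo" stands in
 * for any parallel-aware node type.
 */
typedef struct FooState
{
	PlanState	ps;
	dsa_area   *area;			/* hypothetical node-private pointer */
	dsa_pointer shared_state;	/* hypothetical per-node shared state */
} FooState;

static void
ExecFooInitializeDSM(FooState *node, ParallelContext *pcxt, dsa_area *area)
{
	node->area = area;

	/* Allocate whatever shared state this node needs in the DSA area. */
	node->shared_state = dsa_allocate(area, sizeof(int));
}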
On Thu, Nov 30, 2017 at 4:01 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> Better ideas?
>
> How about this:
>
> 1. Remove es_query_dsa altogether.
> 2. Add a dsa_area * to ExecParallelInitializeDSMContext.
> 3. In ExecParallelInitializeDSM, pass the dsa_area * as a separate
> argument to the per-node-type function.
> 4. If the per-node-type function cares about the dsa_area *, it is
> responsible for saving a pointer to it in the PlanState node.
> 5. Also pass it down via the ParallelWorkerContext.
>
> In v10 we might need to go with a solution like what you've sketched
> here, because Tom will complain about breaking binary compatibility
> with EState (and maybe other PlanState nodes) in a released branch.

I will post both versions.  I've been stuck for a while now trying to
come up with a query that actually breaks, so I can show that it's
fixed...

--
Thomas Munro
http://www.enterprisedb.com
On Thu, Nov 30, 2017 at 11:19 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Thu, Nov 30, 2017 at 4:01 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> Better ideas?
>>
>> How about this:
>>
>> 1. Remove es_query_dsa altogether.
>> 2. Add a dsa_area * to ExecParallelInitializeDSMContext.
>> 3. In ExecParallelInitializeDSM, pass the dsa_area * as a separate
>> argument to the per-node-type function.
>> 4. If the per-node-type function cares about the dsa_area *, it is
>> responsible for saving a pointer to it in the PlanState node.
>> 5. Also pass it down via the ParallelWorkerContext.
>>
>> In v10 we might need to go with a solution like what you've sketched
>> here, because Tom will complain about breaking binary compatibility
>> with EState (and maybe other PlanState nodes) in a released branch.

Here's a pair of versions of that patch tidied up a bit, for
REL_10_STABLE and master.  Unfortunately I've been unable to come up
with a reproducer that shows misbehaviour (something like what was
reported on -bugs[1] which appears to be a case of calling
dsa_get_address(wrong_area, some_dsa_pointer)).

I don't yet have a patch to remove es_query_dsa.  Instead of adding a
dsa_area * parameter to a ton of per-node functions as you suggested,
I'm wondering if we shouldn't just pass the whole ParallelExecutorInfo
object into all ExecXXXInitializeDSM() and ExecXXXInitializeWorker()
functions so they can hold onto it if they want to.  In the worker
case it'd be only partially filled out: just area and pcxt, and pcxt
would also be only partially filled out.  Alternatively I could create
a new struct ParallelExecutorWorkerInfo which has just the bits needed
for workers (that'd replace ParallelWorkerContext, which I now realise
was put in the wrong place -- it doesn't belong in access/parallel.h,
it belongs in an executor header with an executor-sounding name).

I'd like to get a couple of other things done before I come back to this.

[1] https://www.postgresql.org/message-id/CAEepm=1V-ZjAAyG-xj6s7ESXFhzLsRBpyX=BLz-w2DS=ObNu4A@mail.gmail.com

--
Thomas Munro
http://www.enterprisedb.com
On Tue, Dec 5, 2017 at 6:45 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Thu, Nov 30, 2017 at 11:19 PM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>> On Thu, Nov 30, 2017 at 4:01 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>>>
>>> In v10 we might need to go with a solution like what you've sketched
>>> here, because Tom will complain about breaking binary compatibility
>>> with EState (and maybe other PlanState nodes) in a released branch.
>
> Here's a pair of versions of that patch tidied up a bit, for
> REL_10_STABLE and master.  Unfortunately I've been unable to come up
> with a reproducer that shows misbehaviour (something like what was
> reported on -bugs[1] which appears to be a case of calling
> dsa_get_address(wrong_area, some_dsa_pointer)).
>

+ EState *estate = gatherstate->ps.state;
+
+ /* Install our DSA area while executing the plan. */
+ estate->es_query_dsa = gatherstate->pei->area;
  outerTupleSlot = ExecProcNode(outerPlan);
+ estate->es_query_dsa = NULL;

Won't the above coding pattern create a problem, if ExecProcNode
throws an error and outer block catches it and continues execution
(consider the case of execution inside PL blocks)?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Tue, Dec 5, 2017 at 7:24 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> + EState *estate = gatherstate->ps.state;
> +
> + /* Install our DSA area while executing the plan. */
> + estate->es_query_dsa = gatherstate->pei->area;
>   outerTupleSlot = ExecProcNode(outerPlan);
> + estate->es_query_dsa = NULL;
>
> Won't the above coding pattern create a problem, if ExecProcNode
> throws an error and outer block catches it and continues execution
> (consider the case of execution inside PL blocks)?

I don't see what the problem is.  The query that got aborted by the
error wouldn't be sharing an EState with one that didn't.  Control
would have to return to someplace inside the same ExecProcNode and it
would have to return normally.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Dec 6, 2017 at 1:14 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Dec 5, 2017 at 7:24 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> + EState *estate = gatherstate->ps.state;
>> +
>> + /* Install our DSA area while executing the plan. */
>> + estate->es_query_dsa = gatherstate->pei->area;
>>   outerTupleSlot = ExecProcNode(outerPlan);
>> + estate->es_query_dsa = NULL;
>>
>> Won't the above coding pattern create a problem, if ExecProcNode
>> throws an error and outer block catches it and continues execution
>> (consider the case of execution inside PL blocks)?
>
> I don't see what the problem is.  The query that got aborted by the
> error wouldn't be sharing an EState with one that didn't.

That's right.  Ignore my comment, I got confused.  Other than that, I
don't see any problem with the code as such apart from that it looks
slightly hacky.  I think Thomas or someone needs to develop a patch on
the lines you have mentioned or what Thomas was trying to describe in
his email and see how it comes out.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Wed, Dec 6, 2017 at 10:16 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Wed, Dec 6, 2017 at 1:14 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Tue, Dec 5, 2017 at 7:24 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>> + EState *estate = gatherstate->ps.state;
>>> +
>>> + /* Install our DSA area while executing the plan. */
>>> + estate->es_query_dsa = gatherstate->pei->area;
>>>   outerTupleSlot = ExecProcNode(outerPlan);
>>> + estate->es_query_dsa = NULL;
>>>
>>> Won't the above coding pattern create a problem, if ExecProcNode
>>> throws an error and outer block catches it and continues execution
>>> (consider the case of execution inside PL blocks)?
>>
>> I don't see what the problem is.  The query that got aborted by the
>> error wouldn't be sharing an EState with one that didn't.
>
> That's right.  Ignore my comment, I got confused.  Other than that, I
> don't see any problem with the code as such apart from that it looks
> slightly hacky.  I think Thomas or someone needs to develop a patch on
> the lines you have mentioned or what Thomas was trying to describe in
> his email and see how it comes out.

Yeah, it is a bit hacky, but I can't see another way to fix it without
changing released APIs, and it's only for one release and will
certainly be unhackified in v11.

For v11 I think we need to decide between:

1.  Removing es_query_dsa and injecting the right context into the
executor tree as discussed.

2.  Another idea mentioned by Robert in an off-list chat: We could
consolidate all DSM segments in a multi-gather plan into one.  See the
nearby thread where someone had over a hundred Gather nodes and had to
crank up max_connections to reserve enough DSM slots.  Of course,
optimising for that case doesn't make too much sense (I suspect
multi-gather plans will become less likely with Parallel Append and
Parallel Hash in the picture anyway), but it would reduce a bunch of
duplicated work we do when it happens, as well as scarce slot
consumption.  If we did that, then all of a sudden es_query_dsa would
make sense again (ie it'd be whole-query scoped), and we could revert
that hacky change.

Or we could do both things anyway.

--
Thomas Munro
http://www.enterprisedb.com
On Thu, Dec 7, 2017 at 12:51 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Wed, Dec 6, 2017 at 10:16 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> On Wed, Dec 6, 2017 at 1:14 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> On Tue, Dec 5, 2017 at 7:24 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>>> + EState *estate = gatherstate->ps.state;
>>>> +
>>>> + /* Install our DSA area while executing the plan. */
>>>> + estate->es_query_dsa = gatherstate->pei->area;
>>>>   outerTupleSlot = ExecProcNode(outerPlan);
>>>> + estate->es_query_dsa = NULL;
>>>>
>>>> Won't the above coding pattern create a problem, if ExecProcNode
>>>> throws an error and outer block catches it and continues execution
>>>> (consider the case of execution inside PL blocks)?
>>>
>>> I don't see what the problem is.  The query that got aborted by the
>>> error wouldn't be sharing an EState with one that didn't.
>>
>> That's right.  Ignore my comment, I got confused.  Other than that, I
>> don't see any problem with the code as such apart from that it looks
>> slightly hacky.  I think Thomas or someone needs to develop a patch on
>> the lines you have mentioned or what Thomas was trying to describe in
>> his email and see how it comes out.

Andreas Seltenreich found a query[1] that suffers from this bug
(thanks!), and Amit confirmed that this patch fixes it, but further
sqlsmith fuzz testing with the patch applied also revealed that it
failed to handle the case where pei is NULL because parallelism was
inhibited.  Here's a new version to fix that.  Someone might prefer a
coding with "if" rather than the ternary operator, but I didn't have a
strong preference.

[1] https://www.postgresql.org/message-id/flat/CAEepm%3D2tfz1XNDcyU_a5ZiEaopz6%2BbBo9vQY%2BiJVLTtUVNztcQ%40mail.gmail.com

--
Thomas Munro
http://www.enterprisedb.com
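The NULL-guarded coding described here amounts to something like the
following sketch of the fix as described in the message above; it is
not necessarily character-for-character what ended up in the committed
patch.

EState *estate = gatherstate->ps.state;

/*
 * Install our DSA area while executing the plan, if we have one; pei
 * is NULL when parallelism was inhibited and no ParallelExecutorInfo
 * was set up, in which case es_query_dsa simply stays NULL.
 */
estate->es_query_dsa = gatherstate->pei ? gatherstate->pei->area : NULL;
outerTupleSlot = ExecProcNode(outerPlan);
estate->es_query_dsa = NULL;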
On Sun, Dec 17, 2017 at 7:35 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> Andreas Seltenreich found a query[1] that suffers from this bug
> (thanks!), and Amit confirmed that this patch fixes it, but further
> sqlsmith fuzz testing with the patch applied also revealed that it
> failed to handle the case where pei is NULL because parallelism was
> inhibited.  Here's a new version to fix that.  Someone might prefer a
> coding with "if" rather than the ternary operator, but I didn't have a
> strong preference.

Looks OK to me.  Committed and back-patched to v10.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,

On 2017-12-07 12:51:56 +1300, Thomas Munro wrote:
> 1. Removing es_query_dsa and injecting the right context into the
> executor tree as discussed.
>
> 2. Another idea mentioned by Robert in an off-list chat: We could
> consolidate all DSM segments in a multi-gather plan into one. See the
> nearby thread where someone had over a hundred Gather nodes and had to
> crank up max_connections to reserve enough DSM slots. Of course,
> optimising for that case doesn't make too much sense (I suspect
> multi-gather plans will become less likely with Parallel Append and
> Parallel Hash in the picture anyway), but it would reduce a bunch of
> duplicated work we do when it happens as well as scarce slot
> consumption. If we did that, then all of a sudden es_query_dsa would
> make sense again (ie it'd be whole-query scoped), and we could revert
> that hacky change.
>
> Or we could do both things anyway.

This is an open item for v11:

Tidy up es_query_dsa and possibly ParallelWorkerContext?
Original commit: e13029a5ce353574516c64fd1ec9c50201e705fd (principal
author: Thomas Munro; owner: Robert Haas)
Bug fix: fd7c0fa732d97a4b4ebb58730e6244ea30d0a618
While the bug was fixed with something back-patchable, we should
consider improving this situation.  As discussed in the above-linked
thread, options might include (1) getting rid of es_query_dsa entirely
and injecting dependencies into nodes, (2) making all Gather nodes
share the same DSM segment so there truly could be a per-query DSA
segment.

Do we want to make any changes here for v11?  If not, are we ok with
just closing the entry and waiting till it bugs anybody for some
reason?

Greetings,

Andres Freund
On Thu, Apr 12, 2018 at 4:04 AM, Andres Freund <andres@anarazel.de> wrote:
> This is an open item for v11:
>
> Tidy up es_query_dsa and possibly ParallelWorkerContext?
> Original commit: e13029a5ce353574516c64fd1ec9c50201e705fd (principal
> author: Thomas Munro; owner: Robert Haas)
> Bug fix: fd7c0fa732d97a4b4ebb58730e6244ea30d0a618
> While the bug was fixed with something back-patchable, we should
> consider improving this situation.  As discussed in the above-linked
> thread, options might include (1) getting rid of es_query_dsa entirely
> and injecting dependencies into nodes, (2) making all Gather nodes
> share the same DSM segment so there truly could be a per-query DSA
> segment.
>
> Do we want to make any changes here for v11?  If not, are we ok with
> just closing the entry and waiting till it bugs anybody for some
> reason?

I think we should probably do both of the things listed above, but
given where we are and given that it's refactoring work and not a
stop-ship issue, I propose to do that in the v12 cycle.  I'll go
remove that from the open items if no one objects soon.

--
Thomas Munro
http://www.enterprisedb.com