Re: [HACKERS] Proposal : Parallel Merge Join - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: [HACKERS] Proposal : Parallel Merge Join |
Date | |
Msg-id | CA+TgmoYOv+dFK0MWW6366dFj_xTnohQfoBDrHyB7d1oZhrgPjA@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] Proposal : Parallel Merge Join (Amit Kapila <amit.kapila16@gmail.com>) |
Responses |
Re: [HACKERS] Proposal : Parallel Merge Join
Re: [HACKERS] Proposal : Parallel Merge Join |
List | pgsql-hackers |
On Wed, Mar 1, 2017 at 12:01 PM, Amit Kapila <amit.kapila16@gmail.com> wrote: > On Wed, Mar 1, 2017 at 11:24 AM, Dilip Kumar <dilipbalaut@gmail.com> wrote: >> On Wed, Mar 1, 2017 at 11:13 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: >>> I think for now we can keep the parallel safety check for cheapest >>> inner path, though it will be of use only for the very first time we >>> compare the paths in that loop. I am not sure if there is any other >>> better way to handle the same. >> >> Done that way. > > Thanks, your patch looks good to me. I'm not happy with the way this patch can just happen to latch on to a path that's not parallel-safe rather than one that is and then just give up on a merge join in that case. I already made this argument in https://www.postgresql.org/message-id/CA+TgmobdW2au1Jq5L4ySA2ZhqFmA-qNvD7ZFaZbJWm3c0ysWyw@mail.gmail.com and my opinion hasn't changed. I've subsequently come to realize that this problem is more widespread that I initially realized: not only do sort_inner_and_outer() and generate_partial_mergejoin_paths() give up without searching for the cheapest parallel-safe path, but the latter later calls get_cheapest_path_for_pathkeys() and then just pretends it didn't find anything unless the result is parallel-safe. This makes no sense to me: the cheapest parallel-safe path could be 1% more expensive than the cheapest path of any kind, but because it's not the cheapest we arbitrarily skip it, even though parallelizing more of the tree might leave us way ahead overall. I suggest that we add an additional "bool require_parallel_safe" argument to get_cheapest_path_for_pathkeys(); when false, the function will behave as at present, but when true, it will skip paths that are not parallel-safe. And similarly have a function to find the cheapest parallel-safe path if the first one in the list isn't it. See attached draft patch. Then this code can search for the correct path instead of searching using the wrong criteria and then giving up if it doesn't get the result it wants. + if (!(joinrel->consider_parallel && + save_jointype != JOIN_UNIQUE_OUTER && + save_jointype != JOIN_FULL && + save_jointype != JOIN_RIGHT && + outerrel->partial_pathlist != NIL && + bms_is_empty(joinrel->lateral_relids))) I would distribute the negation, so that this reads if (!joinrel->consider_parallel || save_jointype == JOIN_UNIQUE_OUTER || save_jointype == JOIN_FULL || ...). Or maybe better yet, just drop the ! and the return that follows and put the consider_parallel_nestloop and consider_parallel_mergejoin calls inside the if-block. You could Assert(nestjoinOK) instead of testing it, although I guess it doesn't really matter. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
pgsql-hackers by date: