Re: Using each rel as both outer and inner for JOIN_ANTI - Mailing list pgsql-hackers
From: Thomas Munro
Subject: Re: Using each rel as both outer and inner for JOIN_ANTI
Date:
Msg-id: CA+hUKGJsT_Uag8pr_VDRPiJ-XaMH2DpnsFeR=JXWjP90CP3nCQ@mail.gmail.com
In response to: Re: Using each rel as both outer and inner for JOIN_ANTI (Richard Guo <guofenglinux@gmail.com>)
List: pgsql-hackers
On Thu, Apr 6, 2023 at 6:40 PM Richard Guo <guofenglinux@gmail.com> wrote:
> Seems it wins if the parallel scan becomes part of a hash join in final
> plan.  I wonder if we have a way to know that in this early stage.

I haven't tried, but I'm not sure off the top of my head how to make a decision that early unless it's super coarse grained, like "is there an anti join in here somewhere?"...  Generally, this seems to be a consequence of our parallel query planner design, where parallel paths are generated separately and compete on cost, as foretold by Hong and Stonebraker[1].  It's going to be hard to prune those early without missing out on some good plans, if you don't have a crystal ball.

I wondered for a while what horrible technical problems would come up if we could defer creation of paths, so that partial_pathlist is empty but a new RelOptInfo flag says "you can call finish_the_partial_pathlist_please(root, rel) if you really want one".  We could try to be sparing about calling it, so that we don't finish up creating them all.  That would at least move the should-I-bother-to-make-this-path? logic close to the place with the knowledge that it'd be useful, in this case the inner side of a parallel hash join.

One problem is that you have to get a parallel_workers number from somewhere to make a partial path.  The hash join path code knows what its own parallel_workers number will be (it inherits it from the outer path, though I can't immediately think of any good reason why it shouldn't be Max(inner, outer)), but if we were to feed that into a created-on-demand inner path that is then cached, we'd have a horrible ordering dependency: some other join in the query gets planned first with a different parallel_workers number, and the path gets cached differently.  Yuck.  As you noted, 0 isn't a great number either, but it leads to another thought...

I wondered if we could somehow use the complete (non-partial) path directly in some cases here, if certain conditions are met.
Aside from any other technical problems, you might ask "but what about the estimates/costing we already did for that path?"; well, the numbers are usually wrong anyway!  We have complete paths, and partial paths for arbitrary parallel_workers numbers that bubbled up from our scan size-based heuristics.  Every time we combine more than one partial path, we have to handle non-matching "parallel degree" (I'm using that term to mean the result of get_parallel_divisor(path), a lightly grilled version of parallel_workers that makes dubious assumptions, but I digress).  Concretely, parallel append and parallel hash join already rescale some of their input variables to match their own nominal parallel degree (ugh, I see a few things we should fix in that code, but this email is long enough already).  I wonder if we might need some infrastructure to formalise that sort of thing.

For example, look at the row estimates in the EXPLAIN of a parallel append over tables with parallel_workers explicitly set to different numbers using ALTER TABLE.  They are wrong (they're based on different parallel degrees; turn off parallel_leader_participation to make the arithmetic easier to follow), while the append node itself has rescaled them and has a good row estimate for its own nominal parallel degree, which in turn might be wrong depending on what is above.  Perhaps EXPLAIN and everything else should use some common infrastructure to deal with this.

In other words, we avoid the need for a try-every-parallel-degree explosion by rescaling from some arbitrary input degree to some arbitrary output degree, but we can't go all the way and do the two-phase Hong thing and rescale from non-partial paths in general (for various technical reasons that apply to more complex nodes, but not to basic scans).  From where we are, I'm not sure how big a step it is to (1) formalise the path rescaling system and (2) be able to rescale some qualifying simple complete paths too, if needed for places like this.
Of course you could quibble with the concept of linear scaling of various numbers by parallel degrees; various things aren't linear or even continuous (probably why Hong's system included hash join thresholds).  Even the concept of get_parallel_divisor(path) as applied to row estimates is suspect, because it assumes that the planned workers will show up.  If a smaller number shows up (at all, due to max_parallel_workers, or just to this node because a higher level parallel append sent workers to different subplans), then a node might receive many more input tuples than expected and blow through work_mem, even if all estimates were 100% perfect.  I have no idea what to do about that.  At least *that* problem doesn't apply to parallel hash, which sizes its shared memory budget for the number of planned workers even if fewer show up, but that idea isn't without critics either.

I dunno.  Sorry for the wall of text/ideas.  I see unfinished business in every direction.

[1] https://www.postgresql.org/message-id/flat/CA%2BhUKGL-Fo9mZyFK1tdmzFng2puRBrgROsCiB1%3Dn7wP79mTZ%2Bg%40mail.gmail.com