Re: upper planner path-ification - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: upper planner path-ification |
Date | |
Msg-id | 4187.1431570466@sss.pgh.pa.us |
In response to | upper planner path-ification (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: upper planner path-ification
Re: upper planner path-ification |
List | pgsql-hackers |
Robert Haas <robertmhaas@gmail.com> writes:
> I've been pulling over Tom's occasional remarks about redoing
> grouping_planner - and maybe further layers of the planner - to work
> with Paths instead of Plans. ...

> I think there are two separate problems here. First, there's the
> problem that grouping_planner() is complicated.

> Second, there's the problem that we might like to order aggregates
> with respect to joins.

Both of those are problems all right, but there is more context here.

* As some of the messages you cited mention, we would like to have Path
  representations for things like aggregation, because that's the only
  way we'll get to a sane API that lets FDWs propose remote aggregation.

* We have also had requests for the planner to be smarter about
  UNION/INTERSECT/EXCEPT queries. Again, that requires cost comparisons,
  which would be better done if we had Path representations for the
  various ways we'd want to consider. Also, a big part of the issue
  there is wanting to be able to consider sorted versus unsorted plans
  for the leaf queries of the set-op (IOW, optionally pushing the sort
  requirements of the set-op down into the leaves). Right now, such
  comparisons are impossible because prepunion.c uses subquery_planner
  to handle the leaf queries, and what it gets back from that is one
  finished plan, not alternative Paths.

* Likewise, subqueries-in-FROM are handled by recursing to
  subquery_planner, which gives us back just one frozen Plan for the
  subquery. Among other things this seems to make it too expensive to
  consider generating parameterized paths for the subquery. I'd like to
  keep subquery plans in Path form until much later as well.

So these considerations motivate wishing that the result of
subquery_planner could be a list of alternative Paths rather than a
Plan, which means that every operation it knows how to tack onto the
scan/join plan has to be representable by a Path of some sort.

I don't know how granular that needs to be, though.
For instance, one could certainly imagine that it might be sufficient
initially to have a single "WindowPath" that represents "do all the
window functions", and then at create_plan time we'd generate multiple
WindowAgg plan nodes in the same ad-hoc way as now. Breaking that down
in the Path representation would only become interesting if it would
affect higher-level decisions, and I'm not immediately seeing how it
might do that.

> I'm inclined to think that it would be useful to solve the first
> problem even if we didn't solve the second one right away (but that
> might be wrong). As a preparatory step, I'm thinking it would be
> sensible to split grouping_planner() into an outer function that would
> handle the addition of Limit and LockRows nodes and maybe planning of
> set operations, and an inner function that would handle GROUP BY,
> DISTINCT, and possibly window function planning.

For the reasons I mentioned, I'd like to get to a point where
subquery_planner's output is Paths not Plans as soon as possible. But
the idea of coarse representation of steps that we aren't trying to be
smart about might be useful to save some labor in the short run. The
zero-order version of that might be a single Path node type that
represents "do whatever grouping_planner would do", which we'd start to
break down into multiple node types once we had the other APIs fixed.

regards, tom lane