Re: upper planner path-ification - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: upper planner path-ification |
Date | |
Msg-id | 30896.1431879106@sss.pgh.pa.us |
In response to | Re: upper planner path-ification (Robert Haas <robertmhaas@gmail.com>) |
Responses | Re: upper planner path-ification; Re: upper planner path-ification |
List | pgsql-hackers |
Robert Haas <robertmhaas@gmail.com> writes:
> So, getting back to this part, what's the value of returning a list of
> Paths rather than a list of Plans?

(1) less work, since we don't have to fill in details not needed for
costing purposes; (2) paths carry info that the planner wants but the
executor doesn't, notably sort-order annotations.

> target lists are normally computed when paths are converted to plans,
> but for the higher-level plan nodes adding by grouping_planner, the
> path list is typically just passed in. So would the new path types be
> expected to carry target lists of their own, or would they need to
> figure out the target list on the fly at plan generation time?

Yeah, that is something I've been struggling with while thinking about
this. I don't really want to add tlists as such to Paths, but it's not
very clear how else to annotate a Path as to what it computes, and that
seems like an annotation we have to have in some form in order to
convert these planning steps into a Path universe.

There are other cases where it would be useful to have some notion of
this kind. An example is that right now, if you have an expression
index on an expensive function and a query that wants the value of that
function, the planner is able to extract the value from the index ---
but there is nothing that gives any cost benefit to doing so, so it's
just as likely to choose some other index and eat the cost of
recalculating the function. It seems like the only way to fix that in a
principled fashion is to have some concept that the index-scan Path can
produce the function value, and then when we come to some appropriate
costing step, penalize the other paths for having to compute the value
that's available for free from this one.

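To make the costing step in the preceding paragraph concrete, here is a rough sketch of the penalty idea. It is not PostgreSQL code: PathSketch, effective_cost, cheapest, the expression name slow_fn(x) and all the cost numbers are made up for illustration, and the planner's real Path structures and cost model are in C and considerably more involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathSketch:
    """Hypothetical stand-in for a planner Path; not a PostgreSQL structure."""
    name: str
    base_cost: float                     # cost of producing the rows
    rows: float                          # estimated number of rows returned
    free_exprs: frozenset = frozenset()  # expressions this path yields for free


def effective_cost(path, needed_exprs, eval_cost_per_row):
    """Base cost plus a per-row charge for each needed expression
    that the path cannot supply on its own."""
    penalty = 0.0
    for expr in needed_exprs:
        if expr not in path.free_exprs:
            penalty += eval_cost_per_row[expr] * path.rows
    return path.base_cost + penalty


def cheapest(paths, needed_exprs, eval_cost_per_row):
    """Pick the path with the lowest penalized cost."""
    return min(paths,
               key=lambda p: effective_cost(p, needed_exprs, eval_cost_per_row))


# Assumed example: the expression index is slightly more expensive to scan,
# but it already holds the value of the expensive function.
expr_idx = PathSketch("scan on expression index", 120.0, 1000.0,
                      frozenset({"slow_fn(x)"}))
other_idx = PathSketch("scan on other index", 100.0, 1000.0)
costs = {"slow_fn(x)": 0.5}

best = cheapest([expr_idx, other_idx], {"slow_fn(x)"}, costs)
print(best.name)
```

Without the penalty the other index looks cheaper on base cost alone, which is the behavior described above; with it, the path that already has the function value wins.
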
Rather than adding tlists per se to Paths, I've been vaguely toying with
a notion of identifying all the "interesting" subexpressions in a query
(expensive functions, aggregates, etc), giving them indexes 1..n, and
then marking Paths with bitmapsets showing which interesting
subexpressions they can produce values for. This would make things like
"does this Path compute all the needed aggregates" much cheaper to deal
with than a raw tlist representation would do. But maybe that's still
not the best way.

Another point is that a Path that computes aggregates is fundamentally
different from a Path that doesn't, because it doesn't even produce the
same number of rows. So I'm not at all sure how to visualize the idea
of a Path that computes only some aggregates, or whether it's even a
sensible thing to worry about supporting.

> One thing that seems like it might complicate things here is that a
> lot of planner functions take PlannerInfo *root as an argument. But
> if we generate only paths in grouping_planner() and path-ify them
> later, the subquery's root will not be available when we're trying to
> do the Path -> Plan transformation.

Ah, you're wrong there, because we hang onto the subquery's root already
(I forget why exactly, but see PlannerGlobal.subroots for SubPlans, and
RelOptInfo.subroot for subquery-in-FROM). So it would not be a
fundamental problem to postpone create_plan() for a subquery.

> I think grouping_planner() is badly in need of some refactoring just
> to make it shorter. It's over 1000 lines of code, which IMHO is a
> fairly ridiculous length for a single function.

Amen to that. But as I said to Andrew, I think this will be a
side-effect of path-ification in this area, and is probably not
something to set out to do first.

			regards, tom lane
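
The bitmapset idea floated in the message above (number the interesting subexpressions 1..n, then mark each Path with the set it can produce) might look roughly like the sketch below. This is only an illustration under assumptions: a Python integer stands in for a bitmapset, and number_interesting, make_mask, computes_all and the expression strings are hypothetical names, not anything in the PostgreSQL source.

```python
def number_interesting(exprs):
    """Assign indexes 1..n to the interesting subexpressions, in order."""
    return {expr: i for i, expr in enumerate(exprs, start=1)}


def make_mask(index_of, exprs):
    """Build an integer bitmask with one bit set per given expression."""
    mask = 0
    for expr in exprs:
        mask |= 1 << index_of[expr]
    return mask


def computes_all(path_mask, needed_mask):
    """Subset test: true if every needed bit is also set in path_mask."""
    return needed_mask & ~path_mask == 0


# Assumed example query with two aggregates and one expensive function.
index_of = number_interesting(["sum(a)", "count(*)", "slow_fn(x)"])

needed_aggs = make_mask(index_of, ["sum(a)", "count(*)"])
agg_path = make_mask(index_of, ["sum(a)", "count(*)"])
scan_path = make_mask(index_of, ["slow_fn(x)"])

print(computes_all(agg_path, needed_aggs))
print(computes_all(scan_path, needed_aggs))
```

The point of the representation is that "does this Path compute all the needed aggregates" becomes a single mask comparison instead of a walk over a target list. It does not address the row-count problem raised above for Paths that compute only some aggregates.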