Re: Planner creating ineffective plans on LEFT OUTER joins - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Planner creating ineffective plans on LEFT OUTER joins |
Date | |
Msg-id | 603c8f070806260936s905b2cfp48ec1e620a084937@mail.gmail.com Whole thread Raw |
In response to | Re: Planner creating ineffective plans on LEFT OUTER joins (Simon Riggs <simon@2ndquadrant.com>) |
Responses |
Re: Planner creating ineffective plans on LEFT OUTER
joins
|
List | pgsql-hackers |
> IMHO we should have a single parameter which indicates how much planning > time we consider acceptable for this query. e.g. > > optimization_level = 2 (default), varies 1-3 > > Most automatic optimisation systems allow this kind of setting, whether > it be a DBMS, or compilers (e.g. gcc). It's my understanding that the philosophy of the PGDG in the past has been to avoid putting any kind of hints into the system, focusing rather an improving the planning of queries. A quick Google search turns up, for example: http://archives.postgresql.org/pgsql-performance/2003-12/msg00181.php Now, perhaps the thinking on this has changed, but a global knob like this strikes me as a bad idea. If Tom is right that improving the plan on queries like this would result in an exponential increase in planning time, then it's certainly important not to paint with too broad a brush. It would really be best to be able to tell the planner which specific part of the query may be susceptible to this type of optimization, because you could easily have many places in a complicated query that would need to be analyzed, and if the planning time is going to be a problem then we don't want to overplan the entire query just to fix the problem in one particular spot. And we certainly don't want to do a whole bunch of other, unrelated, expensive optimizations at the same time. If one were to add a hint, I think the hint should tell the planner: Hey, see this left join? Well, computing the right-hand side of this thing is going to take forever unless we get some information to help us out. So please do all of your limit and filter operations on the left-hand side first, and then if you have any rows left, then evaluate the right-hand side for just the values that matter. i.e. in the example query: SELECT * FROM ab LEFT OUTER JOIN (bc JOIN cd ON bc.c = cd.d) ON ab.b = bc.b WHERE ab.a = 20000 ...please look up the rows in ab where ab.a = 20000. If you find any, then make a hash table of all the values you find for b among those rows. Then when you evaluate (bc JOIN cd ON bc.c = cd.d) you can filter bc for rows where bc.b is in the hash table. This might not be a good query plan in the average case, but there are definitely instances where you might want to force this behavior. In fact, even if you had to do it as a nested loop (re-evaluating the bc JOIN cd clause for each possible value of b) there are still cases where it would be a big win. Of course the nicest thing would be for the planner to realize on its own that the right-hand side of the join is going to generate a gazillion rows and the left-hand side is going to generate one, but maybe (as Tom and the OP suggested) that is expecting too much (though I confess I don't quite see why). ...Robert
pgsql-hackers by date: