Re: Strange query optimization in 7.3.2 - Mailing list pgsql-general
From | Alec Mitchell |
---|---|
Subject | Re: Strange query optimization in 7.3.2 |
Date | |
Msg-id | 200304141520.30037.apm13@columbia.edu Whole thread Raw |
In response to | Re: Strange query optimization in 7.3.2 (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Strange query optimization in 7.3.2
|
List | pgsql-general |
On Monday 14 April 2003 01:25 pm, Tom Lane wrote: > I think the reason for the change in plan is the same bug discussed at > http://fts.postgresql.org/db/mw/msg.html?mid=1064055 > > However, you will probably not like the fix, since it eliminates the > bogusly small cost estimate for the duplicated index condition, and > thereby ensures that your less-favored plan will always be chosen :-( > > What would be interesting is to look into why the planner's estimated > costs are inaccurate. I think the main cause is the badly-off join > estimate for the tr/t join --- notice it's estimating 1119 rows out > where only 52 are actually produced. The nestloop's runtime is directly > proportional to the number of outer rows, so this leads directly to a > factor-of-20 overestimate of the nestloop's cost, discouraging the > planner from using it. The bug that's triggered by the duplicate > index condition underestimates the cost, thereby negating that error to > some extent. > > You should look into whether increasing the statistics targets for > t.terminal and tr.terminal would improve the accuracy of the join > estimate. That bug does seem to be the cause of the confusing plan change. I really don't understand that bad estimate for the join though (the estimate of 1119 is for the tr/m join, rather than the tr/t join). Here are some details about the tables and joins involved: The tr/t join produces 52 rows with unique trailers (the primary key on tr) out of the 750 available (the planner estimates 62). These are then joined with the manifests table m, which has 13526 rows. The relationship between tr.trailer and m.trailer is a bit complex. Of the 750 possible trailer values in tr, 607 have a one to one mapping to rows in m. The remaining 143 values are each referenced in 1-70 (avg 24) different rows in m. Additionally, there are 9510 rows in m (the vast majority), which have a null value for trailer (perhaps that is the cause of these bad statistics). This particular query happens to call only trailers with a one to one relationship to rows in m (this not unlikely considering that this condition is true for 607 out of the 750 values). Setting the statistics targets for m.trailer to 200 or even 750, both of which should be overkill considering that there are only 750 possible values, and then performing a VACUUM ANALYZE strangely makes no difference to the row estimates. I'm having a lot of trouble tracking down the reason for this bad estimate (all the preceding estimates are quite good). Any further guidance would be greatly appreciated. Thanks, Alec Mitchell
pgsql-general by date: