Re: BUG #5059: Planner ignores estimates when planning an IN () subquery - Mailing list pgsql-bugs
From | Robert Haas |
---|---|
Subject | Re: BUG #5059: Planner ignores estimates when planning an IN () subquery |
Date | |
Msg-id | 603c8f070909161935t68266cdt2cf1c10803b8ccd7@mail.gmail.com Whole thread Raw |
In response to | Re: BUG #5059: Planner ignores estimates when planning an IN () subquery (Kenaniah Cerny <kenaniah@gmail.com>) |
Responses |
Re: BUG #5059: Planner ignores estimates when planning an IN
() subquery
|
List | pgsql-bugs |
On Wed, Sep 16, 2009 at 6:39 PM, Kenaniah Cerny <kenaniah@gmail.com> wrote: > I can provide the output of statistics queries if you would like. Just let > me know which statistics you want and I'll pastebin them. > > As far as selectivity goes, the selectivity estimate for the > user_anime_log.user_account_id was definitely miscalculated. The > user_anime_log contains up to 15 entries per user (70,000 users, but only > 475,811 rows). The default statistics target on that relation is set to > 1000. But even with poor statistics, guessing 62,000 rows when there's a > maximum of 15 per user account is still quite a miss. You've changed the table name on me, vs. what you pasted in the query, which had a user_activity_log but no user_anime_log... > Analysis of SELECT * FROM user_activity_log WHERE user_account_id = 17238; > estimates 13 rows and returns 15, which is quite reasonable considering the > statistics targets. > > Please forgive my ignorance, but in the case of my subquery, is the > estimated number of rows and cost being taken into account (or only the > selectivity)? Well, selectivity is just a term that refers to the fraction of rows that match some condition (rows themselves do not have selectivity). Usually the initial estimating is done in terms of selectivity, which is then multiplied by the total number of rows to find the number of rows that will remain after the condition is applied. So, yes, rows and cost are taken into account. The problem here is that the planner is mis-estimating the selectivity, therefore it computes the wrong number of rows (way too high), therefore it makes the wrong decision. > Granted I don't understand much about the planner internals, > but it seems strange that a nested loop semi join would be chosen when the > inner table is estimated to return an extremely low number of rows with low > cost and low selectivity. Shouldn't the planner also estimate the cost of an > inner (er, left?) join in that scenario? Well... you can't replace a semi join with an inner or left join, because it doesn't do the same thing. You could use a hash semi join or merge join semi join, but that doesn't make sense if, as you say, the inner table is estimated to return an extremely low number of rows. It might be a bit easier to analyze this if you stripped out all the joins that aren't necessary to reproduce the problem. Also, I would prefer EXPLAIN ANALYZE output posted in-line to the mailing list rather than pasted to a separate web site - it screws up the formatting. But honestly I'm not sure how much time it's worth spending on this. You have a way to rewrite the query that works... and fixing the estimator is going to be hard... so I suggest doing it the way that works, and moving on! ...Robert
pgsql-bugs by date: