Re: Is this planner choice easily explained? - Mailing list pgsql-bugs
From | Philip Warner |
---|---|
Subject | Re: Is this planner choice easily explained? |
Date | |
Msg-id | 5.1.0.14.0.20021122110611.06a87438@mail.rhyme.com.au Whole thread Raw |
In response to | Re: Is this planner choice easily explained? (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Is this planner choice easily explained?
|
List | pgsql-bugs |
> >Could we see the pg_stats rows for the columns used in these queries? >(You have done a vacuum analyze recently, I trust...) Strange you should ask; it is actually taking some effort to persuade them it's a good idea. They also don't believe that a LIMIT clause affects strategy choice, so it's an uphill battle. As a result, my strong suspicion is that there will be no results in pg_stat - but I have asked for the data. I am mainly interested in understanding the output from this point of view -- and to understand the choice that has been made in such a 'boundary case' (when no analyze has been done). Am I correct in my interpretation that: explain SELECT messageblk FROM messageblks WHERE message_idnr = 100::bigint ORDER BY messageblk_idnr ; Sort (cost=5793.33..5793.33 rows=1453 width=40) -> Index Scan using messageblks_msg_idx on messageblks (cost=0.00..5716.99 rows=1453 width=40) means it expects to get 1453 rows based on a search for a specific key (hence why it has a high cost)? Based on other tests I have done, I have concluded that it assumes a selectivity of 0.5% for non-unique indexes - is that right? Whereas with: explain SELECT messageblk FROM messageblks WHERE message_idnr = 100::bigint ORDER BY messageblk_idnr limit 1; Limit (cost=0.00..777.50 rows=1 width=40) -> Index Scan using messageblks_id_idx on messageblks (cost=0.00..1129984.15 rows=1453 width=40) it looks like the high cost on the last line here is based on the number of pages/tuples in the file, and that the limit is causing 1/1453th of the cost to be applied. It looks like it gets the 1453 as a basic default selectivity again. The problem I have with this is that it seems that the estimate for the LIMIT is assuming that the first row returned will be the right one; in fact it has no guarantee that the specified criteria will be satisfied by the row. Is this assumption likely to be wrong when we do have stats? ISTM that a LIMIT with a WHERE clause that is not satisfied by any index used in the search should not be deemed to reduce the result set size as drastically. In fact, it may actually have to scan the entire result set to get the row. Err...I think 8-). ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.B.N. 75 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 03 5330 3172 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
pgsql-bugs by date: