Re: estimates for nested loop very wrong? - Mailing list pgsql-sql
From | joostje@komputilo.org |
---|---|
Subject | Re: estimates for nested loop very wrong? |
Date | |
Msg-id | 20030410162902.GA28537@co.uea.org Whole thread Raw |
In response to | Re: estimates for nested loop very wrong? (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: estimates for nested loop very wrong?
|
List | pgsql-sql |
Je 2003/04/10(4)/10:04, Tom Lane skribis: > Have you done an ANALYZE or VACUUM ANALYZE recently? Jes, both, actually, and the `analyse' quite a few times. > > Nested Loop (cost=0.00..208256.60 rows=61140 width=38) (actual time=0.92..18.49 rows=756 loops=1) > The planner is evidently estimating that each row of tmp1 will match 2600+ > rows of db, whereas in reality there is only one match. Rather than > mess with enable_hashjoin, you need to find out why that estimate is so > badly off. Are the entries in tmp1 specially selected to correspond to > unique rows of db? Well, each entry in tmp1 matches with about 7-80 entries in db, but yes the problem indeed seems to be that the estimate is so far off. And no, the entries in tmp1 are not specially selected, they correspond to `normal' values of id in db (values that are about as frequent as other values). I have done VACUUM ANALYSE on the table (and drop index; create index db_id_idx on db(id);). => analyse db; => select n_distinct from pg_stats where tablename='db' and attname='id'; 1996 => select count(distinct(id)) from db; 42225 Unless I'm mistaken, pg_nstats.n_distinct should be (aproximately) the same as count(distinct(id)), but it obviously isn't. Also the most_common_freqs values are about a 100 times higher than in reality, and, even tough about 900 values of id occur more often than 40 times, in the 'most_common_vals' list are 7 (of the 10) vals that occur less than 40 times, and the real top two isn't even represented. (BTW, the table I'm using now is a little smaller, as it turned out that a few (75%) of the entries in db had only 3 different id values. This didn't have any effect on the accurateness of the estimates, though). BTW, => select count(id) from db; 586035 Thanks! joostje -- what pg_stat thinks about db.id: => select n_distinct, most_common_vals, most_common_freqs from pg_stats where tablename='db' and attname='id';n_distinct| most_common_vals | most_common_freqs ------------+-----------------------------------------------------+------------------------------------------------------------------------------------------------------- 1907 | {subo,smys,raha,sjbo,sdai,roal,sooi,stsw,rmwi,snuw} | {0.00733333,0.007,0.00633333,0.00633333,0.006,0.00566667,0.00566667,0.00566667,0.00533333,0.00533333} --these estimates are far off: => select id, count (id)/586035.0 from db where id='subo' or id='smys' or id='raha' or id='sjbo' or id='sdai' or id='roal'or id='sooi' or id='stsw' or id='rmwi' or id='snuw' group by id; id | ?column? ------+----------------------raha | 0.000156987210661479rmwi | 3.24212717670446e-05roal | 6.3136160809508e-05sdai | 8.70255189536461e-05sjbo| 6.3136160809508e-05smys | 7.5080839881577e-05snuw | 4.26595681145324e-05sooi | 0.000114327642546947stsw| 6.14297780849267e-05subo | 5.11914817374389e-05 --and these would be the real most_common_freqs: => select id, count(id), count(id)/586035.0 from db group by id order by - count(id) limit 10; id | count | ?column? --------+-------+----------------------indmem | 194 | 0.000331038248568771hton | 97 | 0.000165519124284386raha | 92 | 0.000156987210661479simo | 87 | 0.000148455297038573sugn | 87 | 0.000148455297038573rjgl | 85 | 0.00014504253158941hroy | 84 | 0.000143336148864829jrgv | 84 | 0.000143336148864829tojo | 83 | 0.000141629766140248lucy | 82 | 0.000139923383415666 -- the above all done after a 'vacuum analyse';