Thread: Cannot get to use index scan on a big table!
<tt>Hi Folks,</tt><tt></tt><p><tt>i guess by now it is the most freq. asked question on list ;-)</tt><tt></tt><p><tt>i havea simple SQL query and it does not seems to use index</tt><br /><tt>despite its existance and VCUUMING of table.</tt><tt></tt><p><tt><fontcolor="#000099">tradein_clients=> explain select email_id from email_source where source_id=186;</font></tt><br /><tt><font color="#000099">NOTICE: QUERY PLAN:</font></tt><tt><font color="#000099"></font></tt><p><tt><fontcolor="#000099">Seq Scan on email_source (cost=0.00..19191.50 rows=</font><b><fontcolor="#FF0000">41602</font></b><font color="#000099"> width=4)</font></tt><tt><font color="#000099"></font></tt><p><tt><fontcolor="#000099">EXPLAIN</font></tt><br /><tt><font color="#000099">tradein_clients=></font></tt><tt></tt><p><tt>(cananyone please explain why the figure 41602??)</tt><br/><tt></tt> <tt></tt><p><tt>i am using postgresql 7.1.2 (ples. do not curse me for not upg.)</tt><br /><tt>ihave done "VACUUM ANALYZE" on the table in question</tt><tt></tt><p><tt>the table has ~ 1.1 million records and seqscan is</tt><br /><tt>killing my apps</tt><tt></tt><p><tt>regds</tt><br /><tt>mallah.</tt><br /><tt></tt> <br /><tt></tt> <tt></tt><p><b><tt>Somemore info below:</tt></b><tt></tt><p><tt>tradein_clients=> VACUUM VERBOSE ANALYZE email_source;</tt><br /><tt>NOTICE: --Relation email_source--</tt><br /><tt>NOTICE: Pages 5794: Changed 1, reaped 30, Empty0, New 0; Tup 1071800: Vac 81, Keep/VTL 9/0, Crash 0, UnUsed 9, MinLen 40, MaxLen 40; Re-using: Free/Avail. Space 4560/2208;EndEmpty/Avail. Pages 0/29. CPU 0.30s/0.12u sec.</tt><br /><tt>NOTICE: Index email_source_email_id: Pages 2350;Tuples 1071800: Deleted 0. CPU 0.16s/1.08u sec.</tt><br /><tt>NOTICE: Index email_source_source_id: Pages 2350; Tuples1071800: Deleted 0. CPU 0.14s/1.02u sec.</tt><br /><tt>NOTICE: Rel email_source: Pages: 5794 --> 5794; Tuple(s)moved: 32. CPU 0.01s/0.01u sec.</tt><br /><tt>NOTICE: Index email_source_email_id: Pages 2350; Tuples 1071800: Deleted32. CPU 0.15s/0.84u sec.</tt><br /><tt>NOTICE: Index email_source_source_id: Pages 2350; Tuples 1071800: Deleted32. CPU 0.11s/0.79u sec.</tt><br /><tt>NOTICE: Analyzing...</tt><br /><tt>VACUUM</tt><br /><tt></tt> <tt></tt><p><b><tt>tablestructures:</tt></b><tt></tt><p><tt>tradein_clients=> \d email_source</tt><br /><tt> Table "email_source"</tt><br /><tt> Column | Type | Modifiers</tt><br /><tt>-----------+---------+-----------</tt><br/><tt> email_id | integer |</tt><br /><tt> source_id | integer |</tt><br/><tt>Indexes: email_source_email_id,</tt><br /><tt> email_source_source_id</tt>
On Tue, 23 Apr 2002, Rajesh Kumar Mallah wrote: > i have a simple SQL query and it does not seems to use index > despite its existance and VCUUMING of table. > > tradein_clients=> explain select email_id from email_source where > source_id=186 ; > NOTICE: QUERY PLAN: > > Seq Scan on email_source (cost=0.00..19191.50 rows=41602 width=4) > > EXPLAIN > tradein_clients=> > > (can anyone please explain why the figure 41602??) That's the estimated number of matching rows. I'd guess that this number is an over estimate (how many rows actually are returned?) Do you have any particularly frequent values of source_id that are much more common than others? 7.1 and earlier had problems with over estimating the number of matching rows when the distribution had a very uneven distribution of values, select * from pg_statistic where starelid=(select oid from pg_class where relname='email_source') should give the stored statistics from the analyze. As a comparison, if you do "set enable_seqscan=off;" and then do the query and explain, what does it give for the costs there, and does it take less time?
Stephan Szabo wrote: <p>The actual now nos rows is 10 for source_id=186 <blockquote type="CITE"> <br />Do you have anyparticularly frequent values of source_id that are much <br />more common than others? 7.1</blockquote> yes you are right freq. of source_id are *quite* varied. from 700,000 <br />to 10 :-( but except for the most frequent avg. freq.may be 5000 <br /> <blockquote type="CITE">and earlier had problems with over <br />estimating the number of matchingrows when the distribution had a <br />very uneven distribution of values, select * from pg_statistic where <br />starelid=(selectoid from pg_class where relname='email_source') <br />should give the stored statistics from the analyze.</blockquote><tt><fontcolor="#000099"><font size="-1">select * from pg_statistic where starelid=(select oid frompg_class where relname='email_source');</font></font></tt><br /><tt><font color="#000099"><font size="-1"> starelid |staattnum | staop | stanullfrac | stacommonfrac | stacommonval | staloval | stahival</font></font></tt><br /><tt><font color="#000099"><font size="-1">----------+-----------+-------+-------------+---------------+--------------+----------+----------</font></font></tt><br /><tt><fontcolor="#000099"><font size="-1"> 31548 | 1 | 97 | 0 | 5.59811e-06 | 53872 | 2 | 1626629</font></font></tt><br /><tt><font color="#000099"><font size="-1"> 31548 | 2 | 97 | 0 | 0.611849 | 156 | 1 | 186</font></font></tt><br /><tt><font color="#000099"><font size="-1">(2rows)</font></font></tt><p><b><tt><font size="-2">AFTER VACUUM ANALYINZING JUST NOW!</font></tt></b><br /><b><tt><fontsize="-2"></font></tt></b> <p><tt><font color="#000099"><font size="-1"> starelid | staattnum | staop | stanullfrac| stacommonfrac | stacommonval | staloval | stahival</font></font></tt><br /><tt><font color="#000099"><font size="-1">----------+-----------+-------+-------------+---------------+--------------+----------+----------</font></font></tt><br /><tt><fontcolor="#000099"><font size="-1"> 31548 | 1 | 97 | 0 | 6.4078e-06 | 53872 | 2 | 1629500</font></font></tt><br /><tt><font color="#000099"><font size="-1"> 31548 | 2 | 97 | 0 | 0.600296 | 156 | 1 | 190</font></font></tt><br /><tt><font color="#000099"><font size="-1">(2rows)</font></font></tt><br /> <blockquote type="CITE"> <p>As a comparison, if you do "set enable_seqscan=off;"and then do <br />the query and explain, what does it give for the costs there, and <br />does it takeless time?</blockquote> Still its overestimated , but its much faster <p><tt><font color="#000099"><font size="-1"> explain select count(email_id) from email_source where source_id=186;</font></font></tt><p><tt><font color="#000099"><fontsize="-1">NOTICE: QUERY PLAN:</font></font></tt><tt><font color="#000099"><font size="-1"></font></font></tt><p><tt><fontcolor="#000099"><font size="-1">Aggregate (cost=46798.14..46798.14 rows=1 width=4)</font></font></tt><br/><tt><font color="#000099"><font size="-1"> -> Index Scan using email_source_source_idon email_source (cost=0.00..46688.98 rows=43664 width=4)</font></font></tt><tt><font color="#000099"><fontsize="-1"></font></font></tt><p><tt><font color="#000099"><font size="-1">EXPLAIN</font></font></tt><br/><tt><font color="#000099"><font size="-1"></font></font></tt> <tt><font color="#000099"><fontsize="-1"></font></font></tt><p><font color="#000000">Stephan thanks for the reply,</font><br /><fontcolor="#000000">i think my question has been adequetely answered and i conclude</font><br /><font color="#000000">thati shud. upgrade my PG setup without wasting my/others' time.</font><br /><font color="#000000"></font> <fontcolor="#000000"></font><p><font color="#000000">regds</font><br /><font color="#000000">mallah.</font>