Thread: Cannot get to use index scan on a big table!

Cannot get to use index scan on a big table!

From

Rajesh Kumar Mallah

Date:

23 April 2002, 03:45:34

<tt>Hi Folks,</tt><tt></tt><p><tt>i guess by now it is the most freq. asked question on list ;-)</tt><tt></tt><p><tt>i
havea simple SQL query and it does not seems to use index</tt><br /><tt>despite its existance and VCUUMING of
table.</tt><tt></tt><p><tt><fontcolor="#000099">tradein_clients=> explain  select email_id from email_source  where
source_id=186;</font></tt><br /><tt><font color="#000099">NOTICE:  QUERY PLAN:</font></tt><tt><font
color="#000099"></font></tt><p><tt><fontcolor="#000099">Seq Scan on email_source  (cost=0.00..19191.50
rows=</font><b><fontcolor="#FF0000">41602</font></b><font color="#000099"> width=4)</font></tt><tt><font
color="#000099"></font></tt><p><tt><fontcolor="#000099">EXPLAIN</font></tt><br /><tt><font
color="#000099">tradein_clients=></font></tt><tt></tt><p><tt>(cananyone please explain why the figure
41602??)</tt><br/><tt></tt> <tt></tt><p><tt>i am using  postgresql 7.1.2  (ples. do not curse me for not upg.)</tt><br
/><tt>ihave done "VACUUM  ANALYZE" on the table in question</tt><tt></tt><p><tt>the table has ~ 1.1 million records and
seqscan is</tt><br /><tt>killing my apps</tt><tt></tt><p><tt>regds</tt><br /><tt>mallah.</tt><br /><tt></tt>  <br
/><tt></tt> <tt></tt><p><b><tt>Somemore info below:</tt></b><tt></tt><p><tt>tradein_clients=> VACUUM VERBOSE ANALYZE
email_source;</tt><br /><tt>NOTICE:  --Relation email_source--</tt><br /><tt>NOTICE:  Pages 5794: Changed 1, reaped 30,
Empty0, New 0; Tup 1071800: Vac 81, Keep/VTL 9/0, Crash 0, UnUsed 9, MinLen 40, MaxLen 40; Re-using: Free/Avail. Space
4560/2208;EndEmpty/Avail. Pages 0/29. CPU 0.30s/0.12u sec.</tt><br /><tt>NOTICE:  Index email_source_email_id: Pages
2350;Tuples 1071800: Deleted 0. CPU 0.16s/1.08u sec.</tt><br /><tt>NOTICE:  Index email_source_source_id: Pages 2350;
Tuples1071800: Deleted 0. CPU 0.14s/1.02u sec.</tt><br /><tt>NOTICE:  Rel email_source: Pages: 5794 --> 5794;
Tuple(s)moved: 32. CPU 0.01s/0.01u sec.</tt><br /><tt>NOTICE:  Index email_source_email_id: Pages 2350; Tuples 1071800:
Deleted32. CPU 0.15s/0.84u sec.</tt><br /><tt>NOTICE:  Index email_source_source_id: Pages 2350; Tuples 1071800:
Deleted32. CPU 0.11s/0.79u sec.</tt><br /><tt>NOTICE:  Analyzing...</tt><br /><tt>VACUUM</tt><br
/><tt></tt> <tt></tt><p><b><tt>tablestructures:</tt></b><tt></tt><p><tt>tradein_clients=> \d email_source</tt><br
/><tt>     Table "email_source"</tt><br /><tt>  Column   |  Type   | Modifiers</tt><br
/><tt>-----------+---------+-----------</tt><br/><tt> email_id  | integer |</tt><br /><tt> source_id | integer
|</tt><br/><tt>Indexes: email_source_email_id,</tt><br /><tt>         email_source_source_id</tt>

Re: Cannot get to use index scan on a big table!

From

Stephan Szabo

Date:

23 April 2002, 11:00:59

On Tue, 23 Apr 2002, Rajesh Kumar Mallah wrote:

> i have a simple SQL query and it does not seems to use index
> despite its existance and VCUUMING of table.
>
> tradein_clients=> explain  select email_id from email_source  where
> source_id=186 ;
> NOTICE:  QUERY PLAN:
>
> Seq Scan on email_source  (cost=0.00..19191.50 rows=41602 width=4)
>
> EXPLAIN
> tradein_clients=>
>
> (can anyone please explain why the figure 41602??)

That's the estimated number of matching rows.  I'd guess that this
number is an over estimate (how many rows actually are returned?)
Do you have any particularly frequent values of source_id that are much
more common than others?  7.1 and earlier had problems with over
estimating the number of matching rows when the distribution had a
very uneven distribution of values, select * from pg_statistic where
starelid=(select oid from pg_class where relname='email_source')
should give the stored statistics from the analyze.

As a comparison, if you do "set enable_seqscan=off;" and then do
the query and explain, what does it give for the costs there, and
does it take less time?

Re: Cannot get to use index scan on a big table!

From

Rajesh Kumar Mallah

Date:

24 April 2002, 01:54:05

Stephan Szabo wrote: <p>The actual now nos  rows is 10 for source_id=186 <blockquote type="CITE">  <br />Do you have
anyparticularly frequent values of source_id that are much <br />more common than others?  7.1</blockquote> yes you are
right freq. of source_id are *quite* varied. from  700,000 <br />to 10  :-(  but except  for the most frequent  avg.
freq.may be 5000 <br />  <blockquote type="CITE">and earlier had problems with over <br />estimating the number of
matchingrows when the distribution had a <br />very uneven distribution of values, select * from pg_statistic where <br
/>starelid=(selectoid from pg_class where relname='email_source') <br />should give the stored statistics from the
analyze.</blockquote><tt><fontcolor="#000099"><font size="-1">select * from pg_statistic  where starelid=(select oid
frompg_class where relname='email_source');</font></font></tt><br /><tt><font color="#000099"><font size="-1"> starelid
|staattnum | staop | stanullfrac | stacommonfrac | stacommonval | staloval | stahival</font></font></tt><br /><tt><font
color="#000099"><font
size="-1">----------+-----------+-------+-------------+---------------+--------------+----------+----------</font></font></tt><br
/><tt><fontcolor="#000099"><font size="-1">    31548 |         1 |    97 |           0 |   5.59811e-06 | 53872        |
2       | 1626629</font></font></tt><br /><tt><font color="#000099"><font size="-1">    31548 |         2 |    97
|          0 |      0.611849 | 156          | 1        | 186</font></font></tt><br /><tt><font color="#000099"><font
size="-1">(2rows)</font></font></tt><p><b><tt><font size="-2">AFTER VACUUM ANALYINZING JUST NOW!</font></tt></b><br
/><b><tt><fontsize="-2"></font></tt></b>  <p><tt><font color="#000099"><font size="-1"> starelid | staattnum | staop |
stanullfrac| stacommonfrac | stacommonval | staloval | stahival</font></font></tt><br /><tt><font color="#000099"><font
size="-1">----------+-----------+-------+-------------+---------------+--------------+----------+----------</font></font></tt><br
/><tt><fontcolor="#000099"><font size="-1">    31548 |         1 |    97 |           0 |    6.4078e-06 | 53872        |
2       | 1629500</font></font></tt><br /><tt><font color="#000099"><font size="-1">    31548 |         2 |    97
|          0 |      0.600296 | 156          | 1        | 190</font></font></tt><br /><tt><font color="#000099"><font
size="-1">(2rows)</font></font></tt><br />  <blockquote type="CITE">  <p>As a comparison, if you do "set
enable_seqscan=off;"and then do <br />the query and explain, what does it give for the costs there, and <br />does it
takeless time?</blockquote> Still its overestimated  , but its  much faster <p><tt><font color="#000099"><font
size="-1"> explain select count(email_id) from email_source  where source_id=186;</font></font></tt><p><tt><font
color="#000099"><fontsize="-1">NOTICE:  QUERY PLAN:</font></font></tt><tt><font color="#000099"><font
size="-1"></font></font></tt><p><tt><fontcolor="#000099"><font size="-1">Aggregate  (cost=46798.14..46798.14 rows=1
width=4)</font></font></tt><br/><tt><font color="#000099"><font size="-1">  ->  Index Scan using
email_source_source_idon email_source  (cost=0.00..46688.98 rows=43664 width=4)</font></font></tt><tt><font
color="#000099"><fontsize="-1"></font></font></tt><p><tt><font color="#000099"><font
size="-1">EXPLAIN</font></font></tt><br/><tt><font color="#000099"><font size="-1"></font></font></tt> <tt><font
color="#000099"><fontsize="-1"></font></font></tt><p><font color="#000000">Stephan thanks for the reply,</font><br
/><fontcolor="#000000">i think my question has been adequetely answered and i conclude</font><br /><font
color="#000000">thati shud. upgrade my PG setup without wasting my/others' time.</font><br /><font
color="#000000"></font> <fontcolor="#000000"></font><p><font color="#000000">regds</font><br /><font
color="#000000">mallah.</font>