Home > mailing lists

index vs seqscan question - Mailing list pgsql-general

From	Frank Bax
Subject	index vs seqscan question
Date	November 22, 2002 16:19:34
Msg-id	5.1.1.6.0.20021122154406.03aa4ac0@pop6.sympatico.ca Whole thread Raw
Responses	Re: index vs seqscan question
List	pgsql-general

Tree view

I was looking and one of my SQL statements today and decided to try a
slight variation to my query - and what a difference it made!!  The query
went from over 10 minutes down to under 30 seconds.  I curious though; if I
read the explain output correctly...
->  Index Scan using timesheet_index_emp on timesheet  (cost=0.00..19056.43
rows=122207 width=40)
->  Seq Scan on timesheet  (cost=0.00..7244.02 rows=122207 width=28)

These statements imply the planner knows a seqscan is quicker than an index
scan (only 3 times faster by its estimate, actually much more), why does it
still use an index scan?

Additional information:
- pgsql version 7.1
- vacuum analyse is run nightly.
- the "earncode in..." clause by itself will select 85% of rows.
- earncode = ' ' selects 80% of rows in table.
- typ ' ' or 'A' selects 99% of rows in table.
- final result is just approx 100 rows.

Frank


The original (slow) version:
gwacl=> explain SELECT * FROM (SELECT emp, SUM (CASE WHEN earncode IN ('
','A','O','P','Q','X','Z') THEN
date_part('epoch',age(hi_shift,lo_shift))/(60*60) ELSE 0 END) AS hours FROM
timesheet WHERE typ=' ' OR typ='A' GROUP BY emp) AS ts INNER JOIN (SELECT
emp, first, last FROM employee WHERE status='A') AS emp ON (ts.emp =
emp.emp) WHERE hours BETWEEN 0 AND 1250 ORDER BY hours DESC;
NOTICE:  QUERY PLAN:

Sort  (cost=21441.77..21441.77 rows=592 width=56)
   ->  Merge Join  (cost=21261.28..21414.52 rows=592 width=56)
         ->  Sort  (cost=21300.99..21300.99 rows=12221 width=40)
               ->  Subquery Scan ts  (cost=0.00..20278.50 rows=12221 width=40)
                     ->  Aggregate  (cost=0.00..20278.50 rows=12221 width=40)
                           ->  Group  (cost=0.00..19361.94 rows=122207
width=40)
                                 ->  Index Scan using timesheet_index_emp
on timesheet  (cost=0.00..19056.43 rows=122207 width=40)
         ->  Sort  (cost=20.89..20.89 rows=38 width=36)
               ->  Seq Scan on employee  (cost=0.00..19.89 rows=38 width=36)
EXPLAIN

The revised (much improved) version:
gwacl=> explain SELECT * FROM (SELECT emp,
SUM(date_part('epoch',age(hi_shift,lo_shift))/(60*60)) as hours FROM
timesheet WHERE typ=' ' OR typ='A' AND earncode IN ('
','A','O','P','Q','X','Z') GROUP BY emp) AS ts INNER JOIN (SELECT emp,
first, last FROM employee WHERE status='A') AS emp ON (ts.emp = emp.emp)
WHERE hours BETWEEN 0 AND 1250 ORDER BY hours DESC;
NOTICE:  QUERY PLAN:

Sort  (cost=23993.79..23993.79 rows=592 width=56)
   ->  Merge Join  (cost=23813.31..23966.55 rows=592 width=56)
         ->  Sort  (cost=23816.60..23816.60 rows=12221 width=28)
               ->  Subquery Scan ts  (cost=21608.46..22830.53 rows=12221
width=28)
                     ->  Aggregate  (cost=21608.46..22830.53 rows=12221
width=28)
                           ->  Group  (cost=21608.46..21913.97 rows=122207
width=28)
                                 ->  Sort  (cost=21608.46..21608.46
rows=122207 width=28)
                                      ->  Seq Scan on
timesheet  (cost=0.00..7244.02 rows=122207 width=28)
         ->  Sort  (cost=20.89..20.89 rows=38 width=36)
               ->  Seq Scan on employee  (cost=0.00..19.89 rows=38 width=36)
EXPLAIN

pgsql-general by date:

From: Tom Jenkins
Date: 22 November 2002, 16:01:09
Subject: Seeking PostgreSQL DBA/programmer

From: Eric B.Ridge
Date: 22 November 2002, 16:30:23
Subject: UNION and array types

index vs seqscan question - Mailing list pgsql-general

Previous

Next