Home > mailing lists

Query tuning help - Mailing list pgsql-performance

From	Dan Harris
Subject	Query tuning help
Date	May 8, 2005 20:20:44
Msg-id	924ac6c029f5453c442eacdda06b132c@drivefaster.net Whole thread Raw
Responses	Re: Query tuning help Re: Query tuning help Re: Query tuning help
List	pgsql-performance

Tree view

Sorry to bother everyone with yet another "my query isn't using an
index" problem but I am over my head on this one..  I am open to ways
of restructuring this query to perform better.

I have a table, 'ea', with 22 million rows in it.  VACUUM ANALYZE has
been just run on the table.

This is the result of:

explain analyze
select distinct
    em.incidentid,
    ea.recordtext as retdata,
    eg.long,
    eg.lat
from
    ea, em, eg
where
    em.incidentid = ea.incidentid and
    em.incidentid = eg.incidentid and
    em.entrydate >= '2005-1-1 00:00' and
    em.entrydate <= '2005-5-9 00:00'
    and ea.incidentid in (
        select
            incidentid
        from
            ea
        where
            recordtext like '%RED%'
    )

    and ea.incidentid in (
        select
            incidentid
        from
            ea
        where
            recordtext like '%CORVETTE%'
    )
    and (  recordtext like '%RED%' or recordtext like '%CORVETTE%'  )
order by em.entrydate


---------------------
ANALYZE RESULTS
---------------------

  Unique  (cost=774693.72..774693.76 rows=1 width=159) (actual
time=446787.056..446787.342 rows=72 loops=1)
    ->  Sort  (cost=774693.72..774693.72 rows=1 width=159) (actual
time=446787.053..446787.075 rows=72 loops=1)
          Sort Key: em.incidentid, public.ea.recordtext, eg.long, eg.lat
          ->  Nested Loop  (cost=771835.10..774693.71 rows=1 width=159)
(actual time=444378.655..446786.746 rows=72 loops=1)
                ->  Nested Loop  (cost=771835.10..774688.81 rows=1
width=148) (actual time=444378.532..446768.381 rows=72 loops=1)
                      ->  Nested Loop IN Join
(cost=771835.10..774678.88 rows=2 width=81) (actual
time=444367.080..446191.864 rows=701 loops=1)
                            ->  Nested Loop  (cost=771835.10..774572.05
rows=42 width=64) (actual time=444366.859..445463.232 rows=1011
loops=1)
                                  ->  HashAggregate
(cost=771835.10..771835.10 rows=1 width=17) (actual
time=444366.702..444368.583 rows=473 loops=1)
                                        ->  Seq Scan on ea
(cost=0.00..771834.26 rows=335 width=17) (actual
time=259.746..444358.837 rows=592 loops=1)
                                              Filter:
((recordtext)::text ~~ '%CORVETTE%'::text)
                                  ->  Index Scan using ea1 on ea
(cost=0.00..2736.43 rows=42 width=47) (actual time=2.085..2.309 rows=2
loops=473)
                                        Index Cond:
((ea.incidentid)::text = ("outer".incidentid)::text)
                                        Filter: (((recordtext)::text ~~
'%RED%'::text) OR ((recordtext)::text ~~ '%CORVETTE%'::text))
                            ->  Index Scan using ea1 on ea
(cost=0.00..2733.81 rows=42 width=17) (actual time=0.703..0.703 rows=1
loops=1011)
                                  Index Cond:
(("outer".incidentid)::text = (ea.incidentid)::text)
                                  Filter: ((recordtext)::text ~~
'%RED%'::text)
                      ->  Index Scan using em_incidentid_idx on em
(cost=0.00..4.95 rows=1 width=67) (actual time=0.820..0.821 rows=0
loops=701)
                            Index Cond: (("outer".incidentid)::text =
(em.incidentid)::text)
                            Filter: ((entrydate >= '2005-01-01
00:00:00'::timestamp without time zone) AND (entrydate <= '2005-05-09
00:00:00'::timestamp without time zone))
                ->  Index Scan using eg_incidentid_idx on eg
(cost=0.00..4.89 rows=1 width=79) (actual time=0.245..0.246 rows=1
loops=72)
                      Index Cond: (("outer".incidentid)::text =
(eg.incidentid)::text)
  Total runtime: 446871.880 ms
(22 rows)


-------------------------
EXPLANATION
-------------------------
The reason for the redundant LIKE clause is that first, I only want
those "incidentid"s that contain the words 'RED' and 'CORVETTE'.  BUT,
those two words may exist across multiple records with the same
incidentid.  Then, I only want to actually work with the rows that
contain one of the words.  This query will repeat the same logic for
however many keywords are entered by the user.  I have investigated
text searching options and have not found them to be congruous with my
application.

Why is it choosing a sequential scan one part of the query when
searching for the words, yet using an index scan for another part of
it?  Is there a better way to structure the query to give it better
hints?

I'm using 8.0.1 on a 4-way Opteron with beefy RAID-10 and 12GB of RAM.

Thank you for any advice.

-Dan

pgsql-performance by date:

From: Mischa Sandberg
Date: 08 May 2005, 19:33:17
Subject: Re: Whence the Opterons?

From: Josh Berkus
Date: 08 May 2005, 21:48:27
Subject: Re: Query tuning help

Query tuning help - Mailing list pgsql-performance

Previous

Next