Query tuning help - Mailing list pgsql-performance
From | Dan Harris |
---|---|
Subject | Query tuning help |
Date | |
Msg-id | 924ac6c029f5453c442eacdda06b132c@drivefaster.net Whole thread Raw |
Responses |
Re: Query tuning help
Re: Query tuning help Re: Query tuning help |
List | pgsql-performance |
Sorry to bother everyone with yet another "my query isn't using an index" problem but I am over my head on this one.. I am open to ways of restructuring this query to perform better. I have a table, 'ea', with 22 million rows in it. VACUUM ANALYZE has been just run on the table. This is the result of: explain analyze select distinct em.incidentid, ea.recordtext as retdata, eg.long, eg.lat from ea, em, eg where em.incidentid = ea.incidentid and em.incidentid = eg.incidentid and em.entrydate >= '2005-1-1 00:00' and em.entrydate <= '2005-5-9 00:00' and ea.incidentid in ( select incidentid from ea where recordtext like '%RED%' ) and ea.incidentid in ( select incidentid from ea where recordtext like '%CORVETTE%' ) and ( recordtext like '%RED%' or recordtext like '%CORVETTE%' ) order by em.entrydate --------------------- ANALYZE RESULTS --------------------- Unique (cost=774693.72..774693.76 rows=1 width=159) (actual time=446787.056..446787.342 rows=72 loops=1) -> Sort (cost=774693.72..774693.72 rows=1 width=159) (actual time=446787.053..446787.075 rows=72 loops=1) Sort Key: em.incidentid, public.ea.recordtext, eg.long, eg.lat -> Nested Loop (cost=771835.10..774693.71 rows=1 width=159) (actual time=444378.655..446786.746 rows=72 loops=1) -> Nested Loop (cost=771835.10..774688.81 rows=1 width=148) (actual time=444378.532..446768.381 rows=72 loops=1) -> Nested Loop IN Join (cost=771835.10..774678.88 rows=2 width=81) (actual time=444367.080..446191.864 rows=701 loops=1) -> Nested Loop (cost=771835.10..774572.05 rows=42 width=64) (actual time=444366.859..445463.232 rows=1011 loops=1) -> HashAggregate (cost=771835.10..771835.10 rows=1 width=17) (actual time=444366.702..444368.583 rows=473 loops=1) -> Seq Scan on ea (cost=0.00..771834.26 rows=335 width=17) (actual time=259.746..444358.837 rows=592 loops=1) Filter: ((recordtext)::text ~~ '%CORVETTE%'::text) -> Index Scan using ea1 on ea (cost=0.00..2736.43 rows=42 width=47) (actual time=2.085..2.309 rows=2 loops=473) Index Cond: ((ea.incidentid)::text = ("outer".incidentid)::text) Filter: (((recordtext)::text ~~ '%RED%'::text) OR ((recordtext)::text ~~ '%CORVETTE%'::text)) -> Index Scan using ea1 on ea (cost=0.00..2733.81 rows=42 width=17) (actual time=0.703..0.703 rows=1 loops=1011) Index Cond: (("outer".incidentid)::text = (ea.incidentid)::text) Filter: ((recordtext)::text ~~ '%RED%'::text) -> Index Scan using em_incidentid_idx on em (cost=0.00..4.95 rows=1 width=67) (actual time=0.820..0.821 rows=0 loops=701) Index Cond: (("outer".incidentid)::text = (em.incidentid)::text) Filter: ((entrydate >= '2005-01-01 00:00:00'::timestamp without time zone) AND (entrydate <= '2005-05-09 00:00:00'::timestamp without time zone)) -> Index Scan using eg_incidentid_idx on eg (cost=0.00..4.89 rows=1 width=79) (actual time=0.245..0.246 rows=1 loops=72) Index Cond: (("outer".incidentid)::text = (eg.incidentid)::text) Total runtime: 446871.880 ms (22 rows) ------------------------- EXPLANATION ------------------------- The reason for the redundant LIKE clause is that first, I only want those "incidentid"s that contain the words 'RED' and 'CORVETTE'. BUT, those two words may exist across multiple records with the same incidentid. Then, I only want to actually work with the rows that contain one of the words. This query will repeat the same logic for however many keywords are entered by the user. I have investigated text searching options and have not found them to be congruous with my application. Why is it choosing a sequential scan one part of the query when searching for the words, yet using an index scan for another part of it? Is there a better way to structure the query to give it better hints? I'm using 8.0.1 on a 4-way Opteron with beefy RAID-10 and 12GB of RAM. Thank you for any advice. -Dan
pgsql-performance by date: