Encouraging multi-table join order - Mailing list pgsql-performance
From | Dan Harris |
---|---|
Subject | Encouraging multi-table join order |
Date | |
Msg-id | 443AAC26.4070802@drivefaster.net Whole thread Raw |
Responses |
Re: Encouraging multi-table join order
|
List | pgsql-performance |
I have a query that is intended to select from multiple "small tables" to get a limited subset of "incidentid" and then join with a "very large" table. One of the operations will require a sequential scan, but the planner is doing the scan on the very large table before joining the small ones, resulting in a huge amount of disk I/O. How would I make this query join the large table only after narrowing down the possible selections from the smaller tables? This is running on version 8.0.3. Thanks for any ideas. -Dan QUERY ######################################## explain analyze select distinct eventmain.incidentid, eventmain.entrydate, eventgeo.eventlocation, recordtext as retdata from eventactivity join ( select incidentid from k_h where id = 33396 and k_h.entrydate >= '2006-1-1 00:00' and k_h.entrydate < '2006-4-8 00:00' ) id_keywords using ( incidentid ) , eventmain, eventgeo where eventmain.incidentid = eventactivity.incidentid and eventmain.incidentid = eventgeo.incidentid and ( ' ' || recordtext || ' ' like '%HAL%' ) and eventactivity.entrydate >= '2006-1-1 00:00' and eventactivity.entrydate < '2006-4-8 00:00' order by eventmain.entrydate limit 10000; EXPLAIN ANALYZE OUTPUT ######################################## Limit (cost=2521191.65..2521191.90 rows=6 width=187) (actual time=1360935.787..1361072.277 rows=1400 loops=1) -> Unique (cost=2521191.65..2521191.90 rows=6 width=187) (actual time=1360935.779..1361067.853 rows=1400 loops=1) -> Sort (cost=2521191.65..2521191.66 rows=6 width=187) (actual time=1360935.765..1360958.258 rows=16211 loops=1) Sort Key: eventmain.entrydate, eventmain.incidentid, eventactivity.recordtext, eventgeo.eventlocation -> Nested Loop (cost=219.39..2521191.57 rows=6 width=187) (actual time=1123.115..1360579.798 rows=16211 loops=1) -> Nested Loop (cost=219.39..2521173.23 rows=6 width=154) (actual time=1105.773..1325907.716 rows=16211 loops=1) -> Hash Join (cost=219.39..2521153.37 rows=6 width=66) (actual time=1069.476..1289608.261 rows=16211 loops=1) Hash Cond: (("outer".incidentid)::text = ("inner".incidentid)::text) -> Seq Scan on eventactivity (cost=0.00..2518092.06 rows=1532 width=52) (actual time=57.205..1288514.530 rows=2621 loops=1) Filter: ((((' '::text || (recordtext)::text) || ' '::text) ~~ '%HAL%'::text) AND (entrydate >= '2006-01-01 00:00:00'::timestamp without time zone) AND (entrydate < '2006-04-08 00:00:00'::timestamp without time zone)) -> Hash (cost=217.53..217.53 rows=741 width=14) (actual time=899.128..899.128 rows=0 loops=1) -> Index Scan using k_h_id_idx on k_h (cost=0.00..217.53 rows=741 width=14) (actual time=55.097..893.883 rows=1162 loops=1) Index Cond: (id = 33396) Filter: ((entrydate >= '2006-01-01 00:00:00'::timestamp without time zone) AND (entrydate < '2006-04-08 00:00:00'::timestamp without time zone)) -> Index Scan using eventmain_incidentid_idx on eventmain (cost=0.00..3.30 rows=1 width=88) (actual time=1.866..2.227 rows=1 loops=16211) Index Cond: ((eventmain.incidentid)::text = ("outer".incidentid)::text) -> Index Scan using eventgeo_incidentid_idx on eventgeo (cost=0.00..3.04 rows=1 width=75) (actual time=1.770..2.126 rows=1 loops=16211) Index Cond: ((eventgeo.incidentid)::text = ("outer".incidentid)::text) Total runtime: 1361080.787 ms (19 rows)
pgsql-performance by date: