Using indexes to speed up join? - Mailing list pgsql-novice
From | James David Smith |
---|---|
Subject | Using indexes to speed up join? |
Date | |
Msg-id | CAMu32ADQcufCFfSsoeQ7rUVwscUEZD3dkO0-guL5CJznsQEqHA@mail.gmail.com Whole thread Raw |
List | pgsql-novice |
Hi there, The below query works exactly how I want it too, but it's too slow. Running it on a few hundred rows (LIMIT 100) takes a second or two, but when I run it against 4.2 rows it takes a long time. I stopped it after about an hour. I'd appreciate some advice on the most effective way to use indexes to improve it please? The query, as well as the result of 'EXPLAIN ANALYZE' are shown below. I can see it's doing some sequential scans which I know are bad... but? Cheers James QUERY: ------------------------------------------------------- SELECT unique_hybrid_hour_locations.*, CASE WHEN pm25_io_ratio IS NULL THEN monthly_london_average_pm25_io_ratio ELSE pm25_io_ratio END AS pm25_io_ratio FROM unique_hybrid_hour_locations -- This mini sub query joins the IO PM2.5 ratio from the ratios table. The join is done by month and by location. LEFT JOIN io_postcode_ratios ON EXTRACT(MONTH FROM time) = io_postcode_ratios.month_num AND ST_Within( st_transform(st_setsrid(st_makepoint(lon, lat),4326),27700), io_postcode_ratios.the_geom ) = true -- This mini sub query calculates the monthly averages for London -- and joins them depending on the month. They're used if can't get the proper one. LEFT JOIN ( SELECT month_num, avg(pm25_io_ratio) as monthly_london_average_pm25_io_ratio FROM io_postcode_ratios GROUP BY month_num ORDER BY month_num) a ON a.month_num = EXTRACT(MONTH FROM time) ------------------------------------------------------- EXPLAIN ANALYZE: "Hash Left Join (cost=902914.76..70538171.09 rows=4760537 width=98)" " Hash Cond: (date_part('month'::text, unique_hybrid_hour_locations."time") = (a.month_num)::double precision)" " -> Merge Left Join (cost=900483.64..70503130.29 rows=4760537 width=66)" " Merge Cond: ((date_part('month'::text, unique_hybrid_hour_locations."time")) = ((io_postcode_ratios.month_num)::double precision))" " Join Filter: ((st_transform(st_setsrid(st_makepoint(unique_hybrid_hour_locations.lon, unique_hybrid_hour_locations.lat), 4326), 27700) && io_postcode_ratios.the_geom) AND _st_contains(io_postcode_ratios.the_geom, st_transform(st_setsrid(st_makepoin (...)" " -> Sort (cost=848686.52..860587.86 rows=4760537 width=58)" " Sort Key: (date_part('month'::text, unique_hybrid_hour_locations."time"))" " -> Seq Scan on unique_hybrid_hour_locations (cost=0.00..141692.37 rows=4760537 width=58)" " -> Materialize (cost=51797.11..51847.93 rows=10164 width=11199)" " -> Sort (cost=51797.11..51822.52 rows=10164 width=11199)" " Sort Key: ((io_postcode_ratios.month_num)::double precision)" " -> Seq Scan on io_postcode_ratios (cost=0.00..2379.64 rows=10164 width=11199)" " -> Hash (cost=2430.98..2430.98 rows=12 width=36)" " -> Subquery Scan on a (cost=2430.83..2430.98 rows=12 width=36)" " -> Sort (cost=2430.83..2430.86 rows=12 width=12)" " Sort Key: io_postcode_ratios_1.month_num" " -> HashAggregate (cost=2430.46..2430.61 rows=12 width=12)" " -> Seq Scan on io_postcode_ratios io_postcode_ratios_1 (cost=0.00..2379.64 rows=10164 width=12)"
pgsql-novice by date: