Query Join Performance - Mailing list pgsql-sql
From | Aaron Bono |
---|---|
Subject | Query Join Performance |
Date | |
Msg-id | bf05e51c0704241826u3e6652d1m26973b837e0b1397@mail.gmail.com Whole thread Raw |
Responses |
Re: Query Join Performance
|
List | pgsql-sql |
I have a database with the following tables:<br /><br />zip {<br /> zip_id bigserial,<br /> zip_cd varchar(6) -- thisis indexed<br />}<br /><br />city {<br /> city_id bigserial,<br /> city_name varchar(50)<br />}<br /><br />zip_city{ <br /> city_id bigint (FK to city table),<br /> zip_id bigint (FK to zip table),<br /> longitude numeric,<br/> latitude numeric<br />}<br /><br />We want to find the zip_cd values for all zip codes within 50 miles ofa specified zip code and have this query so far (assume our given zip code is 64131): <br /><br /> select<br /> zip_city.zip_id,<br /> gps_distance(zip_city.longitude::numeric, zip_city.latitude::numeric, my_location.longitude::numeric,my_location.latitude::numeric) AS distance<br /> from zip_city <br /> inner join (<br/> select<br /> longitude,<br /> latitude<br /> from zip<br /> inner joinzip_city on (<br /> zip.zip_id = zip_city.zip_id<br /> )<br /> where zip_cd = '64131' <br /> ) my_location on (<br /> -- We use this as an approximation to filter out as many records as possible beforedoing the gps_distance() function call<br /> -- It works well as a outer bounds to get the most obvious outsidezip_city values. <br /> abs(zip_city.longitude - my_location.longitude) <= (50.0 / 60.0)<br /> ANDabs(zip_city.latitude - my_location.latitude) <= (50.0 / 60.0)<br /> )<br /><br />This runs in about 2 seconds onWindows and gives me 650 records - not bad. <br /><br />When I add just a little bit:<br /><br /> select<br /> zip_city.zip_id,<br /> gps_distance(zip_city.longitude::numeric, zip_city.latitude::numeric, my_location.longitude::numeric,my_location.latitude::numeric) AS distance <br /> from zip_city<br /> -->>>>THE ONLY DIFFERENCE IS THE NEXT 3 LINES <<<<--<br /> inner join zip on (<br /> zip_city.zip_id= zip.zip_id<br /> )<br /> -->>>> End of added code <<<<-- <br /> inner join(<br /> select<br /> longitude,<br /> latitude<br /> from zip<br /> innerjoin zip_city on (<br /> zip.zip_id = zip_city.zip_id<br /> )<br /> where zip_cd = '64131'<br /> ) my_location on (<br /> abs(zip_city.longitude - my_location.longitude) <= (50.0 / 60.0)<br /> AND abs(zip_city.latitude - my_location.latitude) <= (50.0 / 60.0)<br /> )<br /><br />Performance tanks withthis query - it takes over 120 seconds (that is where I set the timeout). I read that a foreign key doesn't help performanceand my guess is that PostgreSQL isn't just joining on the 650 records but rather on the over 800,000 records inthe zip table (yes we have that many - we include Canada zip codes which is 90% of the data - Blame Canada! ;). <br /><br/>Is there something I can do to boost performance? I tried putting the first query above in an inner join before joiningwith the zip table with the same result.<br /><br />BTW, on our Linux box the full query we run (which adds 3 moretables on the whole operation along with more filtering on the zip table) finishes in under 10 seconds. Problem is ourdevelopment is on Windows and this is a real pain for developers to test. <br /><br />Any assistance is appreciated. I did look at the explain plans but I don't see anything other than what I mentioned above. I can include thoseif necessary.<br /><br />Thanks!<br />Aaron<br clear="all" /><br />-- <br /> ==================================================================<br/> Aaron Bono<br /> Aranya Software Technologies,Inc.<br /> <a href="http://www.aranya.com">http://www.aranya.com</a><br /> <a href="http://codeelixir.com">http://codeelixir.com</a><br />==================================================================