Re: Query Join Performance - Mailing list pgsql-sql
From | Aaron Bono |
---|---|
Subject | Re: Query Join Performance |
Date | |
Msg-id | bf05e51c0704251339u2883211eh4cdedcbec63c2be9@mail.gmail.com Whole thread Raw |
In response to | Re: Query Join Performance (Richard Huxton <dev@archonet.com>) |
Responses |
Re: Query Join Performance
|
List | pgsql-sql |
On 4/25/07, Richard Huxton <dev@archonet.com> wrote:
Now I am having the same problem on the Linux box so I doubt it is the platform.
The biggest problem I notice is when I add a join from a child table (zip_city) to a parent table (zip). I have filtered the child table down to about 650 records but when I add the join to the parent which has over 800,000 records, performance tanks. I was able to benchmark two queries last night on my Windows machine:
-- This runs in just over 2 seconds
select
nearby_zip_city.zip_id,
gps_distance(zip_city.longitude::numeric, zip_city.latitude::numeric, nearby_zip_city.longitude::numeric, nearby_zip_city.latitude::numeric) AS distance
from zip_city
inner join zip on (
zip.zip_id = zip_city.zip_id
)
inner join zip_city as nearby_zip_city on (
abs(zip_city.longitude - nearby_zip_city.longitude) <= (50.0 / 60.0)
AND abs(zip_city.latitude - nearby_zip_city.latitude) <= (50.0 / 60.0)
)
where zip.zip_cd = '66105'
-- This takes over 48 seconds and I just added a join from the zip_city child to the zip parent table
select
nearby_zip.zip_cd,
gps_distance(zip_city.longitude::numeric, zip_city.latitude::numeric, nearby_zip_city.longitude::numeric, nearby_zip_city.latitude::numeric) AS distance
from zip_city
inner join zip on (
zip.zip_id = zip_city.zip_id
)
inner join zip_city as nearby_zip_city on (
abs(zip_city.longitude - nearby_zip_city.longitude) <= (50.0 / 60.0)
AND abs(zip_city.latitude - nearby_zip_city.latitude) <= ( 50.0 / 60.0)
)
-->>> The next 3 lines are the main difference <<<--
inner join zip as nearby_zip on (
nearby_zip_city.zip_id = nearby_zip.zip_id
)
-->>> End of difference <<<--
where zip.zip_cd = '66105'
-- Explain plan for faster/first query:
Nested Loop (cost=45779.82..147990502.45 rows=451678770 width=40) (actual time=5404.943..20151.684 rows=653 loops=1)
Join Filter: ((abs(("inner".longitude - "outer".longitude)) <= 0.833333333333333::double precision) AND (abs(("inner".latitude - "outer".latitude)) <= 0.833333333333333::double precision))
-> Seq Scan on zip_city nearby_zip_city (cost=0.00..25514.19 rows=901719 width=24) (actual time= 0.053..2311.547 rows=901719 loops=1)
-> Materialize (cost=45779.82..45824.90 rows=4508 width=16) (actual time=0.003..0.009 rows=1 loops=901719)
-> Hash Join (cost=8944.55..45775.31 rows=4508 width=16) (actual time= 444.657..4490.901 rows=1 loops=1)
Hash Cond: ("outer".zip_id = "inner".zip_id)
-> Seq Scan on zip_city (cost=0.00..25514.19 rows=901719 width=24) (actual time=0.004..2334.548 rows=901719 loops=1)
-> Hash (cost=8933.90..8933.90 rows=4261 width=8) (actual time=0.296..0.296 rows=1 loops=1)
-> Bitmap Heap Scan on zip (cost=32.91..8933.90 rows=4261 width=8) (actual time= 0.272..0.275 rows=1 loops=1)
Recheck Cond: ((zip_cd)::text = '66105'::text)
-> Bitmap Index Scan on zip_zip_cd_key (cost=0.00..32.91 rows=4261 width=0) (actual time= 0.250..0.250 rows=1 loops=1)
Index Cond: ((zip_cd)::text = '66105'::text)
-- Explain plan for shower/second query:
Nested Loop (cost=75372.31..148056286.32 rows=451678770 width=43) (actual time= 62688.188..69916.943 rows=653 loops=1)
Join Filter: ((abs(("inner".longitude - "outer".longitude)) <= 0.833333333333333::double precision) AND (abs(("inner".latitude - "outer".latitude)) <= 0.833333333333333::double precision))
-> Hash Join (cost=29592.49..91298.06 rows=901719 width=27) (actual time=17905.224..52279.151 rows=901719 loops=1)
Hash Cond: ("outer".zip_id = "inner".zip_id)
-> Seq Scan on zip_city nearby_zip_city (cost=0.00..25514.19 rows=901719 width=24) (actual time=0.044..2888.993 rows=901719 loops=1)
-> Hash (cost=21634.79..21634.79 rows=852279 width=19) (actual time= 13925.502..13925.502 rows=852279 loops=1)
-> Seq Scan on zip nearby_zip (cost=0.00..21634.79 rows=852279 width=19) (actual time=0.042..2535.742 rows=852279 loops=1)
-> Materialize (cost=45779.82..45824.90 rows=4508 width=16) (actual time=0.002..0.009 rows=1 loops=901719)
-> Hash Join (cost=8944.55..45775.31 rows=4508 width=16) (actual time=421.374..4453.224 rows=1 loops=1)
Hash Cond: ("outer".zip_id = "inner".zip_id)
-> Seq Scan on zip_city (cost=0.00..25514.19 rows=901719 width=24) (actual time=0.028..2333.941 rows=901719 loops=1)
-> Hash (cost=8933.90..8933.90 rows=4261 width=8) (actual time= 0.604..0.604 rows=1 loops=1)
-> Bitmap Heap Scan on zip (cost=32.91..8933.90 rows=4261 width=8) (actual time=0.588..0.591 rows=1 loops=1)
Recheck Cond: ((zip_cd)::text = '66105'::text)
-> Bitmap Index Scan on zip_zip_cd_key (cost=0.00..32.91 rows=4261 width=0) (actual time=0.559..0.559 rows=1 loops=1)
Index Cond: ((zip_cd)::text = '66105'::text)
It is the join that is killing the query but I am at a loss of the best approach to fix it. I have some work arounds in mind by flattening out the tables but I would rather not have to do that.
Thanks!
Aaron
--
==================================================================
Aaron Bono
Aranya Software Technologies, Inc.
http://www.aranya.com
http://codeelixir.com
==================================================================
Aaron Bono wrote:
> Performance tanks with this query - it takes over 120 seconds (that is
> where
> I set the timeout).
> BTW, on our Linux box the full query we run (which adds 3 more tables on
> the
> whole operation along with more filtering on the zip table) finishes in
> under 10 seconds. Problem is our development is on Windows and this is a
> real pain for developers to test.
So what's different between the systems. Obvious things to look at:
1. EXPLAIN ANALYSE outputs to show the plans (these presumably are
different, but in what details, and why?)
2. Configuration (particularly memory/cost settings).
3. Hardware.
4. Locale/encoding - these can affect index usage and sorting.
--
Richard Huxton
Archonet Ltd
Now I am having the same problem on the Linux box so I doubt it is the platform.
The biggest problem I notice is when I add a join from a child table (zip_city) to a parent table (zip). I have filtered the child table down to about 650 records but when I add the join to the parent which has over 800,000 records, performance tanks. I was able to benchmark two queries last night on my Windows machine:
-- This runs in just over 2 seconds
select
nearby_zip_city.zip_id,
gps_distance(zip_city.longitude::numeric, zip_city.latitude::numeric, nearby_zip_city.longitude::numeric, nearby_zip_city.latitude::numeric) AS distance
from zip_city
inner join zip on (
zip.zip_id = zip_city.zip_id
)
inner join zip_city as nearby_zip_city on (
abs(zip_city.longitude - nearby_zip_city.longitude) <= (50.0 / 60.0)
AND abs(zip_city.latitude - nearby_zip_city.latitude) <= (50.0 / 60.0)
)
where zip.zip_cd = '66105'
-- This takes over 48 seconds and I just added a join from the zip_city child to the zip parent table
select
nearby_zip.zip_cd,
gps_distance(zip_city.longitude::numeric, zip_city.latitude::numeric, nearby_zip_city.longitude::numeric, nearby_zip_city.latitude::numeric) AS distance
from zip_city
inner join zip on (
zip.zip_id = zip_city.zip_id
)
inner join zip_city as nearby_zip_city on (
abs(zip_city.longitude - nearby_zip_city.longitude) <= (50.0 / 60.0)
AND abs(zip_city.latitude - nearby_zip_city.latitude) <= ( 50.0 / 60.0)
)
-->>> The next 3 lines are the main difference <<<--
inner join zip as nearby_zip on (
nearby_zip_city.zip_id = nearby_zip.zip_id
)
-->>> End of difference <<<--
where zip.zip_cd = '66105'
-- Explain plan for faster/first query:
Nested Loop (cost=45779.82..147990502.45 rows=451678770 width=40) (actual time=5404.943..20151.684 rows=653 loops=1)
Join Filter: ((abs(("inner".longitude - "outer".longitude)) <= 0.833333333333333::double precision) AND (abs(("inner".latitude - "outer".latitude)) <= 0.833333333333333::double precision))
-> Seq Scan on zip_city nearby_zip_city (cost=0.00..25514.19 rows=901719 width=24) (actual time= 0.053..2311.547 rows=901719 loops=1)
-> Materialize (cost=45779.82..45824.90 rows=4508 width=16) (actual time=0.003..0.009 rows=1 loops=901719)
-> Hash Join (cost=8944.55..45775.31 rows=4508 width=16) (actual time= 444.657..4490.901 rows=1 loops=1)
Hash Cond: ("outer".zip_id = "inner".zip_id)
-> Seq Scan on zip_city (cost=0.00..25514.19 rows=901719 width=24) (actual time=0.004..2334.548 rows=901719 loops=1)
-> Hash (cost=8933.90..8933.90 rows=4261 width=8) (actual time=0.296..0.296 rows=1 loops=1)
-> Bitmap Heap Scan on zip (cost=32.91..8933.90 rows=4261 width=8) (actual time= 0.272..0.275 rows=1 loops=1)
Recheck Cond: ((zip_cd)::text = '66105'::text)
-> Bitmap Index Scan on zip_zip_cd_key (cost=0.00..32.91 rows=4261 width=0) (actual time= 0.250..0.250 rows=1 loops=1)
Index Cond: ((zip_cd)::text = '66105'::text)
-- Explain plan for shower/second query:
Nested Loop (cost=75372.31..148056286.32 rows=451678770 width=43) (actual time= 62688.188..69916.943 rows=653 loops=1)
Join Filter: ((abs(("inner".longitude - "outer".longitude)) <= 0.833333333333333::double precision) AND (abs(("inner".latitude - "outer".latitude)) <= 0.833333333333333::double precision))
-> Hash Join (cost=29592.49..91298.06 rows=901719 width=27) (actual time=17905.224..52279.151 rows=901719 loops=1)
Hash Cond: ("outer".zip_id = "inner".zip_id)
-> Seq Scan on zip_city nearby_zip_city (cost=0.00..25514.19 rows=901719 width=24) (actual time=0.044..2888.993 rows=901719 loops=1)
-> Hash (cost=21634.79..21634.79 rows=852279 width=19) (actual time= 13925.502..13925.502 rows=852279 loops=1)
-> Seq Scan on zip nearby_zip (cost=0.00..21634.79 rows=852279 width=19) (actual time=0.042..2535.742 rows=852279 loops=1)
-> Materialize (cost=45779.82..45824.90 rows=4508 width=16) (actual time=0.002..0.009 rows=1 loops=901719)
-> Hash Join (cost=8944.55..45775.31 rows=4508 width=16) (actual time=421.374..4453.224 rows=1 loops=1)
Hash Cond: ("outer".zip_id = "inner".zip_id)
-> Seq Scan on zip_city (cost=0.00..25514.19 rows=901719 width=24) (actual time=0.028..2333.941 rows=901719 loops=1)
-> Hash (cost=8933.90..8933.90 rows=4261 width=8) (actual time= 0.604..0.604 rows=1 loops=1)
-> Bitmap Heap Scan on zip (cost=32.91..8933.90 rows=4261 width=8) (actual time=0.588..0.591 rows=1 loops=1)
Recheck Cond: ((zip_cd)::text = '66105'::text)
-> Bitmap Index Scan on zip_zip_cd_key (cost=0.00..32.91 rows=4261 width=0) (actual time=0.559..0.559 rows=1 loops=1)
Index Cond: ((zip_cd)::text = '66105'::text)
It is the join that is killing the query but I am at a loss of the best approach to fix it. I have some work arounds in mind by flattening out the tables but I would rather not have to do that.
Thanks!
Aaron
--
==================================================================
Aaron Bono
Aranya Software Technologies, Inc.
http://www.aranya.com
http://codeelixir.com
==================================================================