Re: General performance questions about postgres on Apple - Mailing list pgsql-performance
From | Sean Shanny |
---|---|
Subject | Re: General performance questions about postgres on Apple |
Date | |
Msg-id | 403682D6.60807@earthlink.net Whole thread Raw |
Responses |
Re: General performance questions about postgres on Apple
|
List | pgsql-performance |
scott.marlowe wrote: >On Fri, 20 Feb 2004, Sean Shanny wrote: > > > >>max_connections = 100 >> >># - Memory - >> >>shared_buffers = 16000 # min 16, at least max_connections*2, >>8KB each >>sort_mem = 256000 # min 64, size in KB >> >> > >You might wanna drop sort_mem somewhat and just set it during your imports >to something big like 512000 or larger. That way with 100 users during >the day you won't have to worry about swap storms, and when you run your >updates, you get all that sort_mem. > > > >>Actual row count in the temp table: >> >>select count(*) from referral_temp ; >> 502347 >> >>Actual row count in d_referral table: >> >>select count(*) from d_referral ; >> 27908024 >> >> >>Note: that an analyze had not been performed on the referral_temp table >>prior to the explain analyze run. >> >>explain analyze SELECT t1.id, t2.md5, t2.url from referral_temp t2 LEFT >>OUTER JOIN d_referral t1 ON t2.md5 = t1.referral_md5 >> >>Nested Loop Left Join (cost=0.00..3046.00 rows=1001 width=68) (actual >>time=136.513..6440616.541 rows=502347 loops=1) >> -> Seq Scan on referral_temp t2 (cost=0.00..20.00 rows=1000 >>width=64) (actual time=21.730..10552.421 rows=502347 loops=1) >> -> Index Scan using d_referral_referral_md5_key on d_referral t1 >>(cost=0.00..3.01 rows=1 width=40) (actual time=12.768..14.022 rows=1 >>loops=502347) >> Index Cond: ("outer".md5 = t1.referral_md5) >> >> >>Thanks. >> >>--sean >> Total runtime: 6441969.698 ms >>(5 rows) >> >> >>Here is an explain analyze after the analyze was done. Unfortunately I >>think a lot of the data was still in cache when I did this again :-( >> >>explain analyze SELECT t1.id, t2.md5, t2.url from referral_temp t2 LEFT >>OUTER JOIN d_referral t1 ON t2.md5 = t1.referral_md5; >> >>Nested Loop Left Join (cost=0.00..1468759.69 rows=480082 width=149) >>(actual time=69.576..3226854.850 rows=502347 loops=1) >> -> Seq Scan on referral_temp t2 (cost=0.00..16034.81 rows=480081 >>width=145) (actual time=11.206..4003.521 rows=502347 loops=1) >> -> Index Scan using d_referral_referral_md5_key on d_referral t1 >>(cost=0.00..3.01 rows=1 width=40) (actual time=6.396..6.402 rows=1 >>loops=502347) >> Index Cond: ("outer".md5 = t1.referral_md5) >> Total runtime: 3227830.752 ms >> >> > >Hmmm. It looks like postgresql is still picking a nested loop when it >should be sorting something faster. Try doing a "set enable_nestloop = >off" and see what you get. > > New results with the above changes: (Rather a huge improvement!!!) Thanks Scott. I will next attempt to make the cpu_* changes to see if it the picks the correct plan. explain analyze SELECT t1.id, t2.md5, t2.url from referral_temp t2 LEFT OUTER JOIN d_referral t1 ON t2.md5 = t1.referral_md5; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------- Hash Left Join (cost=1669281.60..3204008.48 rows=480082 width=149) (actual time=157221.125..-412311.378 rows=502347 loops=1) Hash Cond: ("outer".md5 = "inner".referral_md5) -> Seq Scan on referral_temp t2 (cost=0.00..16034.81 rows=480081 width=145) (actual time=11.537..1852.336 rows=502347 loops=1) -> Hash (cost=1356358.48..1356358.48 rows=30344048 width=40) (actual time=157187.530..157187.530 rows=0 loops=1) -> Seq Scan on d_referral t1 (cost=0.00..1356358.48 rows=30344048 width=40) (actual time=14.134..115048.285 rows=27908024 loops=1) Total runtime: 212595.909 ms (6 rows) Time: 213094.984 ms tripmaster=# explain analyze SELECT t1.id, t2.md5, t2.url from url_temp t2 LEFT OUTER JOIN d_url t1 ON t2.md5 = t1.url_md5; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------ Hash Left Join (cost=2023843.40..3157938.15 rows=1379872 width=191) (actual time=178150.113..867074.579 rows=1172920 loops=1) Hash Cond: ("outer".md5 = "inner".url_md5) -> Seq Scan on url_temp t2 (cost=0.00..50461.72 rows=1379872 width=187) (actual time=6.597..6692.324 rows=1172920 loops=1) -> Hash (cost=1734904.72..1734904.72 rows=28018272 width=40) (actual time=178124.568..178124.568 rows=0 loops=1) -> Seq Scan on d_url t1 (cost=0.00..1734904.72 rows=28018272 width=40) (actual time=16.912..2639059.078 rows=23239137 loops=1) Total runtime: 242846.965 ms (6 rows) Time: 243190.900 ms >If that makes it faster, you may want to adjust the costs of the cpu_* >stuff higher to see if that can force it to do the right thing. > >Looking at the amount of time taken by the nested loop, it looks like the >problem to me. > >And why are you doing a left join of ONE row from one table against the >whole temp table? Do you really need to do that? since there's only one >row in the source table, and I'd guess is only matches one or a few rows >from the temp table, this means you're gonna have that one row and a bunch >of null filled rows to go with it. > > > >
pgsql-performance by date: