Home > mailing lists

Re: General performance questions about postgres on Apple - Mailing list pgsql-performance

From	Sean Shanny
Subject	Re: General performance questions about postgres on Apple
Date	February 20, 2004 18:29:49
Msg-id	403682D6.60807@earthlink.net Whole thread Raw
Responses	Re: General performance questions about postgres on Apple
List	pgsql-performance

Tree view


scott.marlowe wrote:

>On Fri, 20 Feb 2004, Sean Shanny wrote:
>
>
>
>>max_connections = 100
>>
>># - Memory -
>>
>>shared_buffers = 16000          # min 16, at least max_connections*2,
>>8KB each
>>sort_mem = 256000               # min 64, size in KB
>>
>>
>
>You might wanna drop sort_mem somewhat and just set it during your imports
>to something big like 512000 or larger.  That way with 100 users during
>the day you won't have to worry about swap storms, and when you run your
>updates, you get all that sort_mem.
>
>
>
>>Actual row count in the temp table:
>>
>>select count(*) from referral_temp ;
>>  502347
>>
>>Actual row count in d_referral table:
>>
>>select count(*) from d_referral ;
>>  27908024
>>
>>
>>Note: that an analyze had not been performed on the referral_temp table
>>prior to the explain analyze run.
>>
>>explain analyze  SELECT t1.id, t2.md5, t2.url from referral_temp t2 LEFT
>>OUTER JOIN d_referral t1 ON t2.md5 = t1.referral_md5
>>
>>Nested Loop Left Join  (cost=0.00..3046.00 rows=1001 width=68) (actual
>>time=136.513..6440616.541 rows=502347 loops=1)
>>   ->  Seq Scan on referral_temp t2  (cost=0.00..20.00 rows=1000
>>width=64) (actual time=21.730..10552.421 rows=502347 loops=1)
>>   ->  Index Scan using d_referral_referral_md5_key on d_referral t1
>>(cost=0.00..3.01 rows=1 width=40) (actual time=12.768..14.022 rows=1
>>loops=502347)
>>         Index Cond: ("outer".md5 = t1.referral_md5)
>>
>>
>>Thanks.
>>
>>--sean
>> Total runtime: 6441969.698 ms
>>(5 rows)
>>
>>
>>Here is an explain analyze after the analyze was done.  Unfortunately I
>>think a lot of the data was still in cache when I did this again :-(
>>
>>explain analyze  SELECT t1.id, t2.md5, t2.url from referral_temp t2 LEFT
>>OUTER JOIN d_referral t1 ON t2.md5 = t1.referral_md5;
>>
>>Nested Loop Left Join  (cost=0.00..1468759.69 rows=480082 width=149)
>>(actual time=69.576..3226854.850 rows=502347 loops=1)
>>   ->  Seq Scan on referral_temp t2  (cost=0.00..16034.81 rows=480081
>>width=145) (actual time=11.206..4003.521 rows=502347 loops=1)
>>   ->  Index Scan using d_referral_referral_md5_key on d_referral t1
>>(cost=0.00..3.01 rows=1 width=40) (actual time=6.396..6.402 rows=1
>>loops=502347)
>>         Index Cond: ("outer".md5 = t1.referral_md5)
>> Total runtime: 3227830.752 ms
>>
>>
>
>Hmmm.  It looks like postgresql is still picking a nested loop when it
>should be sorting something faster.  Try doing a "set enable_nestloop =
>off" and see what you get.
>
>
New results with the above changes: (Rather a huge improvement!!!)
Thanks Scott.  I will next attempt to make the cpu_* changes to see if
it the picks the correct plan.

explain analyze  SELECT t1.id, t2.md5, t2.url from referral_temp t2 LEFT
OUTER JOIN d_referral t1 ON t2.md5 = t1.referral_md5;
                                                                  QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------
 Hash Left Join  (cost=1669281.60..3204008.48 rows=480082 width=149)
(actual time=157221.125..-412311.378 rows=502347 loops=1)
   Hash Cond: ("outer".md5 = "inner".referral_md5)
   ->  Seq Scan on referral_temp t2  (cost=0.00..16034.81 rows=480081
width=145) (actual time=11.537..1852.336 rows=502347 loops=1)
   ->  Hash  (cost=1356358.48..1356358.48 rows=30344048 width=40)
(actual time=157187.530..157187.530 rows=0 loops=1)
         ->  Seq Scan on d_referral t1  (cost=0.00..1356358.48
rows=30344048 width=40) (actual time=14.134..115048.285 rows=27908024
loops=1)
 Total runtime: 212595.909 ms
(6 rows)

Time: 213094.984 ms
tripmaster=# explain analyze  SELECT t1.id, t2.md5, t2.url from url_temp
t2 LEFT OUTER JOIN d_url t1 ON t2.md5 = t1.url_md5;
                                                                QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------
 Hash Left Join  (cost=2023843.40..3157938.15 rows=1379872 width=191)
(actual time=178150.113..867074.579 rows=1172920 loops=1)
   Hash Cond: ("outer".md5 = "inner".url_md5)
   ->  Seq Scan on url_temp t2  (cost=0.00..50461.72 rows=1379872
width=187) (actual time=6.597..6692.324 rows=1172920 loops=1)
   ->  Hash  (cost=1734904.72..1734904.72 rows=28018272 width=40)
(actual time=178124.568..178124.568 rows=0 loops=1)
         ->  Seq Scan on d_url t1  (cost=0.00..1734904.72 rows=28018272
width=40) (actual time=16.912..2639059.078 rows=23239137 loops=1)
 Total runtime: 242846.965 ms
(6 rows)

Time: 243190.900 ms

>If that makes it faster, you may want to adjust the costs of the cpu_*
>stuff higher to see if that can force it to do the right thing.
>
>Looking at the amount of time taken by the nested loop, it looks like the
>problem to me.
>
>And why are you doing a left join of ONE row from one table against the
>whole temp table?  Do you really need to do that?  since there's only one
>row in the source table, and I'd guess is only matches one or a few rows
>from the temp table, this means you're gonna have that one row and a bunch
>of null filled rows to go with it.
>
>
>
>

pgsql-performance by date:

From: Bruno Wolff III
Date: 20 February 2004, 16:22:34
Subject: Re: [ADMIN] Index called with Union but not with OR clause

From: "scott.marlowe"
Date: 20 February 2004, 19:50:00
Subject: Re: General performance questions about postgres on Apple

Re: General performance questions about postgres on Apple - Mailing list pgsql-performance

Previous

Next