Thread: 9.2.4 vs 9.3.0 query planning (sort merge join vs hash join)
Hi all,
Is there a propensity for 9.2.4 to prefer a sort-merge-join, in place of a hash join?
I’m fairly sure the answer is yes, but I also want to be sure I’m interpreting the explain output correctly.
I’m comparing behaviour between two systems, which for all intents and purposes are identical save for the version of postgres.
There appears to be nothing wrong with the row estimates given in the explain plan on either machine, however actual performance is significantly impaired on the 9.2.4 setup due to the preference for the use of a sort-merge join, compared to a hash-join on 9.3.0
Snippet from 9.2.4
-> Merge Left Join (cost=19598754.29..19602284.00 rows=469996 width=434) (actual time=6369152.750..6386029.191 rows=6866896 loops=1)
Buffers: shared hit=489837 read=1724585
-> Sort (cost=19598650.62..19599825.61 rows=469996 width=120) (actual time=6369151.307..6373591.881 rows=6866896 loops=1) (A)
Sort Method: quicksort Memory: 1162266kB
Buffers: shared hit=489765 read=1724585
-> Hash Left Join (cost=429808.90..19554371.62 rows=469996 width=120) (actual time=37306.534..6353455.046 rows=6866896 loops=1) (B)
Rows Removed by Filter: 20862464
Buffers: shared hit=489765 read=1724585
Snippet from 9.3.0
-> Hash Left Join (cost=617050.43..20948535.43 rows=566893 width=434) (actual time=51816.864..934723.548 rows=6866896 loops=1)
Buffers: shared hit=1732 read=2010920 written=1
-> Hash Left Join (cost=616993.23..20870529.73 rows=566893 width=120) (actual time=51796.882..923196.579 rows=6866896 loops=1)
Rows Removed by Filter: 20862464
Buffers: shared hit=1732 read=2010877 written=1
As you can see, the estimates are similar enough between them, but 9.2.4 want’s to run sort-merge plan (A) – and the resulting execution time blows out hugely.
Intersetingly, it actually looks like it is the hash join immediately preceding the quick sort that isn’t performing well (B). Though I suspect this is just how an explain plan reads - is this ultimately because the sort node is unable to retrieve tuples from the child node quickly enough?
Setting enable_mergejoin = 0 appears to solve this, but I think an upgrade to 9.3.4 is going to win over.
Cheers,
Tim
Tim Kane <tim.kane@gmail.com> writes: > Is there a propensity for 9.2.4 to prefer a sort-merge-join, in place of a > hash join? Not particularly; I don't think there's any actual difference in the cost estimation equations between 9.2 and 9.3. The two plans you show are close enough in estimated cost that the ordering of their costs might be coming out differently just as a matter of random variation in statistics. It'd be worth double-checking the work_mem setting on both systems, though, as (IIRC) an undersized work_mem hurts the estimate for hashes more than for sorts. regards, tom lane
Hmm. Interesting.
Thanks Tom, it does indeed look like the planner is evaluating and excluding the hashed-join plan as having a higher cost. I can see this by setting enable_mergejoin=0.
I think this may play against other aspects of the query (though only marginally), so I can’t really compare the resulting cost metrics – but they’re certainly close.
I’ve just now played a little more with work_mem. I had already tried increasing work_mem to all kinds of obscene levels.
Oddly, it seems the solution here is in fact to *reduce* work_mem in order to elicit the preferred hash-join based plan.
In fact, I needed to reduce it to as low as 64MB for this behaviour – which seems counter-intuitive. These are not small queries, so I previously had it pushed up to 6GB for these tasks.
For smaller datasets,I need to reduce work_mem further still in order to obtain a hash-join plan – though the difference in execution time becomes less of a problem at this size.
Tim
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Monday, 12 May 2014 12:23
To: Tim Kane <tim.kane@gmail.com>
Cc: "pgsql-performance@postgresql.org" <pgsql-performance@postgresql.org>
Subject: Re: [PERFORM] 9.2.4 vs 9.3.0 query planning (sort merge join vs hash join)
Date: Monday, 12 May 2014 12:23
To: Tim Kane <tim.kane@gmail.com>
Cc: "pgsql-performance@postgresql.org" <pgsql-performance@postgresql.org>
Subject: Re: [PERFORM] 9.2.4 vs 9.3.0 query planning (sort merge join vs hash join)
Tim Kane <tim.kane@gmail.com> writes:
Is there a propensity for 9.2.4 to prefer a sort-merge-join, in place of ahash join?
Not particularly; I don't think there's any actual difference in the cost
estimation equations between 9.2 and 9.3. The two plans you show are
close enough in estimated cost that the ordering of their costs might be
coming out differently just as a matter of random variation in statistics.
It'd be worth double-checking the work_mem setting on both systems,
though, as (IIRC) an undersized work_mem hurts the estimate for hashes
more than for sorts.
regards, tom lane
On Mon, May 12, 2014 at 3:45 AM, Tim Kane <tim.kane@gmail.com> wrote:
Hi all,Is there a propensity for 9.2.4 to prefer a sort-merge-join, in place of a hash join?I’m fairly sure the answer is yes, but I also want to be sure I’m interpreting the explain output correctly.I’m comparing behaviour between two systems, which for all intents and purposes are identical save for the version of postgres.There appears to be nothing wrong with the row estimates given in the explain plan on either machine, however actual performance is significantly impaired on the 9.2.4 setup due to the preference for the use of a sort-merge join, compared to a hash-join on 9.3.0Snippet from 9.2.4-> Merge Left Join (cost=19598754.29..19602284.00 rows=469996 width=434) (actual time=6369152.750..6386029.191 rows=6866896 loops=1)Buffers: shared hit=489837 read=1724585-> Sort (cost=19598650.62..19599825.61 rows=469996 width=120) (actual time=6369151.307..6373591.881 rows=6866896 loops=1) (A)Sort Method: quicksort Memory: 1162266kBBuffers: shared hit=489765 read=1724585-> Hash Left Join (cost=429808.90..19554371.62 rows=469996 width=120) (actual time=37306.534..6353455.046 rows=6866896 loops=1) (B)Rows Removed by Filter: 20862464Buffers: shared hit=489765 read=1724585Snippet from 9.3.0-> Hash Left Join (cost=617050.43..20948535.43 rows=566893 width=434) (actual time=51816.864..934723.548 rows=6866896 loops=1)Buffers: shared hit=1732 read=2010920 written=1-> Hash Left Join (cost=616993.23..20870529.73 rows=566893 width=120) (actual time=51796.882..923196.579 rows=6866896 loops=1)Rows Removed by Filter: 20862464Buffers: shared hit=1732 read=2010877 written=1As you can see, the estimates are similar enough between them, but 9.2.4 want’s to run sort-merge plan (A) – and the resulting execution time blows out hugely.Intersetingly, it actually looks like it is the hash join immediately preceding the quick sort that isn’t performing well (B). Though I suspect this is just how an explain plan reads - is this ultimately because the sort node is unable to retrieve tuples from the child node quickly enough?
It looks to me like a caching issue. The two Hash Left Joins seem to be identical (although occurring on different levels of the plan) but one takes much more time. But it is hard to know without seeing what if feeding into those hash joins.
Cheers,
Jeff
Setting enable_mergejoin = 0 appears to solve this, but I think an upgrade to 9.3.4 is going to win over.Cheers,Tim