Home > mailing lists

Re: hash join vs nested loop join - Mailing list pgsql-performance

From	Kevin Grittner
Subject	Re: hash join vs nested loop join
Date	December 12, 2012 23:47:38
Msg-id	20121212234732.80090@gmx.com Whole thread Raw
In response to	hash join vs nested loop join (Huan Ruan <leohuanruan@gmail.com>)
Responses	Re: hash join vs nested loop join
List	pgsql-performance

Tree view

Huan Ruan wrote:

> is a lot slower than a nested loop join.

Giving actual numbers is more useful than terms like "a lot". Even
better is to provide the output of EXPLAIN ANALYZZE rather than
just EXPLAIN. This shows estimates against actual numbers, and give
timings. For more suggestions see this page:

http://wiki.postgresql.org/wiki/SlowQueryQuestions

> I don't understand why the optimiser chooses the hash join in
> favor of the nested loop. What can I do to get the optimiser to
> make a better decision (nested loop in this case)? I have run
> analyze on both tables.

> Config changes are
>
>  - shared_buffers = 6GB
>  - effective_cache_size = 18GB
>  - work_mem = 10MB
>  - maintenance_work_mem = 3GB

As already suggested, there was a change made in 9.2 which may have
over-penalized nested loops using index scans. This may be fixed in
the next minor release.

Also, as already suggested, you may want to reduce random_page
cost, to bring it in line with the actual cost relative to
seq_page_cost based on your cache hit ratio.

Additionally, I just routinely set cpu_tuple_cost higher than the
default of 0.01. I find that 0.03 to 0.05 better models the actual
relative cost of processing a tuple.

-Kevin

pgsql-performance by date:

From: Craig Ringer
Date: 12 December 2012, 23:26:50
Subject: Re: Do I have a hardware or a software problem?

From: Amitabh Kant
Date: 13 December 2012, 04:08:35
Subject: Limit & offset effect on query plans

Re: hash join vs nested loop join - Mailing list pgsql-performance

Previous

Next