Home > mailing lists

Hash join in 8.3 - Mailing list pgsql-general

From	André Volpato
Subject	Hash join in 8.3
Date	December 13, 2007 13:20:32
Msg-id	47617218.6040405@ecomtecnologia.com.br Whole thread Raw
Responses	Re: Hash join in 8.3 Re: Hash join in 8.3
List	pgsql-general

Tree view

I´m running some compatibility and performance tests, between two
servers with 8.1 and 8.3 as follows :

[1] 8.1: postgres 8.1.9 / 1GB Ram / P4 3Ghz / Debian 2.6.18-4-686
[2] 8.3: postgres 8.3 b4 / 1GB Ram / P4 3Ghz / Debian 2.6.18-4-686

[2] is faster for every single operation, but I found something with the
planner that seems odd.

Consider this structure:

create table test (
    i bigint unique not null,
    t text
);
populated with 4 million rows with generate_series(1,4000000)

create table jtest (
    i bigint not null,
    constraint jtestfk foreign key (i) references test (i)
);
populated with 6 million rows


And the query:

# select j.i, t.t from jtest j inner join test t on t.i = j.i where
(j.i*1.5) between 3000000 and 4000000;

Planner for [1]:
 Nested Loop  (cost=0.00..270192.02 rows=20000 width=41) (actual
time=4192.514..32781.498 rows=1333334 loops=1)
   ->  Seq Scan on jtest j  (cost=0.00..179412.02 rows=30000 width=8)
(actual time=4147.813..19195.877 rows=1333334 loops=1)
         Filter: ((((i)::numeric * 1.5) >= 3000000::numeric) AND
(((i)::numeric * 1.5) <= 4000000::numeric))
   ->  Index Scan using test_i_key on test t  (cost=0.00..3.01 rows=1
width=41) (actual time=0.007..0.008 rows=1 loops=1333334)
         Index Cond: (t.i = "outer".i)
 Total runtime: 33372.300 ms

Planner for [2]:
 Hash Join  (cost=176924.02..297518.03 rows=20000 width=38) (actual
time=125715.079..239893.461 rows=1333334 loops=1)
   Hash Cond: (t.i = j.i)
   ->  Seq Scan on test t  (cost=0.00..75394.00 rows=4000000 width=38)
(actual time=0.051..4344.157 rows=4000000 loops=1)
   ->  Hash  (cost=176549.02..176549.02 rows=30000 width=8) (actual
time=11711.708..11711.708 rows=1333334 loops=1)
         ->  Seq Scan on jtest j  (cost=0.00..176549.02 rows=30000
width=8) (actual time=2228.052..10812.444 rows=1333334 loops=1)
               Filter: ((((i)::numeric * 1.5) >= 3000000::numeric) AND
(((i)::numeric * 1.5) <= 4000000::numeric))
 Total runtime: 240461.273 ms

Besides the (expected) weak guess on rows for both servers on seq scan
on jtest, there is something nasty with [2] that prevents the planner to
use the index.
For some reason, [1] uses the index first, and then seq scan to filter.
[2] seq scans filter first, and hash aggregate instead of using the index.

Now, turning off hashing:
# set enable_hashjoin=off;
# set enable_hashagg=off;

Again for [2]:
 Merge Join  (cost=178781.75..328370.60 rows=20000 width=38) (actual
time=15703.086..18799.493 rows=1333334 loops=1)
   Merge Cond: (t.i = j.i)
   ->  Index Scan using test_i_key on test t  (cost=0.00..139273.96
rows=4000018 width=38) (actual time=0.125..2895.835 rows=2666667 loops=1)
   ->  Sort  (cost=178781.75..178856.75 rows=30000 width=8) (actual
time=12423.001..13007.569 rows=1333334 loops=1)
         Sort Key: j.i
         Sort Method:  quicksort  Memory: 84852kB
         ->  Seq Scan on jtest j  (cost=0.00..176550.85 rows=30000
width=8) (actual time=2076.495..10417.157 rows=1333334 loops=1)
               Filter: ((((i)::numeric * 1.5) >= 3000000::numeric) AND
(((i)::numeric * 1.5) <= 4000000::numeric))
 Total runtime: 19340.734 ms

Works fine now. Quicksort and index scan.

Some points here:
1. The query is kinda stupid, but its a compatibility test (I´m trying
to figure out how many queries must be rewritten for 8.3)
2. Vacuum is up2date!
3. Is there any way to make [2] use the index ?

--

[]´s,

André Volpato
Ecom Tecnologia LTDA - Análise e Desenvolvimento
andre.volpato@ecomtecnologia.com.br

pgsql-general by date:

From: "Scott Marlowe"
Date: 13 December 2007, 13:03:27
Subject: Re: Need to find out which process is hitting hda

From: Reg Me Please
Date: 13 December 2007, 13:44:10
Subject: COPY speedup

Hash join in 8.3 - Mailing list pgsql-general

Previous

Next