Thread: [GENERAL] hash join performance question
Hi

testdb3=# \d csischema.dim_company;
          Table "csischema.dim_company"
     Column      |            Type             | Modifiers
-----------------+-----------------------------+-----------
 company_id      | integer                     | not null
 company_name    | character varying(100)      |
 city            | character varying(100)      |
 state           | character varying(100)      |
 postal_code     | character varying(100)      |
 country         | character varying(100)      |
 latitude        | double precision            |
 longitude       | double precision            |
 update_datetime | timestamp without time zone |
 company_source  | character varying(1)        |
Indexes:
    "dim_company_pkey" PRIMARY KEY, btree (company_id)

testdb3=# \d woc.dim_company;
             Table "woc.dim_company"
     Column      |            Type             | Modifiers
-----------------+-----------------------------+-----------
 company_id      | integer                     | not null
 company_name    | character varying(100)      |
 city            | character varying(100)      |
 state           | character varying(100)      |
 postal_code     | character varying(100)      |
 country         | character varying(100)      |
 latitude        | double precision            |
 longitude       | double precision            |
 update_datetime | timestamp without time zone |
 company_source  | character varying(1)        |
Indexes:
    "dim_company_pkey" PRIMARY KEY, btree (company_id)

testdb3=# select count(*) from csischema.dim_company;
  count
---------
 1786376
(1 row)

testdb3=# select count(*) from woc.dim_company;
 count
-------
 18980
(1 row)

woc.dim_company is a subset of csischema.dim_company, meaning all company_id values from woc.dim_company are present in csischema.dim_company. The ratio is around 1%.

SELECT a.company_id
FROM csischema.dim_company a, woc.dim_company b
WHERE a.company_id = b.company_id;

testdb3=# explain analyze SELECT a.company_id FROM csischema.dim_company a, woc.dim_company b
testdb3-# WHERE a.company_id = b.company_id;
                                                           QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=711.05..54938.35 rows=18980 width=4) (actual time=34.067..1118.603 rows=18980 loops=1)
   Hash Cond: (a.company_id = b.company_id)
   ->  Seq Scan on dim_company a  (cost=0.00..47097.82 rows=1850582 width=4) (actual time=0.013..523.249 rows=1786376 loops=1)
   ->  Hash  (cost=473.80..473.80 rows=18980 width=4) (actual time=20.203..20.203 rows=18980 loops=1)
         Buckets: 32768  Batches: 1  Memory Usage: 924kB
         ->  Seq Scan on dim_company b  (cost=0.00..473.80 rows=18980 width=4) (actual time=0.007..10.076 rows=18980 loops=1)
 Planning time: 0.511 ms
 Execution time: 1121.068 ms
(8 rows)

I was expecting at least the PK of csischema.dim_company to be used. In another DBMS that was the case: the larger table, csischema.dim_company, used the PK.

Any hints or thoughts on what I am not seeing?

Thank you
Armand
armand pirvu <armand.pirvu@gmail.com> writes:
> testdb3=# explain analyze SELECT a.company_id FROM csischema.dim_company a, woc.dim_company b
> testdb3-# WHERE a.company_id = b.company_id;
>                                                            QUERY PLAN
> -------------------------------------------------------------------------------------------------------------------------------
>  Hash Join  (cost=711.05..54938.35 rows=18980 width=4) (actual time=34.067..1118.603 rows=18980 loops=1)
>    Hash Cond: (a.company_id = b.company_id)
>    ->  Seq Scan on dim_company a  (cost=0.00..47097.82 rows=1850582 width=4) (actual time=0.013..523.249 rows=1786376 loops=1)
>    ->  Hash  (cost=473.80..473.80 rows=18980 width=4) (actual time=20.203..20.203 rows=18980 loops=1)
>          Buckets: 32768  Batches: 1  Memory Usage: 924kB
>          ->  Seq Scan on dim_company b  (cost=0.00..473.80 rows=18980 width=4) (actual time=0.007..10.076 rows=18980 loops=1)
>  Planning time: 0.511 ms
>  Execution time: 1121.068 ms
> (8 rows)

> I was expecting at least the PK of csischema.dim_company to be used. In another DBMS that was the case: the larger table, csischema.dim_company, used the PK.

That looks like a perfectly reasonable plan to me.  If you think it isn't,
perhaps because you're assuming that both tables are fully cached in RAM,
then you should reduce random_page_cost to teach the planner that that's
the execution scenario you're expecting.  Everything always in RAM would
correspond to random_page_cost = 1, and some rough calculations suggest
that that would reduce the estimated cost of a
nestloop-with-inner-indexscan enough to make the planner choose that way.

			regards, tom lane
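[For readers following along, a minimal sketch of trying this suggestion, assuming the same testdb3 session and tables as above; the use of SET LOCAL inside a transaction is illustrative and not taken from the original posts:]

-- Lower random_page_cost for just this transaction to model a fully cached
-- database, then compare the plan the planner chooses.
BEGIN;
SET LOCAL random_page_cost = 1;   -- default is 4; 1 says random reads cost the same as sequential reads
EXPLAIN ANALYZE
SELECT a.company_id
FROM csischema.dim_company a, woc.dim_company b
WHERE a.company_id = b.company_id;
ROLLBACK;

-- To make the setting persistent for one database rather than one transaction:
-- ALTER DATABASE testdb3 SET random_page_cost = 1;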
> On Jul 18, 2017, at 10:30 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> armand pirvu <armand.pirvu@gmail.com> writes:
>> testdb3=# explain analyze SELECT a.company_id FROM csischema.dim_company a, woc.dim_company b
>> testdb3-# WHERE a.company_id = b.company_id;
>>                                                            QUERY PLAN
>> -------------------------------------------------------------------------------------------------------------------------------
>>  Hash Join  (cost=711.05..54938.35 rows=18980 width=4) (actual time=34.067..1118.603 rows=18980 loops=1)
>>    Hash Cond: (a.company_id = b.company_id)
>>    ->  Seq Scan on dim_company a  (cost=0.00..47097.82 rows=1850582 width=4) (actual time=0.013..523.249 rows=1786376 loops=1)
>>    ->  Hash  (cost=473.80..473.80 rows=18980 width=4) (actual time=20.203..20.203 rows=18980 loops=1)
>>          Buckets: 32768  Batches: 1  Memory Usage: 924kB
>>          ->  Seq Scan on dim_company b  (cost=0.00..473.80 rows=18980 width=4) (actual time=0.007..10.076 rows=18980 loops=1)
>>  Planning time: 0.511 ms
>>  Execution time: 1121.068 ms
>>  (8 rows)
>
>> I was expecting at least the PK of csischema.dim_company to be used. In another DBMS that was the case: the larger table, csischema.dim_company, used the PK.
>
> That looks like a perfectly reasonable plan to me.  If you think it isn't,
> perhaps because you're assuming that both tables are fully cached in RAM,
> then you should reduce random_page_cost to teach the planner that that's
> the execution scenario you're expecting.  Everything always in RAM would
> correspond to random_page_cost = 1, and some rough calculations suggest
> that that would reduce the estimated cost of a
> nestloop-with-inner-indexscan enough to make the planner choose that way.
>
> 			regards, tom lane

Thank you Tom

I did a bit of reading about the random_page_cost value, and I understand not all optimizers are equal. But, for example, in the Ingres world:

                    K Join(col1)
                    Heap
                    Pages 57 Tups 18981
                    D696 C1139
                  /                \
   Proj-rest                        $tk1
   Sorted(col1)                     I(a)
   Pages 76 Tups 18981              B-Tree(col1)
   D25 C190                         Pages 2140 Tups 426435
                  /
   $tk2
   I(b)
   B-Tree(NU)
   Pages 98 Tups 18981

ds8(armandp):/u1/sys_admin/armandp> time sql -uirs testdb <foo.sql > /dev/null

real    0m0.37s
user    0m0.04s
sys     0m0.01s

And that is pretty constant, whether the pages are in the cache or not.

More importantly, IMHO, rather than scanning the smaller table, I just scan its PK, which is an index at the end of the day, and then join that with the larger table's PK.

Now granted, I have hash joins disabled on Ingres, so I am not sure this is a true apples-to-apples comparison, and that is what made me raise the question.

I would like to know why in Postgres the smaller table gets scanned as opposed to using its PK. After all, one column is far less expensive to traverse top to bottom than all columns.

Thank you
Armand
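[For comparison, a sketch of how one might reproduce the Ingres-style plan in Postgres for testing purposes, assuming the same tables; enable_hashjoin and enable_mergejoin are planner debugging knobs to flip in a test session, not settings to leave off in production:]

-- Push the planner toward a nested loop with an inner index scan on
-- dim_company_pkey by disabling the join methods it currently prefers,
-- then inspect the plan and timing.
SET enable_hashjoin = off;
SET enable_mergejoin = off;

EXPLAIN ANALYZE
SELECT a.company_id
FROM csischema.dim_company a, woc.dim_company b
WHERE a.company_id = b.company_id;

-- Restore the defaults afterwards.
RESET enable_hashjoin;
RESET enable_mergejoin;

-- Note on "scanning all columns": the Seq Scan on b reads whole heap pages,
-- but width=4 in the plan shows only the join key is carried up to the join.
-- An index-only scan on a PK can skip the heap entirely only when the
-- visibility map is up to date, e.g. after VACUUM on the table.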