Home > mailing lists

Re: Re[2]: planner/optimizer hash index method and so on - Mailing list pgsql-sql

From	Tom Lane
Subject	Re: Re[2]: planner/optimizer hash index method and so on
Date	April 5, 2000 12:05:27
Msg-id	18041.954950651@sss.pgh.pa.us Whole thread Raw
In response to	Re[2]: planner/optimizer hash index method and so on ("Alexey V. Meledin" <avm@webclub.ru>)
Responses	Re[2]: Re[2]: planner/optimizer hash index method and so on
List	pgsql-sql

Tree view

"Alexey V. Meledin" <avm@webclub.ru> writes:
>>> create index itd_hash_data_id on ifs_tree_default USING HASH (ifs_data_id);
>>> create index iatd_hash_data_id on ifs_account_tree_data USING HASH (ifs_data_id);

TL> Why would you do that?  The hash index method doesn't have any advantage
TL> over btree that I can see, and it's got a lot of disadvantages.

> But as I understand from documentation, Hash Join is the preferable
> method of JOIN, because of Seq Scan performed only once (I suppose, that
> it's not full table scan!?!).

Where in the documentation did you see that?  If it implies that then
it's wrong.  We could certainly simplify the optimizer a lot if it could
just always pick a hash join ;-).

But more to the point, whether a hash join is used has nothing to do
with what kind of index you have.  You do not need a hash index to
support a hash join.  Hash join means that we build a in-memory hash
table containing the relevant rows from the inner relation (discarding
any that can be rejected due to single-relation WHERE conditions), and
then probe into that table for each row of the outer relation.

> Nested Joins performs too many scans for results in JOIN, containing
> many rows and are so slow on large tables (planner/optimizer problem?).

Could be.  The optimizer is a work-in-progress; I wouldn't claim that it
always makes the right choices.  But without more details it's hard to
say if it's doing the wrong thing or not.

> 1. What is the difference among "Seq Scan" and "Index Scan" in
> a planer/optimizer query execution plan?

sequential scan (read whole table sequentially) or index scan (use
index to scan just the rows meeting a WHERE condition that matches
the index).

> So, the difference is in ROWS and WIDTH:
> PS: Each table has 100000 rows.
> 1. Nested Loop  (cost=4.05 rows=1 width=28)
-> Index Scan using aaa_index_b on aaa a  (cost=2.05 rows=1 width=12)
-> Index Scan using bbb_pk on bbb b  (cost=2.00 rows=61984 width=16)
> 2. Nested Loop  (cost=4.05 rows=1 width=12)
-> Index Scan using aaa_index_b on aaa a  (cost=2.05 rows=1 width=4)
-> Index Scan using bbb_pk on bbb b  (cost=2.00 rows=1 width=8)

> So, questions:
> 1. can I say, that the second query is more optimal then 1-st?

I'm not sure why 6.5 fails to display a reasonable rows estimate in
the first case.  Current sources produce a more reasonable result.
That rows estimate is clearly wrong, or at least inconsistent with
the estimated cost of the indexscan and the estimated number of
rows produced by the join.

> 2. Variants I try:
> 2.1. When I use no indexes, then:
> 2.1.1. Nested Loop  (cost=2442.50 rows=1 width=28)
-> Seq Scan on aaa a  (cost=2440.50 rows=1 width=12)
-> Index Scan using bbb_pk on bbb b  (cost=2.00 rows=61984 width=16)
> 2.1.2. Nested Loop  (cost=2442.50 rows=1 width=12)
-> Seq Scan on aaa a  (cost=2440.50 rows=1 width=4)
-> Index Scan using bbb_pk on bbb b  (cost=2.00 rows=1 width=8)

> 2.2. I try to set indexes on a.a and b.a, supposing that it helps a
> bit on relation "a.a=b.a"
> 2.2.1. Nested Loop  (cost=3355.28 rows=1 width=28)
-> Seq Scan on aaa a  (cost=3353.28 rows=1 width=12)
-> Index Scan using bbb_pk on bbb b  (cost=2.00 rows=85159 width=16)
> 2.2.2.  Nested Loop  (cost=3355.28 rows=1 width=12)
-> Seq Scan on aaa a  (cost=3353.28 rows=1 width=4)
-> Index Scan using bbb_pk on bbb b  (cost=2.00 rows=1 width=8)

> Perfomace become lower!! Why?

Did performance *actually* change?  I don't see how it could,
considering that the plan is the same.  I hope you're not confusing
the planner's cost estimates with reality ;-).

If you're wondering why the estimate changed, it's probably because
CREATE INDEX updates the planner's information about number of rows
and number of disk pages the relation has.  I guess you loaded
more data since your last VACUUM.

> 2.4. I add indexes on a.a and a.b
> Nothing happens!

It's already using an index that covers column b, and the index on
column a doesn't help because it can only use one index in a scan.
        regards, tom lane

pgsql-sql by date:

From: Oleg Broytmann
Date: 05 April 2000, 10:36:15
Subject: Re: Is Pgsql an OORDBMS?

From: "Alexey V. Meledin"
Date: 05 April 2000, 12:41:22
Subject: Re[2]: Re[2]: planner/optimizer hash index method and so on

Re: Re[2]: planner/optimizer hash index method and so on - Mailing list pgsql-sql

Previous

Next