Re: Re[2]: planner/optimizer hash index method and so on - Mailing list pgsql-sql
From | Tom Lane |
---|---|
Subject | Re: Re[2]: planner/optimizer hash index method and so on |
Date | |
Msg-id | 18041.954950651@sss.pgh.pa.us Whole thread Raw |
In response to | Re[2]: planner/optimizer hash index method and so on ("Alexey V. Meledin" <avm@webclub.ru>) |
Responses |
Re[2]: Re[2]: planner/optimizer hash index method and so on
|
List | pgsql-sql |
"Alexey V. Meledin" <avm@webclub.ru> writes: >>> create index itd_hash_data_id on ifs_tree_default USING HASH (ifs_data_id); >>> create index iatd_hash_data_id on ifs_account_tree_data USING HASH (ifs_data_id); TL> Why would you do that? The hash index method doesn't have any advantage TL> over btree that I can see, and it's got a lot of disadvantages. > But as I understand from documentation, Hash Join is the preferable > method of JOIN, because of Seq Scan performed only once (I suppose, that > it's not full table scan!?!). Where in the documentation did you see that? If it implies that then it's wrong. We could certainly simplify the optimizer a lot if it could just always pick a hash join ;-). But more to the point, whether a hash join is used has nothing to do with what kind of index you have. You do not need a hash index to support a hash join. Hash join means that we build a in-memory hash table containing the relevant rows from the inner relation (discarding any that can be rejected due to single-relation WHERE conditions), and then probe into that table for each row of the outer relation. > Nested Joins performs too many scans for results in JOIN, containing > many rows and are so slow on large tables (planner/optimizer problem?). Could be. The optimizer is a work-in-progress; I wouldn't claim that it always makes the right choices. But without more details it's hard to say if it's doing the wrong thing or not. > 1. What is the difference among "Seq Scan" and "Index Scan" in > a planer/optimizer query execution plan? sequential scan (read whole table sequentially) or index scan (use index to scan just the rows meeting a WHERE condition that matches the index). > So, the difference is in ROWS and WIDTH: > PS: Each table has 100000 rows. > 1. Nested Loop (cost=4.05 rows=1 width=28) -> Index Scan using aaa_index_b on aaa a (cost=2.05 rows=1 width=12) -> Index Scan using bbb_pk on bbb b (cost=2.00 rows=61984 width=16) > 2. Nested Loop (cost=4.05 rows=1 width=12) -> Index Scan using aaa_index_b on aaa a (cost=2.05 rows=1 width=4) -> Index Scan using bbb_pk on bbb b (cost=2.00 rows=1 width=8) > So, questions: > 1. can I say, that the second query is more optimal then 1-st? I'm not sure why 6.5 fails to display a reasonable rows estimate in the first case. Current sources produce a more reasonable result. That rows estimate is clearly wrong, or at least inconsistent with the estimated cost of the indexscan and the estimated number of rows produced by the join. > 2. Variants I try: > 2.1. When I use no indexes, then: > 2.1.1. Nested Loop (cost=2442.50 rows=1 width=28) -> Seq Scan on aaa a (cost=2440.50 rows=1 width=12) -> Index Scan using bbb_pk on bbb b (cost=2.00 rows=61984 width=16) > 2.1.2. Nested Loop (cost=2442.50 rows=1 width=12) -> Seq Scan on aaa a (cost=2440.50 rows=1 width=4) -> Index Scan using bbb_pk on bbb b (cost=2.00 rows=1 width=8) > 2.2. I try to set indexes on a.a and b.a, supposing that it helps a > bit on relation "a.a=b.a" > 2.2.1. Nested Loop (cost=3355.28 rows=1 width=28) -> Seq Scan on aaa a (cost=3353.28 rows=1 width=12) -> Index Scan using bbb_pk on bbb b (cost=2.00 rows=85159 width=16) > 2.2.2. Nested Loop (cost=3355.28 rows=1 width=12) -> Seq Scan on aaa a (cost=3353.28 rows=1 width=4) -> Index Scan using bbb_pk on bbb b (cost=2.00 rows=1 width=8) > Perfomace become lower!! Why? Did performance *actually* change? I don't see how it could, considering that the plan is the same. I hope you're not confusing the planner's cost estimates with reality ;-). If you're wondering why the estimate changed, it's probably because CREATE INDEX updates the planner's information about number of rows and number of disk pages the relation has. I guess you loaded more data since your last VACUUM. > 2.4. I add indexes on a.a and a.b > Nothing happens! It's already using an index that covers column b, and the index on column a doesn't help because it can only use one index in a scan. regards, tom lane