Re: -HEAD planner issue wrt hash_joins on dbt3 ? - Mailing list pgsql-hackers
| From | Stefan Kaltenbrunner | 
|---|---|
| Subject | Re: -HEAD planner issue wrt hash_joins on dbt3 ? | 
| Date | |
| Msg-id | 4507B4C2.9060208@kaltenbrunner.cc Whole thread Raw | 
| In response to | Re: -HEAD planner issue wrt hash_joins on dbt3 ? (Tom Lane <tgl@sss.pgh.pa.us>) | 
| Responses | Re: -HEAD planner issue wrt hash_joins on dbt3 ? | 
| List | pgsql-hackers | 
Tom Lane wrote: > Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes: >> Tom Lane wrote: >>> Could we see the actual EXPLAIN ANALYZE results for the slow plan? > >> http://www.kaltenbrunner.cc/files/dbt3_explain_analyze.txt > > Well, indeed it seems that the hash join is just an innocent bystander: > the bulk of the runtime (all but about 120 sec in fact) is spent here: > > -> Nested Loop (cost=13.65..1719683.85 rows=12000672 width=49) (actual time=60.325..24923860.713rows=11897899 loops=1) > -> Merge Join (cost=0.00..10248.66 rows=20000 width=41) (actual time=16.654..2578.060 rows=19837loops=1) > ... > -> Bitmap Heap Scan on lineitem (cost=13.65..77.16 rows=665 width=16) (actual time=13.492..1254.535rows=600 loops=19837) > Recheck Cond: (lineitem.l_suppkey = supplier.s_suppkey) > -> Bitmap Index Scan on i_l_suppkey (cost=0.00..13.65 rows=665 width=0) (actual time=10.662..10.662rows=600 loops=19837) > Index Cond: (lineitem.l_suppkey = supplier.s_suppkey) > > I suppose that the profile result you showed was taken during the > startup transient where it was computing the hashtables that this loop's > results are joined to ... but that's not where the problem is. The > problem is repeating that bitmap scan on lineitem for nearly 20000 > different l_suppkeys. possible - I actually took them over a longer period of time > > Apparently we've made the planner a bit too optimistic about the savings > that can be expected from repeated indexscans occurring on the inside of > a join. The other plan uses a different join order and doesn't try to > join lineitem until it's got orders.o_orderkey, whereupon it does a > mergejoin against an indexscan on lineitem: > > -> Index Scan using i_l_orderkey on lineitem (cost=0.00..2715943.34 rows=60003360 width=16) (actualtime=32.868..123668.380 rows=59991868 loops=1) > > The runtimes for the remainders of the plans are roughly comparable, so > it's the cost of joining lineitem that is hurting here. > > Is lineitem sorted (or nearly sorted) by l_orderkey? Part of the > problem could be overestimating the cost of this indexscan. > > What are the physical sizes of lineitem and its indexes, and how do > those compare to your RAM? What are you using for planner settings > (particularly effective_cache_size)? ouch - you are right(as usual) here. effective_cache_size was set to 10GB(my fault for copying over the conf from a 16GB box) during the run - lowering it just a few megabytes(!) or to a more realistic 6GB results in the following MUCH better plan: http://www.kaltenbrunner.cc/files/dbt3_explain_analyze2.txt as for the relation sizes: dbt3=# select pg_relation_size('lineitem'); pg_relation_size ------------------ 10832764928 (1 row) dbt3=# select pg_total_relation_size('lineitem'); pg_total_relation_size ------------------------ 22960259072 (1 row) there are nine btree indexes on lineitem all between 1,1GB and 1,4GB in size. Stefan
pgsql-hackers by date: