Hash Join vs Nested Loops in 7.2.1 ... - Mailing list pgsql-general
From | Ed Loehr |
---|---|
Subject | Hash Join vs Nested Loops in 7.2.1 ... |
Date | |
Msg-id | 3CB26973.5010808@bluepolka.net Whole thread Raw |
Responses |
Re: Hash Join vs Nested Loops in 7.2.1 ...
|
List | pgsql-general |
I have a 7.2.1 query with two peculiar characteristics and wondered if anyone could offer some insight. First, my query takes 90 seconds with a hash join, but only 1 second with nested loops. Second, the same query sometimes takes 10-50 seconds shortly after possibly a dump or other high-data-volume queries are executed, after which it then returns to 1 second execution time. Getting crowded out of shared memory? Finally, I am inclined to turn off hash joins altogether. What should I expect to lose in doing so? Schema, query, and plans shown below... Thanks, Ed -- 700,000 rows CREATE TABLE story ( value TEXT, key SERIAL ); CREATE UNIQUE INDEX story_pkey ON story USING btree (key); -- 700,000 rows CREATE TABLE dict ( word VARCHAR, key SERIAL ); CREATE UNIQUE INDEX dict_pkey ON dict USING btree (key); CREATE UNIQUE INDEX dict_word_key ON dict USING btree (word); -- 28,000,000 rows CREATE TABLE story_dict ( dictkey INTEGER NOT NULL, storykey INTEGER NOT NULL ); CREATE UNIQUE INDEX story_dict_pkey ON story_dict(dictkey, storykey); CREATE INDEX story_dict_tk_idx ON story_dict(storykey); -- Query: -- ====== SELECT DISTINCT ftd1.storykey FROM story_dict ftd1, dict d1 WHERE d1.word = 'foo' AND d1.key = ftd1.dictkey AND EXISTS ( SELECT ft2.key FROM story ft2, story_dict ftd2, dict d2 WHERE d2.word = 'bar' AND d2.key = ftd2.dictkey AND ftd2.storykey = ft2.key AND ftd2.storykey = ftd1.storykey) AND EXISTS ( SELECT ft3.key FROM story ft3, story_dict ftd3, dict d3 WHERE d3.word = 'baz' AND d3.key = ftd3.dictkey AND ftd3.storykey = ft3.key AND ftd3.storykey = ftd1.storykey) ORDER BY ftd1.storykey LIMIT 1000; Plan with PGOPTIONS = "-fn": Limit (cost=15409053054.71..15409053054.73 rows=1 width=12) -> Unique (cost=15409053054.71..15409053054.73 rows=1 width=12) -> Sort (cost=15409053054.71..15409053054.71 rows=9 width=12) -> Nested Loop (cost=100000000.00..15409053054.57 rows=9 width=12) -> Index Scan using word_idx on dict d1 (cost=0.00..5.98 rows=1 width=4) -> Index Scan using story_dict_pkey on story_dict ftd1 (cost=0.00..15309052993.63 rows=4398 width=8) SubPlan -> Merge Join (cost=429157.62..435130.62 rows=1 width=16) -> Sort (cost=13.11..13.11 rows=1 width=12) -> Hash Join (cost=5.98..13.10 rows=1 width=12) -> Index Scan using story_dict_tk_idx on story_dict ftd2 (cost=0.00..6.59 rows=106 width=8) -> Hash (cost=5.98..5.98 rows=1 width=4) -> Index Scan using word_idx on dict d2 (cost=0.00..5.98 rows=1 width=4) -> Sort (cost=429144.50..429144.50 rows=2389196 width=4) -> Seq Scan on story ft2 (cost=0.00..86701.96 rows=2389196 width=4) -> Merge Join (cost=429157.62..435130.62 rows=1 width=16) -> Sort (cost=13.11..13.11 rows=1 width=12) -> Hash Join (cost=5.98..13.10 rows=1 width=12) -> Index Scan using story_dict_tk_idx on story_dict ftd3 (cost=0.00..6.59 rows=106 width=8) -> Hash (cost=5.98..5.98 rows=1 width=4) -> Index Scan using word_idx on dict d3 (cost=0.00..5.98 rows=1 width=4) -> Sort (cost=429144.50..429144.50 rows=2389196 width=4) -> Seq Scan on story ft3 (cost=0.00..86701.96 rows=2389196 width=4) Plan with PGOPTIONS = "-fh": Limit (cost=635283.31..635283.33 rows=1 width=12) -> Unique (cost=635283.31..635283.33 rows=1 width=12) -> Sort (cost=635283.31..635283.31 rows=9 width=12) -> Nested Loop (cost=0.00..635283.17 rows=9 width=12) -> Index Scan using word_idx on dict d1 (cost=0.00..5.98 rows=1 width=4) -> Index Scan using story_dict_pkey on story_dict ftd1 (cost=0.00..635222.22 rows=4398 width=8) SubPlan -> Nested Loop (cost=0.00..16.07 rows=1 width=16) -> Nested Loop (cost=0.00..12.00 rows=1 width=12) -> Index Scan using word_idx on dict d2 (cost=0.00..5.98 rows=1 width=4) -> Index Scan using story_dict_pkey on story_dict ftd2 (cost=0.00..6.01 rows=1 width=8) -> Index Scan using story_pkey on story ft2 (cost=0.00..4.06 rows=1 width=4) -> Nested Loop (cost=0.00..16.07 rows=1 width=16) -> Nested Loop (cost=0.00..12.00 rows=1 width=12) -> Index Scan using word_idx on dict d3 (cost=0.00..5.98 rows=1 width=4) -> Index Scan using story_dict_pkey on story_dict ftd3 (cost=0.00..6.01 rows=1 width=8) -> Index Scan using story_pkey on story ft3 (cost=0.00..4.06 rows=1 width=4)
pgsql-general by date: