Re: tweaking NTUP_PER_BUCKET - Mailing list pgsql-hackers
From:           Tomas Vondra
Subject:        Re: tweaking NTUP_PER_BUCKET
Msg-id:         53CADE1D.3010601@fuzzy.cz
In response to: Re: tweaking NTUP_PER_BUCKET (Tomas Vondra <tv@fuzzy.cz>)
Responses:      Re: tweaking NTUP_PER_BUCKET
List:           pgsql-hackers
On 19.7.2014 20:28, Tomas Vondra wrote:
> On 19.7.2014 20:24, Tom Lane wrote:
>> Tomas Vondra <tv@fuzzy.cz> writes:
>>> I've reviewed the two test cases mentioned here, and sadly there's
>>> nothing that can be 'fixed' by this patch. The problem here lies in the
>>> planning stage, which decides to hash the large table - we can't fix
>>> that in the executor.
>>
>> We've heard a couple reports before of the planner deciding to hash a
>> larger table rather than a smaller one. The only reason I can think of
>> for that is if the smaller table has many more duplicates, so that the
>> planner thinks the executor might end up traversing long hash chains.
>> The planner's estimates could easily be off in this area, of course.
>> estimate_hash_bucketsize() is the likely culprit if it's wrong.
>>
>> Which test case are you seeing this in, exactly?
>
> The two reported by Stephen here:
>
> http://www.postgresql.org/message-id/20130328235627.GV4361@tamriel.snowman.net
>
> Just download this (I had to gunzip it):
>
> http://snowman.net/~sfrost/test_case.sql
> http://snowman.net/~sfrost/test_case2.sql

Anyway, looking a bit at the distributions, I don't think they are
significantly skewed. In the first test case, I get this:

test=# select cnt, count(*)
         from (select id_short, count(*) AS cnt from small_table group by 1) foo
        group by cnt order by cnt;
 cnt | count
-----+-------
   1 |    23
   2 | 18795
   3 |    13
   4 |   726
   5 |     4
   6 |    93
   8 |     4
  10 |     1
(8 rows)

and in the second one I get this:

test=# select cnt, count(*)
         from (select id_short, count(*) AS cnt from small_table group by 1) foo
        group by cnt order by cnt;
 cnt | count
-----+--------
   1 | 182161
   2 |   5582
   3 |    371
   4 |     28
   5 |      2
   6 |      3
   9 |      1
(7 rows)

So while #1 contains most values twice, #2 is almost perfectly distinct.
In big_table, the values are unique.

For the first case, a WARNING at the end of estimate_hash_bucketsize
says this:

WARNING:  nbuckets=8388608.00 estfract=0.000001
WARNING:  nbuckets=65536.00 estfract=0.000267

There are ~4.3M rows in big_table, so this amounts to ~4 tuples per
bucket (the estimate sits at the 1e-6 lower bound), and ~10 tuples per
bucket for small_table (42k rows).

For the second case, I get this:

WARNING:  nbuckets=16777216.00 estfract=0.000001
WARNING:  nbuckets=262144.00 estfract=0.000100

i.e. ~11 tuples/bucket for big_table and ~20 tuples/bucket for
small_table. That sounds really strange, because small_table in the
second test case is almost perfectly unique, and in the first test case
there are only very few keys with more than 2 occurrences.

By explicitly setting the selectivity in estimate_hash_bucketsize to
1e-6, I got these plans:

Test case #1

                              QUERY PLAN
---------------------------------------------------------------------
 Hash Join  (cost=1229.46..79288.95 rows=41176 width=24)
            (actual time=50.397..634.986 rows=13731 loops=1)
   Hash Cond: (big_table.id_short = small_table.id_short)
   ->  Seq Scan on big_table  (cost=0.00..61626.71 rows=4272271 width=4)
                              (actual time=0.018..229.597 rows=4272271 loops=1)
   ->  Hash  (cost=714.76..714.76 rows=41176 width=24)
             (actual time=31.653..31.653 rows=41176 loops=1)
         Buckets: 65536  Batches: 1  Memory Usage: 3086kB
         ->  Seq Scan on small_table  (cost=0.00..714.76 rows=41176 width=24)
                                      (actual time=0.012..13.341 rows=41176 loops=1)
 Planning time: 0.597 ms
 Execution time: 635.498 ms
(8 rows)

Without explain analyze: 486 ms (original plan: >850 ms).
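For completeness, the WARNING lines quoted above come from a debug elog
I dropped in at the very end of estimate_hash_bucketsize() (selfuncs.c),
and the "selectivity set to 1e-6" experiment was just an override at the
same spot. Roughly like this - the surrounding code is quoted from
memory, so it may not match the sources exactly, and it's obviously not
meant for commit:

    /* ... at the end of estimate_hash_bucketsize() ... */

    /* existing clamp of the estimate to a sane range */
    if (estfract < 1.0e-6)
        estfract = 1.0e-6;
    else if (estfract > 1.0)
        estfract = 1.0;

    /* debug output that produced the WARNING lines above */
    elog(WARNING, "nbuckets=%.2f estfract=%f", nbuckets, estfract);

    /* the "selectivity forced to 1e-6" experiment - uncomment to reproduce */
    /* estfract = 1.0e-6; */

    ReleaseVariableStats(vardata);

    return (Selectivity) estfract;

The planner calls this once for each inner relation it considers
hashing, which is why there are two lines per test case - one for
big_table, one for small_table.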
Test case #2

                              QUERY PLAN
---------------------------------------------------------------------
 Hash Join  (cost=5240.21..220838.44 rows=194587 width=4)
            (actual time=88.252..2143.457 rows=149540 loops=1)
   Hash Cond: (big_table.id_short = small_table.id_short)
   ->  Seq Scan on big_table  (cost=0.00..169569.72 rows=11755372 width=4)
                              (actual time=0.018..533.955 rows=11755475 loops=1)
   ->  Hash  (cost=2807.87..2807.87 rows=194587 width=4)
             (actual time=87.548..87.548 rows=194587 loops=1)
         Buckets: 262144  Batches: 1  Memory Usage: 8889kB
         ->  Seq Scan on small_table  (cost=0.00..2807.87 rows=194587 width=4)
                                      (actual time=0.012..29.929 rows=194587 loops=1)
 Planning time: 0.438 ms
 Execution time: 2146.818 ms
(8 rows)

Without explain analyze: 1600 ms (original plan: >2300 ms).

The differences are fairly small - well, it's 30-40% faster, but the
absolute saving is less than 1 second in both cases. I'm wondering
whether we could run into similar problems with much longer queries,
and whether we'd still see a 30-40% improvement there.

Tomas