Bug? ExecChooseHashTableSize() got assertion failed with crazy number of rows - Mailing list pgsql-hackers
From | Kouhei Kaigai |
---|---|
Subject | Bug? ExecChooseHashTableSize() got assertion failed with crazy number of rows |
Date | |
Msg-id | 9A28C8860F777E439AA12E8AEA7694F8011347B1@BPXM15GP.gisp.nec.co.jp Whole thread Raw |
Responses |
Re: Bug? ExecChooseHashTableSize() got assertion failed
with crazy number of rows
|
List | pgsql-hackers |
Hello, I noticed ExecChooseHashTableSize() in nodeHash.c got failed by Assert(nbuckets > 0), when crazy number of rows are expected. BACKTRACE: #0 0x0000003f79432625 in raise () from /lib64/libc.so.6 #1 0x0000003f79433e05 in abort () from /lib64/libc.so.6 #2 0x000000000092600a in ExceptionalCondition (conditionName=0xac1ea0 "!(nbuckets > 0)", errorType=0xac1d88 "FailedAssertion", fileName=0xac1d40 "nodeHash.c", lineNumber=545) at assert.c:54 #3 0x00000000006851ff in ExecChooseHashTableSize (ntuples=60521928028, tupwidth=8, useskew=1 '\001', numbuckets=0x7fff146bff04, numbatches=0x7fff146bff00, num_skew_mcvs=0x7fff146bfefc) at nodeHash.c:545 #4 0x0000000000701735 in initial_cost_hashjoin (root=0x253a318, workspace=0x7fff146bffc0, jointype=JOIN_SEMI, hashclauses=0x257e4f0, outer_path=0x2569a40, inner_path=0x2569908, sjinfo=0x2566f40, semifactors=0x7fff146c0168) at costsize.c:2592 #5 0x000000000070e02a in try_hashjoin_path (root=0x253a318, joinrel=0x257d940, outer_path=0x2569a40, inner_path=0x2569908, hashclauses=0x257e4f0, jointype=JOIN_SEMI, extra=0x7fff146c0150) at joinpath.c:543 See the following EXPLAIN output, at the configuration without --enable-cassert. Planner expects 60.5B rows towards the self join by a relation with 72M rows. (Probably, this estimation is too much.) [kaigai@ayu ~]$ (echo EXPLAIN; cat ~/tpcds/query95.sql) | psql tpcds100 QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=9168667273.07..9168667273.08 rows=1 width=20) CTE ws_wh -> Custom Scan (GpuJoin) (cost=3342534.49..654642911.88 rows=60521928028 width=24) Bulkload: On (density: 100.00%) Depth 1: Logic: GpuHashJoin, HashKeys: (ws_order_number), JoinQual: ((ws_warehouse_sk <> ws_warehouse_sk) AND(ws_order_number = ws_order_number)), nrows (ratio: 84056.77%) -> Custom Scan (BulkScan) on web_sales ws1_1 (cost=0.00..3290612.48 rows=72001248 width=16) -> Seq Scan on web_sales ws2 (cost=0.00..3290612.48 rows=72001248 width=16) -> Sort (cost=8514024361.19..8514024361.20 rows=1 width=20) Sort Key: (count(DISTINCT ws1.ws_order_number)) : This crash was triggered by Assert(nbuckets > 0), and nbuckets is calculated as follows. /* * If there's not enough space to store the projected number of tuples and * the required bucket headers, we will need multiple batches. */ if (inner_rel_bytes + bucket_bytes > hash_table_bytes) { /* We'll need multiple batches */ long lbuckets; double dbatch; int minbatch; long bucket_size; /* * Estimate the number of buckets we'll want to have when work_mem is * entirely full. Each bucket will contain a bucket pointer plus * NTUP_PER_BUCKET tuples, whose projected size already includes * overhead for the hash code, pointer to the next tuple, etc. */ bucket_size = (tupsize * NTUP_PER_BUCKET + sizeof(HashJoinTuple)); lbuckets = 1 << my_log2(hash_table_bytes / bucket_size); lbuckets = Min(lbuckets, max_pointers); nbuckets = (int) lbuckets; bucket_bytes = nbuckets * sizeof(HashJoinTuple); : : } Assert(nbuckets > 0); Assert(nbatch > 0); In my case, the hash_table_bytes was 101017630802, and bucket_size was 48. So, my_log2(hash_table_bytes / bucket_size) = 31, then lbuckets will have negative number because both "1" and my_log2() is int32. So, Min(lbuckets, max_pointers) chooses 0x80000000, then it was set on the nbuckets and triggers the Assert(). Attached patch fixes the problem. Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com>
Attachment
pgsql-hackers by date: