Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop - Mailing list pgsql-bugs
From | Tomas Vondra |
---|---|
Subject | Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop |
Date | |
Msg-id | 2a76784c-06cb-46ec-35be-9895d144b839@2ndquadrant.com Whole thread Raw |
In response to | Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop ("Todd A. Cook" <tcook@blackducksoftware.com>) |
Responses |
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop |
List | pgsql-bugs |
On 12/05/2017 10:23 PM, Todd A. Cook wrote: > On 11/27/17 23:03, Tom Lane wrote: >> Thomas Munro <thomas.munro@enterprisedb.com> writes: >>> When SH_INSERT tries to insert that final extra value, insertdist >>> keeps exceeding SH_GROW_MAX_DIB (25) no matter how many times we >>> double the size (at least until my computer gives up, somewhere around >>> 11 doublings and 75GB of virtual memory). If you set SH_GROW_MAX_DIB >>> to 26 then it succeeds, but I guess some other attack could be crafted >>> for that. What is the theory behind this parameter? >> >> You beat me to it --- after looking at simplehash.h I'd guessed that >> either the SH_GROW_MAX_DIB or SH_GROW_MAX_MOVE code path was causing >> an infinite loop, but I'd not gotten to determining which one yet. >> >> I'd ask what's the theory behind SH_GROW_MAX_MOVE, as well. Neither >> of them are obviously loop-proof. >> >> Note that the sample data has a lot of collisions: >> >> regression=# select hashint8(val), count(*) from reproducer group by 1 >> order by 2 desc; >> hashint8 | count >> -------------+------- >> 441526644 | 2337 >> -1117776826 | 1221 >> -1202007016 | 935 >> -2068831050 | 620 >> 1156644653 | 538 >> 553783815 | 510 >> 259780770 | 444 >> 371047036 | 394 >> 915722575 | 359 >> ... etc etc ... >> >> It's evidently more complicated than just that the code fails with >> more than SH_GROW_MAX_DIB duplicate hashcodes, but I suspect not >> by a lot. There needs to be a safety valve that prevents letting >> the hash fill factor approach zero, which is what's happening in >> this test case. > > FWIW, I can also reproduce the infinite loop with 167834 unique values. > Unique values or unique *hash* values? Can you share the data, so that whoever fixes the bug can verify it also fixes your example? > It kinda looks like the problematic collisions arise from masking the > computed hash values; e.g.: > > SH_INITIAL_BUCKET(SH_TYPE * tb, uint32 hash) > { > return hash & tb->sizemask; > } > That's pretty standard way to do modulo (when computing index of the hash bucket), and certainly not the root cause. > (Also FWIW, changing SH_FILLFACTOR to 0.5 (from 0.9) did not help any.) > Based on the initial discussion, the problem here is twofold. Firstly, the code assumes that if it gets certain number of bucket collisions (essentially, the initial bucket and certain number of neighboring buckets already being full), making the table larger is guaranteed to reduce the number of collisions. Which is obviously untrue for duplicate hash values, but it may also happen for distinct hash values that form a chain (think a sequence of hash values 1,2,3,4,5,6,7,...,K - that is never gonna get fixed). This causes the insane growth to the largest possible hash table. Secondly, there's a bug that for the largest hash table we set sizemask=0, which means we never ever get bucket index other than 0. This causes the infinite loop. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
pgsql-bugs by date: