Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop - Mailing list pgsql-bugs
From | Tomas Vondra |
---|---|
Subject | Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop |
Date | |
Msg-id | c15378b3-d4aa-a5c0-5538-dd52904dcbc6@2ndquadrant.com Whole thread Raw |
In response to | Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop ("Todd A. Cook" <tcook@blackducksoftware.com>) |
Responses |
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop
|
List | pgsql-bugs |
On 12/05/2017 11:33 PM, Todd A. Cook wrote: > On 12/05/17 17:01, Tomas Vondra wrote: >> >> >> On 12/05/2017 10:23 PM, Todd A. Cook wrote: >>> On 11/27/17 23:03, Tom Lane wrote: >>>> Thomas Munro <thomas.munro@enterprisedb.com> writes: >>>>> When SH_INSERT tries to insert that final extra value, insertdist >>>>> keeps exceeding SH_GROW_MAX_DIB (25) no matter how many times we >>>>> double the size (at least until my computer gives up, somewhere around >>>>> 11 doublings and 75GB of virtual memory). If you set SH_GROW_MAX_DIB >>>>> to 26 then it succeeds, but I guess some other attack could be crafted >>>>> for that. What is the theory behind this parameter? >>>> >>>> You beat me to it --- after looking at simplehash.h I'd guessed that >>>> either the SH_GROW_MAX_DIB or SH_GROW_MAX_MOVE code path was causing >>>> an infinite loop, but I'd not gotten to determining which one yet. >>>> >>>> I'd ask what's the theory behind SH_GROW_MAX_MOVE, as well. Neither >>>> of them are obviously loop-proof. >>>> >>>> Note that the sample data has a lot of collisions: >>>> >>>> regression=# select hashint8(val), count(*) from reproducer group by 1 >>>> order by 2 desc; >>>> hashint8 | count >>>> -------------+------- >>>> 441526644 | 2337 >>>> -1117776826 | 1221 >>>> -1202007016 | 935 >>>> -2068831050 | 620 >>>> 1156644653 | 538 >>>> 553783815 | 510 >>>> 259780770 | 444 >>>> 371047036 | 394 >>>> 915722575 | 359 >>>> ... etc etc ... >>>> >>>> It's evidently more complicated than just that the code fails with >>>> more than SH_GROW_MAX_DIB duplicate hashcodes, but I suspect not >>>> by a lot. There needs to be a safety valve that prevents letting >>>> the hash fill factor approach zero, which is what's happening in >>>> this test case. >>> >>> FWIW, I can also reproduce the infinite loop with 167834 unique values. >>> >> >> Unique values or unique *hash* values? > > Unique values. > > >> Can you share the data, so that whoever fixes the bug can verify it also >> fixes your example? > > Sure. It's attached. > Seems the dataset has pretty much the same issue as the one reported before, that is select hashint8(val), count(distinct val), count(*) from temp_f_03 group by 1 order by 2 desc; hashint8 | count | count -------------+-------+------- -1971396144 | 45 | 45 2035896403 | 42 | 42 -1633843397 | 30 | 30 1425704662 | 29 | 29 -455482779 | 22 | 22 -300163565 | 17 | 17 -1803963420 | 17 | 17 -537082846 | 14 | 14 603707034 | 13 | 13 -176369887 | 12 | 12 1274957136 | 11 | 11 1465522632 | 11 | 11 -1589862230 | 10 | 10 -1145403239 | 10 | 10 i.e. there are many hash collisions (more than in the other data set). regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
pgsql-bugs by date: