Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop - Mailing list pgsql-bugs
From | Tomas Vondra |
---|---|
Subject | Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop |
Date | |
Msg-id | 7de4b2d6-b119-7d15-04d8-4c0fcb29cbc3@2ndquadrant.com Whole thread Raw |
In response to | Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop (Andres Freund <andres@anarazel.de>) |
Responses |
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop |
List | pgsql-bugs |
On 12/06/2017 06:19 PM, Andres Freund wrote: > On 2017-12-06 12:14:24 -0500, Tom Lane wrote: >> Checking out hashint8() on random data shows no such obvious fault. >> You've apparently got a data set that exposes a weakness in hashint8, >> but it's not very clear what that is. > > It's intentionally designed to cause problems afaict: > > http://archives.postgresql.org/message-id/861b9f1f-cdc0-bc49-2595-80bc39c37dc3%40blackducksoftware.com > > >> In any case, the hashtable code needs to not fall over in the >> presence of a lot of collisions, regardless of the exact reason >> for there being a lot. > > Yes, we need to be more resilient about it. Working on a patch. > I've been playing with this a bit today (sorry for trespassing on your turf a bit, but curiosity killed the cat), particularly with the tree basic ideas mentioned in this thread. Let me share some numbers. 1) randomized hashing, i.e. using the extended hash functions and a random seed 2) universal hashing [1], i.e. passing the hash through ((a*h+b) mod p) formula, where h is "our" hash, and (a,b) are random and p is a prime. This can't possibly fix the collisions reported by Todd, because the result will remain the same. But it should fix "my" chains of values by spreading them a bit (and the a,b,p values are generated on each grow, so in the unlikely case where we get a chain, it should disappear after a growth) [1] https://en.wikipedia.org/wiki/Universal_hashing 3) disabling growth of the hash table when the fillfactor would drop too much (e.g. below 0.2) Very rough patches for these three approaches are attached. Now, some very rough numbers comparing the impact of those approaches (and their combinations). The following two tables present timings for for a couple of datasets - five cases without any collisions (allowing us to quantify impact on regular datasets), the two datasets generated by me (one without collisions, one with collisions), and the two issues reported by Todd. RH - randomized hashing (seed + extended) UH - universal hashing ST - stop growing master RH UH ST RH UH ST ------------------------------------------------------- Ok/1 542 540 552 537 0% 2% -1% Ok/2 232 221 242 237 -5% 4% 2% Ok/3 161 167 180 161 4% 12% 0% Ok/4 276 270 285 274 -2% 3% -1% Ok/5 174 179 193 174 3% 11% 0% Tomas/1 543 555 551 533 2% 1% -2% Tomas/2 OOM 2322 3251 - Todd/1 OOM OOM 63 63 Todd/2 OOM OOM OOM 72 This is mostly as expected, although I'm surprised that the universal hashing seems to fix Todd's first reproducer. The "stop growing" approach technically fixes "my" dataset, but it takes so much time I haven't timed it. In terms of slowdown (compared to master), I'd say RH and ST are mostly neutral. Universal hashing has significant impact in some cases, but I suppose that might be fixed by using a slightly different scheme (instead of using simple modulo arithmetic). Now, let's see on combinations of the approaches (broken into two tables, so that it doesn't get mangled by mail clients): master RH+UH RH+ST UH+ST RH+UH+ST Ok/1 542 553 545 549 558 Ok/2 232 241 234 241 250 Ok/3 161 180 170 180 187 Ok/4 276 274 270 285 290 Ok/5 174 187 182 193 198 Tomas/1 543 571 561 554 570 Tomas/2 - 2380 2340 3130 2380 Todd/1 - 61 65 63 64 Todd/2 - - 90 91 93 RH+UH RH+ST UH+ST RH+UH+ST Ok/1 2% 1% 1% 3% Ok/2 4% 1% 4% 8% Ok/3 12% 6% 12% 16% Ok/4 -1% -2% 3% 5% Ok/5 7% 5% 11% 14% Tomas/1 5% 3% 2% 5% It seems only the "stop growth" has to be part of the solution, as it's included in any combination fixing all cases shared in this thread. I'd say (randomized hashing + stop growing) seems like the best option. FWIW I do agree the data sets shared in this thread are pretty extreme and it doesn't make much sense to slow the regular cases. I'll be perfectly happy if we stop the OOM, making those cases fast is a bonus. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
pgsql-bugs by date: