Home > mailing lists

Re: [PATCHES] updated hash functions for postgresql v1 - Mailing list pgsql-hackers

From	Oleg Bartunov
Subject	Re: [PATCHES] updated hash functions for postgresql v1
Date	November 4, 2008 16:32:51
Msg-id	Pine.LNX.4.64.0811042330110.15810@sn.sai.msu.ru Whole thread Raw
In response to	Re: [PATCHES] updated hash functions for postgresql v1 (Kenneth Marshall <ktm@rice.edu>)
Responses	Re: [PATCHES] updated hash functions for postgresql v1
List	pgsql-hackers

Tree view

Just interested if you repeat your tests not with cracklib-dict,
but using 8-bit words. From our experience we found many hash functions
are optimized for 7-bit words and produce too many collisions
for 8-bit words. That's why we use crc32.

Oleg
On Tue, 4 Nov 2008, Kenneth Marshall wrote:

> Sorry about the delay for this update to the new hash
> index implementation. I was trying to get the WAL logging
> in place and forgot to post the actual patch. The WAL
> for hash indexes will need to wait for 8.5, but I did
> want to add back in the piece of the Bob Jenkins 2006
> hash function that was stripped out of the initial
> patch on application due to concerns about the randomness
> of the resulting hash values. Here is a re-post of my
> initial findings comparing the old/new Jenkins hash
> from lookup2 and lookup3. I have added a third column
> containing the results for the hash_any() resulting
> from the attached patch as well as simple timing test
> for a DB reindex both before and after patching.
>
> Also attached is a simple documentation patch updating
> the note attached to the hash index description.
>
> Regards,
> Ken
> ----------------------------------------------------
> Hi,
>
> I have finally had a chance to do some investigation on
> the performance of the old hash mix() function versus
> the updated mix()/final() in the new hash function. Here
> is a table of my current results for both the old and the
> new hash function. In this case cracklib refers to the
> cracklib-dict containing 1648379 unique words massaged
> in various ways to generate input strings for the hash
> functions. The result is the number of collisions in the
> hash values generated.
>
> hash input                            old    new  newv2
> ----------                            ---    ---  -----
> cracklib                              338    316  338
> cracklib x 2 (i.e. clibclib)          305    319  300
> cracklib x 3 (clibclibclib)           323    329  315
> cracklib x 10                         302    310  329
> cracklib x 100                        350    335  298
> cracklib x 1000                       314    309  315
> cracklib x 100 truncated to char(100) 311    327  320
>
> uint32 from 1-1648379                 309    319  347
> (uint32 1-1948379)*256                309    314  304
> (uint32 1-1948379)*16                 310    314  324
> "a"uint32 (i.e. a00001,a0002...)      320    321  312
>
> uint32uint32 (i.e. uint64)            321    287  309
>
> The different result columns are old = Jenkins 1996 hash
> function(lookup2.c), new = Jenkins 2006 hash function
> (lookup3.c), and newv2 = adaptation of current hash_any()
> to incorporate the separate mix()/final() functions. As
> you can see from the results, spliting the mix() and final()
> apart does not result in any perceptible loss of randomness
> in the hash assignment. I also ran a crude timing for a
> reindex of the following database:
>
> CREATE TABLE dict (word text);
> CREATE INDEX wordhash ON dict USING hash (word);
> INSERT INTO dict (word) VALUES('s;lhkjdpyoijxfg;lktjgh;sdlhkjo');
> INSERT INTO dict (SELECT MAX(word)||MAX(word) FROM dict);
> ... (21 times)
>
> REINDEX TABLE
> ...
>
> The average time to reindex the table using our current
> hash_any() without the separate mix()/final() was 1696ms
> and 1482ms with the separate mix()/final() stages giving
> almost 13% better performance for this stupid metric.
>
    Regards,        Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

pgsql-hackers by date:

From: Kenneth Marshall
Date: 04 November 2008, 16:26:59
Subject: Re: [PATCHES] updated hash functions for postgresql v1

From: Ron Mayer
Date: 04 November 2008, 16:34:39
Subject: Re: Patch for SQL-Standard Interval output and decoupling DateStyle from IntervalStyle

Re: [PATCHES] updated hash functions for postgresql v1 - Mailing list pgsql-hackers

Previous

Next