Re: [PATCHES] updated hash functions for postgresql v1 - Mailing list pgsql-hackers
From | Kenneth Marshall |
---|---|
Subject | Re: [PATCHES] updated hash functions for postgresql v1 |
Date | |
Msg-id | 20090127162048.GC1961@it.is.rice.edu Whole thread Raw |
In response to | Re: [PATCHES] updated hash functions for postgresql v1 (Kenneth Marshall <ktm@rice.edu>) |
Responses |
Re: [PATCHES] updated hash functions for postgresql v1
|
List | pgsql-hackers |
On Sun, Jan 25, 2009 at 10:27:03PM -0600, Kenneth Marshall wrote: > On Sat, Jan 10, 2009 at 01:36:25PM -0500, Tom Lane wrote: > > Jeff Davis <pgsql@j-davis.com> writes: > > > I ran 5 times on both old and new code, eliminating the top and bottom > > > and taking the average of the remaining 3, and I got a 6.9% performance > > > improvement with the new code. > > > > The question that has been carefully evaded throughout the discussion > > of this patch is whether the randomness of the hash result is decreased, > > and if so what is that likely to cost us in performance of the various > > hash-dependent algorithms. I would still like to have an answer to that > > before we make a change to gain marginal performance improvement in > > the hash function itself (which is already shown to be barely measurable > > in the total context of a hash-dependent operation...) > > > > regards, tom lane > > > > Dear Hackers and Reviewers, > > In response to Tom's questioning the randomness of the hash_any > when the mixing functions are split into two, the first used when > sequentially processing the input and the second for the final > mix, I have generated a more detailed analysis of the two hash_any > functions. > > First, one change to the 11/2008 patch, the keylen is added to a, b and > c initially so we do not need to add it later on. The is the > micro-diff: > ----------------------------- > > --- hashfunc.c_TWOMIX 2009-01-22 14:07:34.000000000 -0600 > +++ hashfunc.c_TWOMIX2 2009-01-22 14:17:32.000000000 -0600 > @@ -336,7 +336,6 @@ > > /* handle the last 11 bytes */ > k = (const unsigned char *) ka; > - c += keylen; > #ifdef WORDS_BIGENDIAN > switch (len) > { > @@ -439,7 +438,6 @@ > } > > /* handle the last 11 bytes */ > - c += keylen; > #ifdef WORDS_BIGENDIAN > switch (len) /* all the case statements fall through */ > > ----------------------------- > > The remainder of this document will use the runs from my initial results > broken out using various power-of-two bucket sizes to simulate our actual > use in PostgreSQL as the number of items hashed increases and we use more > and more bits of our hash to identify the appropriate bucket. I have run > each test twice, once with our current hash_any() with the single mix() > function and then a second time using my patch from the November commitfest > plus the patch above to produce a new hash_any() with two separate mixing > functions mix() and final(). For each run I have generated a sequence of > unique inputs, up to the number of buckets, hashed them with the hash > functions (both old and new), then I calculate the expected number of > collision p(n) using the poisson formula for each number of buckets, > where the number of buckets are 2**16, 2**18, 2**20, 2**22, 2**24, and > 2**26. For my initial run, I used a string consisting of the letter 'a' > followed by the integer representation of the numbers from 0 to the > (number of buckets - 1): > > 1) "a"uint32 ((i.e. a00001,a0002...) > Number of buckets: 65536 > Total number of items: 65536 > Expected number with n items: 24109 24109 12054 4018 1004 200 33 4 > Actual number mix(): 24044 24172 12078 4036 980 186 30 10 > Actual number mix()/final(): 24027 24232 12060 3972 1001 207 31 5 1 > > Number of buckets: 262144 > Total number of items: 262144 > Expected number with n items: 96437 96437 48218 16072 4018 803 133 19 2 > Actual number mix(): 96224 96730 48240 15951 4094 744 143 17 1 > Actual number mix()/final(): 96335 96646 48071 16128 4018 801 122 21 2 > > Number of buckets: 1048576 > Total number of items: 1048576 > Expected number with n items: 385749 385749 192874 64291 16072 3214 535 76 9 > Actual number mix(): 385716 385596 193243 64115 16053 3285 478 77 12 1 > Actual number mix()/final(): 385955 385016 193789 63768 16259 3190 511 79 8 1 > > Number of buckets: 4194304 > Total number of items: 4194304 > Expected number with n items: 1542998 1542998 771499 257166 64291 12858 2143 306 38 > Actual number mix(): 1542536 1543489 771351 257777 63830 12847 2123 326 19 5 1 > Actual number mix()/final(): 1541828 1544429 772072 256178 64579 12774 2129 288 22 5 > > Number of buckets: 16777216 > Total number of items: 16777216 > Expected number with n items: 6171992 6171992 3085996 1028665 257166 51433 8572 1224 153 > Actual number mix(): 6170866 6174079 3085912 1027140 257808 51385 8638 1219 146 23 > Actual number mix()/final(): 6172058 6171991 3086279 1027916 257535 51465 8554 1243 149 23 3 > > Number of buckets: 67108864 > Total number of items: 67108864 > Expected number with n items: 24687971 24687971 12343985 4114661 1028665 205733 34288 4898 612 > Actual number mix(): 24686110 24690897 12344232 4113515 1028232 205682 34546 4942 629 72 7 > Actual number mix()/final(): 24708515 24669248 12333034 4114796 1034256 208424 34888 5023 598 77 5 > > Here is a second run with number of items = (number of buckets)/2: > Number of buckets: 65536 > Total number of items: 32768 > Expected number with n items: 39749 19874 4968 828 103 10 > Actual number mix(): 39704 19978 4898 842 103 10 1 > Actual number mix()/final(): 39715 19922 4991 779 119 9 1 > > Number of buckets: 262144 > Total number of items: 131072 > Expected number with n items: 158998 79499 19874 3312 414 41 3 > Actual number mix(): 158753 79972 19703 3216 458 38 4 > Actual number mix()/final(): 158869 79688 19853 3303 393 33 3 2 > > Number of buckets: 1048576 > Total number of items: 524288 > Expected number with n items: 635993 317996 79499 13249 1656 165 13 > Actual number mix(): 636118 317839 79388 13435 1628 152 16 > Actual number mix()/final(): 636102 317824 79527 13267 1683 161 12 > > Number of buckets: 4194304 > Total number of items: 2097152 > Expected number with n items: 2543973 1271986 317996 52999 6624 662 55 3 > Actual number mix(): 2544529 1270639 318904 53005 6516 645 61 5 > Actual number mix()/final(): 2544014 1271483 318720 52849 6559 630 47 2 > > Number of buckets: 16777216 > Total number of items: 8388608 > Expected number with n items: 10175895 5087947 1271986 211997 26499 2649 220 15 > Actual number mix(): 10174997 5089736 1271355 211528 26679 2683 220 17 1 > Actual number mix()/final(): 10176567 5087207 1271789 212014 26684 2703 235 16 1 > > Number of buckets: 67108864 > Total number of items: 33554432 > Expected number with n items: 40703583 20351791 5087947 847991 105998 10599 883 63 3 > Actual number mix(): 40703033 20353345 5086252 848860 105885 10536 891 59 3 > Actual number mix()/final(): 40770874 20250828 5096890 865448 111835 11890 1013 76 10 > > 2) uint32uint32 (i.e. uint64) > Number of buckets: 65536 > Total number of items: 32768 > Expected number with n items: 39749 19874 4968 828 103 10 > Actual number mix(): 39770 19863 4947 822 125 9 > Actual number mix()/final(): 39754 19852 4990 837 91 11 1 > > Number of buckets: 262144 > Total number of items: 131072 > Expected number with n items: 158998 79499 19874 3312 414 41 3 > Actual number mix(): 159046 79464 19823 3334 430 43 3 1 > Actual number mix()/final(): 158879 79732 19774 3301 406 47 5 > > Number of buckets: 1048576 > Total number of items: 524288 > Expected number with n items: 635993 317996 79499 13249 1656 165 13 > Actual number mix(): 636275 317580 79514 13357 1661 172 14 3 > Actual number mix()/final(): 635578 318574 79504 13162 1585 159 13 1 > > Number of buckets: 4194304 > Total number of items: 2097152 > Expected number with n items: 2543973 1271986 317996 52999 6624 662 55 3 > Actual number mix(): 2543769 1272196 318067 53050 6497 672 48 4 1 > Actual number mix()/final(): 2543014 1273485 317812 52703 6589 635 59 7 > > Number of buckets: 16777216 > Total number of items: 8388608 > Expected number with n items: 10175895 5087947 1271986 211997 26499 2649 220 15 > Actual number mix(): 10174587 5090467 1270909 211836 26513 2679 207 18 > Actual number mix()/final(): 10175142 5089481 1270919 212555 26234 2643 224 15 3 > > Number of buckets: 67108864 > Total number of items: 33554432 > Expected number with n items: 40703583 20351791 5087947 847991 105998 10599 883 63 3 > Actual number mix(): 40706281 20347746 5088639 848133 106388 10669 946 60 2 > Actual number mix()/final(): 40701438 20354677 5088549 846570 106157 10578 838 55 2 > > 3) uint32 from 1-1648379 > Number of buckets: 65536 > Total number of items: 32768 > Expected number with n items: 39749 19874 4968 828 103 10 > Actual number mix(): 39758 19844 4993 835 97 9 > Actual number mix()/final(): 39842 19712 5016 849 109 7 1 > > Number of buckets: 262144 > Total number of items: 131072 > Expected number with n items: 158998 79499 19874 3312 414 41 3 > Actual number mix(): 158973 79523 19900 3283 426 38 1 > Actual number mix()/final(): 159136 79293 19896 3346 421 48 3 1 > > Number of buckets: 1048576 > Total number of items: 524288 > Expected number with n items: 635993 317996 79499 13249 1656 165 13 > Actual number mix(): 636083 317917 79484 13183 1711 180 16 2 > Actual number mix()/final(): 636183 317772 79438 13305 1685 173 20 > > Number of buckets: 4194304 > Total number of items: 2097152 > Expected number with n items: 2543973 1271986 317996 52999 6624 662 55 3 > Actual number mix(): 2544325 1271367 318260 52950 6660 683 54 4 1 > Actual number mix()/final(): 2544022 1272059 317778 53060 6646 665 70 4 > > Number of buckets: 16777216 > Total number of items: 8388608 > Expected number with n items: 10175895 5087947 1271986 211997 26499 2649 220 15 > Actual number mix(): 10176360 5087249 1272083 212010 26655 2629 212 18 > Actual number mix()/final(): 10175315 5088617 1272068 212144 26224 2589 234 23 1 1 > > Number of buckets: 67108864 > Total number of items: 33554432 > Expected number with n items: 40703583 20351791 5087947 847991 105998 10599 883 63 3 > Actual number mix(): 40704034 20350542 5088760 848309 105721 10520 894 77 7 > Actual number mix()/final(): 40774785 20243786 5098862 866801 111865 11641 1019 100 5 > > 4) cracklib > Number of buckets: 65536 > Total number of items: 32767 > Expected number with n items: 39750 19874 4968 828 103 10 > Actual number mix(): 39731 19858 5049 794 92 11 1 > Actual number mix()/final(): 39785 19801 5011 823 106 9 1 > > Number of buckets: 262144 > Total number of items: 131071 > Expected number with n items: 158998 79498 19874 3312 414 41 3 > Actual number mix(): 159077 79420 19806 3380 409 49 3 > Actual number mix()/final(): 159020 79475 19845 3366 383 54 1 > > Number of buckets: 1048576 > Total number of items: 524287 > Expected number with n items: 635994 317996 79498 13249 1656 165 13 > Actual number mix(): 635979 318066 79420 13281 1641 163 23 3 > Actual number mix()/final(): 635694 318637 79122 13271 1686 150 13 3 > > Number of buckets: 4194304 > Total number of items: 1648379 > Expected number with n items: 2831263 1112698 218647 28643 2814 221 14 > Actual number mix(): 2830769 1113458 218633 28396 2786 250 11 1 > Actual number mix()/final(): 2831304 1113004 218002 28843 2926 212 13 > > Number of buckets: 16777216 > Total number of items: 1648379 > Expected number with n items: 15207226 1494125 73399 2403 59 1 > Actual number mix(): 15206923 1494718 73129 2384 59 3 > Actual number mix()/final(): 15207183 1494267 73250 2452 64 > > Number of buckets: 67108864 > Total number of items: 1648379 > Expected number with n items: 65480564 1608383 19753 161 1.47989202515749e-08 > Actual number mix(): 65480389 1608727 19593 154 1 > Actual number mix()/final(): 65480455 1608609 19631 168 1 > > 5) cracklib x 100 > Number of buckets: 65536 > Total number of items: 32767 > Expected number with n items: 39750 19874 4968 828 103 10 > Actual number mix(): 39829 19742 5003 847 99 14 2 > Actual number mix()/final(): 39783 19794 5023 828 97 11 > > Number of buckets: 262144 > Total number of items: 131071 > Expected number with n items: 158998 79498 19874 3312 414 41 3 > Actual number mix(): 159136 79242 20007 3289 405 62 3 > Actual number mix()/final(): 158961 79546 19905 3270 408 51 3 > > Number of buckets: 1048576 > Total number of items: 524287 > Expected number with n items: 635994 317996 79498 13249 1656 165 13 > Actual number mix(): 635815 318331 79372 13218 1658 168 12 2 > Actual number mix()/final(): 635986 318152 79309 13231 1687 190 21 > > Number of buckets: 4194304 > Total number of items: 1648379 > Expected number with n items: 2831263 1112698 218647 28643 2814 221 14 > Actual number mix(): 2830891 1113468 218202 28712 2804 208 18 1 > Actual number mix()/final(): 2831786 1111616 219209 28661 2816 199 16 1 > > Number of buckets: 16777216 > Total number of items: 1648379 > Expected number with n items: 15207226 1494125 73399 2403 59 1 > Actual number mix(): 15207428 1493777 73492 2459 59 1 > Actual number mix()/final(): 15207777 1493063 73877 2435 63 1 > > Number of buckets: 67108864 > Total number of items: 1648379 > Expected number with n items: 65480564 1608383 19753 161 1.47989202515749e-08 > Actual number mix(): 65480463 1608595 19635 170 1 > Actual number mix()/final(): 65480650 1608222 19820 171 1 > > As you can see, all of the probabilities for both the current single mix() > function and the split mix()/final() functions match the theoretical results > as calculated by the poisson formula. Here is a nice reference that I found > online describing the testing procedure that I am using: > > http://rabbit.eng.miami.edu/class/een318/poisson.pdf > > I can find no situations where the current version of hash_any() performs > statistically better than the version resulting from my split mix()/final() > function patch from 11/2008. I would be happy to share the test harness with > anyone who wishes to test further. Please let me know if there is anything > I can do. I still recommend applying the split mixing function patch that > I submitted for the 11/2008 commitfest and I think these tests should allay > Tom's concerns about the randomness of the new hash function. > > Regards, > Ken Marshall > Dear Hackers, I have updated the patch posted by Jeff Davis on January 9th to include the micro-patch above as well as updated the polymorphism regressions tests. This applies cleanly to the latest CVS pull. Regards, Ken
Attachment
pgsql-hackers by date: