Re: Change GUC hashtable to use simplehash? - Mailing list pgsql-hackers
From | jian he |
---|---|
Subject | Re: Change GUC hashtable to use simplehash? |
Date | |
Msg-id | CACJufxG6zR-t3dKoXfb_gqnHLH_CL2Pt3MPOOH2Hwx4t3e-eog@mail.gmail.com |
In response to | Re: Change GUC hashtable to use simplehash? (John Naylor <johncnaylorls@gmail.com>) |
Responses | Re: Change GUC hashtable to use simplehash? |
List | pgsql-hackers |
On Tue, Dec 26, 2023 at 4:01 PM John Naylor <johncnaylorls@gmail.com> wrote:
>
> 0001-0003 are same as earlier
> 0004 takes Jeff's idea and adds in an optimization from NetBSD's
> strlen (I said OpenBSD earlier, but it goes back further). I added
> stub code to simulate big-endian when requested at compile time, but a
> later patch removes it. Since it benched well, I made the extra effort
> to generalize it for other callers. After adding to the hash state, it
> returns the length so the caller can pass it to the finalizer.
> 0005 is the benchmark (not for commit) -- I took the parser keyword
> list and added enough padding to make every string aligned when the
> whole thing is copied to an alloc'd area.
>
> Each of the bench_*.sql files named below are just running the
> similarly-named function, all with the same argument, e.g. "select *
> from bench_pgstat_hash_fh(100000);", so not attached.
>
> Strings:
>
> -- strlen + hash_bytes
> pgbench -n -T 20 -f bench_hash_bytes.sql -M prepared | grep latency
> latency average = 1036.732 ms
>
> -- word-at-a-time hashing, with bytewise lookahead
> pgbench -n -T 20 -f bench_cstr_unaligned.sql -M prepared | grep latency
> latency average = 664.632 ms
>
> -- word-at-a-time for both hashing and lookahead (Jeff's aligned
> coding plus a technique from NetBSD strlen)
> pgbench -n -T 20 -f bench_cstr_aligned.sql -M prepared | grep latency
> latency average = 436.701 ms
>
> So, the fully optimized aligned case is worth it if it's convenient.
>
> 0006 adds a byteswap for big-endian so we can reuse little endian
> coding for the lookahead.
>
> 0007 - I also wanted to put numbers to 0003 (pgstat hash). While the
> motivation for that was cleanup, I had a hunch it would shave cycles
> and take up less binary space. It does on both accounts:
>
> -- 3x murmur + hash_combine
> pgbench -n -T 20 -f bench_pgstat_orig.sql -M prepared | grep latency
> latency average = 333.540 ms
>
> -- fasthash32 (simple call, no state setup and final needed for a single value)
> pgbench -n -T 20 -f bench_pgstat_fh.sql -M prepared | grep latency
> latency average = 277.591 ms
>
> 0008 - We can optimize the tail load when it's 4 bytes -- to save
> loads, shifts, and OR's. My compiler can't figure this out for the
> pgstat hash, with its fixed 4-byte tail. It's pretty simple and should
> help other cases:
>
> pgbench -n -T 20 -f bench_pgstat_fh.sql -M prepared | grep latency
> latency average = 226.113 ms

--- /dev/null
+++ b/contrib/bench_hash/bench_hash.c
@@ -0,0 +1,103 @@
+/*-------------------------------------------------------------------------
+ *
+ * bench_hash.c
+ *
+ * Copyright (c) 2023, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *    src/test/modules/bench_hash/bench_hash.c
+ *
+ *-------------------------------------------------------------------------
+ */

The patch adds this module under contrib (root/contrib), but I guess you
intended to add it under root/src/test/modules, which is what the
IDENTIFICATION comment says. Later I saw "0005 is the benchmark (not for
commit)".

--- /dev/null
+++ b/src/include/common/hashfn_unstable.h
@@ -0,0 +1,213 @@
+/*
+Building blocks for creating fast inlineable hash functions. The
+unstable designation is in contrast to hashfn.h, which cannot break
+compatibility because hashes can be writen to disk and so must produce
+the same hashes between versions.
+
+ *
+ * Portions Copyright (c) 2018-2023, PostgreSQL Global Development Group
+ *
+ * src/include/common/hashfn_unstable.c
+ */
+

The path in this header comment should be
"src/include/common/hashfn_unstable.h".
Typo: `writen` should be `written`.

In pgbench, I used --no-vacuum --time=20 -M prepared.
My local computer is slow, but here are the test results:

select * from bench_cstring_hash_aligned(100000);      7318.893 ms
select * from bench_cstring_hash_unaligned(100000);   10383.173 ms
select * from bench_pgstat_hash(100000);               4474.989 ms
select * from bench_pgstat_hash_fh(100000);            9192.245 ms
select * from bench_string_hash(100000);               2048.008 ms
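
For anyone following along who has not seen the word-at-a-time lookahead that the quoted message attributes to NetBSD's strlen, here is a minimal sketch of the general idea. This is not the code from the patch: the names haszero64 and aligned_strlen are hypothetical, and the sketch assumes the string starts on an 8-byte boundary and sits in a buffer padded to a multiple of 8 bytes, so that reading whole words never runs past the allocation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not from the patch): returns nonzero if and only if
 * at least one of the eight bytes in w is zero.
 */
static inline uint64_t
haszero64(uint64_t w)
{
	return (w - UINT64_C(0x0101010101010101)) & ~w &
		UINT64_C(0x8080808080808080);
}

/*
 * Hypothetical illustration: length of a NUL-terminated string, scanning
 * eight bytes per iteration.
 *
 * Assumption: s is 8-byte aligned and its buffer is padded to a multiple
 * of 8 bytes, so each word read stays inside the allocation.
 */
static size_t
aligned_strlen(const char *s)
{
	const char *p = s;
	uint64_t	chunk;

	for (;;)
	{
		/* memcpy keeps the load well-defined; compilers emit a plain load */
		memcpy(&chunk, p, sizeof(chunk));
		if (haszero64(chunk))
			break;
		p += sizeof(chunk);
	}

	/* The terminator is in this word; find it bytewise (endian-neutral). */
	while (*p != '\0')
		p++;

	return (size_t) (p - s);
}
```

A hashing loop built on this can mix each full word into the hash state before checking the next one, which is presumably why the aligned benchmark pads its keyword list.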