Thread: speedup tidbitmap patch: hash BlockNumber
Just replacing tag_hash in tidbitmap, which uses hash_any, with a direct
call to hash_uint32 saves ~5% of execution time.

An example:

# create extension btree_gin;
# select (v / 10)::int4 as i into t from generate_series(1, 5000000) as v;
# create index idx on t using gin (i);
# set enable_seqscan = off;

# explain analyze select * from t where i >= 0;
without patch: Execution time: 2427.692 ms
with patch: Execution time: 2319.376 ms

# explain analyze select * from t where i >= 100 and i <= 100;
without patch: Execution time: 524.441 ms
with patch: Execution time: 504.478 ms

-- 
Teodor Sigaev
E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
Attachment
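A minimal sketch of the kind of shim being described, assuming the
PostgreSQL backend headers (this is not the attached patch itself; the
function name is the one that shows up in Heikki's profile below):

    #include "postgres.h"
    #include "access/hash.h"        /* hash_uint32() */
    #include "storage/block.h"      /* BlockNumber */

    /*
     * Hash a BlockNumber key by feeding the whole uint32 straight to
     * hash_uint32(), instead of going through tag_hash()'s call to the
     * general-purpose byte mixer hash_any().
     */
    static uint32
    tag_blocknumber(const void *key, Size keysize)
    {
        Assert(keysize == sizeof(BlockNumber));
        return DatumGetUInt32(hash_uint32(*((const BlockNumber *) key)));
    }

Wired into tidbitmap's HASHCTL in place of tag_hash, this skips the
generic hashing loop on every pagetable lookup.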
On 10/22/2014 04:14 PM, Teodor Sigaev wrote:
> Just replacing tag_hash in tidbitmap, which uses hash_any, with a direct
> call to hash_uint32 saves ~5% of execution time.
>
> An example:
> # create extension btree_gin;
> # select (v / 10)::int4 as i into t from generate_series(1, 5000000) as v;
> # create index idx on t using gin (i);
> # set enable_seqscan = off;
>
> # explain analyze select * from t where i >= 0;
> without patch: Execution time: 2427.692 ms
> with patch: Execution time: 2319.376 ms
>
> # explain analyze select * from t where i >= 100 and i <= 100;
> without patch: Execution time: 524.441 ms
> with patch: Execution time: 504.478 ms

Nice little speedup, for such a simple patch.

On my laptop, "perf" confirms that with the patch, about 5% of the CPU
time is spent in tag_blocknumber, and without it, about 8% is spent in
tag_hash. That gives a 3% speed up, which is in the same ballpark that
you saw.

I'd suggest putting the new function in hashfn.c, where we already have
similar functions: string_hash, tag_hash, oid_hash and bitmap_hash. And
name it "blocknumber_hash", for consistency.

- Heikki
Heikki Linnakangas <hlinnakangas@vmware.com> writes:
> On 10/22/2014 04:14 PM, Teodor Sigaev wrote:
>> Just replacing tag_hash in tidbitmap, which uses hash_any, with a direct
>> call to hash_uint32 saves ~5% of execution time.

> I'd suggest putting the new function in hashfn.c, where we already have
> similar functions: string_hash, tag_hash, oid_hash and bitmap_hash. And
> name it "blocknumber_hash", for consistency.

I think this suggestion is misguided, and the patch itself needs
rethinking.  Instead of doing this, let's hack dynahash.c itself to
substitute a shim like this when it's told function == tag_hash and
keysize == sizeof(uint32).  Then we can remove any similar shims that
already exist, and possibly end up with a net savings of code rather
than adding more.

			regards, tom lane
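A rough sketch of the substitution being proposed, as it might sit inside
hash_create() in dynahash.c (hypothetical placement; uint32_hash stands
for the hash_uint32-based shim):

    /*
     * Sketch only: when the caller supplied a hash function, and it is
     * tag_hash over a key that is exactly a uint32, silently swap in the
     * faster integer shim.  Assumes HASH_ELEM was given, so that
     * info->keysize is valid here.
     */
    if (flags & HASH_FUNCTION)
    {
        if (info->hash == tag_hash && info->keysize == sizeof(uint32))
            hashp->hash = uint32_hash;  /* cheaper than the generic path */
        else
            hashp->hash = info->hash;
    }
    else
        hashp->hash = string_hash;      /* default hash function */

The pointer comparison info->hash == tag_hash is the load-bearing (and,
as discussed later in the thread, platform-sensitive) part: it only fires
if the caller passes the very same tag_hash address that dynahash.c
itself sees.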
> I think this suggestion is misguided, and the patch itself needs
> rethinking. Instead of doing this, let's hack dynahash.c itself
> to substitute a shim like this when it's told function == tag_hash and
> keysize == sizeof(uint32). Then we can remove any similar shims that
> already exist, and possibly end up with a net savings of code rather than
> adding more.

Done. Actually, I found only the oid_hash shim.

-- 
Teodor Sigaev
E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
Attachment
On 17 December 2014 at 06:07, Teodor Sigaev <teodor@sigaev.ru> wrote:
>> I think this suggestion is misguided, and the patch itself needs
>> rethinking. Instead of doing this, let's hack dynahash.c itself
>> to substitute a shim like this when it's told function == tag_hash and
>> keysize == sizeof(uint32). Then we can remove any similar shims that
>> already exist, and possibly end up with a net savings of code rather than
>> adding more.
>
> Done. Actually, I found only the oid_hash shim.
- hash_ctl.hash = oid_hash; /* a bit more efficient than tag_hash */
+ hash_ctl.hash = tag_hash; /* a bit more efficient than tag_hash */
I think the comment may need to be removed here.
Regards
David Rowley
> - hash_ctl.hash = oid_hash; /* a bit more efficient than tag_hash */
> + hash_ctl.hash = tag_hash; /* a bit more efficient than tag_hash */
>
> I think the comment may need to be removed here.

Thank you, fixed.

-- 
Teodor Sigaev
E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
Attachment
Teodor Sigaev <teodor@sigaev.ru> writes:
>> - hash_ctl.hash = oid_hash; /* a bit more efficient than tag_hash */
>> + hash_ctl.hash = tag_hash; /* a bit more efficient than tag_hash */
>>
>> I think the comment may need to be removed here.

> Thank you, fixed.

I looked at this patch.  It's not right at all here:

+	if (hashp->hash == sizeof(int32) && info->hash == tag_hash)
+		hashp->hash = int32_hash;
+	else
+		hashp->hash = info->hash;

hashp->hash is not defined at this point, and if it were defined it would
not be the size of the key; and replacing that with info->keysize wouldn't
be exactly the right thing either, since info->keysize isn't necessarily
set (there is a default, though I suspect it's mostly unused).

I hacked up a solution to that and was about to commit when I had second
thoughts about the whole approach.  The thing that's worrying me about
this is that it depends on the assumption that the passed-in info->hash
pointer is identical to what dynahash.c thinks is the address of tag_hash.
I'm not convinced that that's necessarily true on all platforms; in
particular, I'm worried that a call from a loadable module (such as
plperl) might be passing the address of a trampoline or thunk function
that is a jump to tag_hash and not tag_hash itself.  I don't see this
happening on my Linux box but it might well happen on, say, Windows.

There are already some comparisons of the hash function pointer in
dynahash.c, so you might think this concern is moot, but if you look
closely they're just comparisons to the address of string_hash --- which,
in normal use-cases, would be supplied by dynahash.c itself.  So I don't
think the existing coding proves this will work.

Now, we could do it like this anyway, and just ignore the fact that we
might not get the optimization on every platform for hash tables created
by loadable modules.  However, there's another way we could attack this,
which is to invent a new hash option flag bit that says "pick a suitable
hash function for me, assuming that all bits of the specified key size
are significant".  So instead of

	ctl.keysize = sizeof(...);
	ctl.entrysize = sizeof(...);
	ctl.hash = tag_hash;
	tab = hash_create("...", ..., &ctl,
	                  HASH_ELEM | HASH_FUNCTION);

you'd write

	ctl.keysize = sizeof(...);
	ctl.entrysize = sizeof(...);
	tab = hash_create("...", ..., &ctl,
	                  HASH_ELEM | HASH_BLOBS);

which would save some code space and is arguably cleaner than the current
approach of specifying some support functions and not others.

This would be more invasive than the current patch, since we'd have to
run around and touch code that currently mentions tag_hash as well as the
places that currently mention oid_hash.  But the patch is already pretty
invasive just doing the latter.

Thoughts?  Anyone have a decent suggestion for what to call the flag bit?
I'm not enamored of "HASH_BLOBS"; it's just the first thing that came to
mind.

Also, depending on how much we want to annoy third-party authors, we
could consider making a clean break from the Old Way here and say that
callers must specify all or none of the hash support functions, thus
letting us get rid of the very ad-hoc logic in hash_create() that fills
in defaults for some functions based on what you said for others.  So the
rule would be something like:

	No flag mentioned: use functions suitable for null-terminated strings
	HASH_BLOBS: use functions suitable for arbitrary fixed-size tags
	HASH_FUNCTIONS: caller must supply all three support functions

and we'd remove the existing flag bits HASH_FUNCTION, HASH_COMPARE,
HASH_KEYCOPY.

But this would just be in the line of making the API simpler/more
logical; it wouldn't buy us performance as such.  I'm not sure whether
it's a good idea to go this far.  Comments?

			regards, tom lane
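For illustration, a caller under the proposed flag would look like this
(MyKey, MyEntry, and the table name are placeholders, not code from the
patch):

    HASHCTL     ctl;
    HTAB       *tab;

    /*
     * keysize/entrysize describe a fixed-size binary key and its entry.
     * HASH_BLOBS asks dynahash.c to pick a suitable hash function by
     * itself, so there is no ctl.hash assignment and no HASH_FUNCTION
     * flag.
     */
    MemSet(&ctl, 0, sizeof(ctl));
    ctl.keysize = sizeof(MyKey);
    ctl.entrysize = sizeof(MyEntry);
    tab = hash_create("my lookup table", 128, &ctl,
                      HASH_ELEM | HASH_BLOBS);

The patch that follows rewrites every in-tree call site into exactly this
shape.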
I wrote:
> However, there's another way we could attack this,
> which is to invent a new hash option flag bit that says "pick a suitable
> hash function for me, assuming that all bits of the specified key size are
> significant".  So instead of
>
>	ctl.keysize = sizeof(...);
>	ctl.entrysize = sizeof(...);
>	ctl.hash = tag_hash;
>	tab = hash_create("...", ..., &ctl,
>	                  HASH_ELEM | HASH_FUNCTION);
>
> you'd write
>
>	ctl.keysize = sizeof(...);
>	ctl.entrysize = sizeof(...);
>	tab = hash_create("...", ..., &ctl,
>	                  HASH_ELEM | HASH_BLOBS);
>
> which would save some code space and is arguably cleaner than the
> current approach of specifying some support functions and not others.

Here's a proposed patch along this line.  I left in oid_hash (in the form
of a macro) so that this does not cause any API break for existing
third-party modules.  However, no callers in our own code directly refer
to tag_hash or oid_hash anymore.

			regards, tom lane

diff --git a/contrib/pg_trgm/trgm_regexp.c b/contrib/pg_trgm/trgm_regexp.c
index 9f05053..f7a73cb 100644
*** a/contrib/pg_trgm/trgm_regexp.c
--- b/contrib/pg_trgm/trgm_regexp.c
*************** transformGraph(TrgmNFA *trgmNFA)
*** 915,925 ****
  	hashCtl.keysize = sizeof(TrgmStateKey);
  	hashCtl.entrysize = sizeof(TrgmState);
  	hashCtl.hcxt = CurrentMemoryContext;
- 	hashCtl.hash = tag_hash;
  	trgmNFA->states = hash_create("Trigram NFA", 1024, &hashCtl,
! 								  HASH_ELEM | HASH_CONTEXT | HASH_FUNCTION);
  
  	/* Create initial state: ambiguous prefix, NFA's initial state */
  	MemSet(&initkey, 0, sizeof(initkey));
--- 915,924 ----
  	hashCtl.keysize = sizeof(TrgmStateKey);
  	hashCtl.entrysize = sizeof(TrgmState);
  	hashCtl.hcxt = CurrentMemoryContext;
  	trgmNFA->states = hash_create("Trigram NFA", 1024, &hashCtl,
! 								  HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
  
  	/* Create initial state: ambiguous prefix, NFA's initial state */
  	MemSet(&initkey, 0, sizeof(initkey));
diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index 116be7d..61216e5 100644
*** a/contrib/postgres_fdw/connection.c
--- b/contrib/postgres_fdw/connection.c
*************** GetConnection(ForeignServer *server, Use
*** 109,120 ****
  		MemSet(&ctl, 0, sizeof(ctl));
  		ctl.keysize = sizeof(ConnCacheKey);
  		ctl.entrysize = sizeof(ConnCacheEntry);
- 		ctl.hash = tag_hash;
  		/* allocate ConnectionHash in the cache context */
  		ctl.hcxt = CacheMemoryContext;
  		ConnectionHash = hash_create("postgres_fdw connections", 8, &ctl,
! 									 HASH_ELEM | HASH_FUNCTION | HASH_CONTEXT);
  
  		/*
  		 * Register some callback functions that manage connection cleanup.
--- 109,119 ----
  		MemSet(&ctl, 0, sizeof(ctl));
  		ctl.keysize = sizeof(ConnCacheKey);
  		ctl.entrysize = sizeof(ConnCacheEntry);
  		/* allocate ConnectionHash in the cache context */
  		ctl.hcxt = CacheMemoryContext;
  		ConnectionHash = hash_create("postgres_fdw connections", 8, &ctl,
! 									 HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
  
  		/*
  		 * Register some callback functions that manage connection cleanup.
diff --git a/src/backend/access/gist/gistbuild.c b/src/backend/access/gist/gistbuild.c
index 5acc986..09a9df4 100644
*** a/src/backend/access/gist/gistbuild.c
--- b/src/backend/access/gist/gistbuild.c
*************** gistInitParentMap(GISTBuildState *builds
*** 1142,1153 ****
  	hashCtl.keysize = sizeof(BlockNumber);
  	hashCtl.entrysize = sizeof(ParentMapEntry);
  	hashCtl.hcxt = CurrentMemoryContext;
- 	hashCtl.hash = oid_hash;
  	buildstate->parentMap = hash_create("gistbuild parent map", 1024, &hashCtl,
! 										HASH_ELEM | HASH_CONTEXT
! 										| HASH_FUNCTION);
  }
  
  static void
--- 1142,1151 ----
  	hashCtl.keysize = sizeof(BlockNumber);
  	hashCtl.entrysize = sizeof(ParentMapEntry);
  	hashCtl.hcxt = CurrentMemoryContext;
  	buildstate->parentMap = hash_create("gistbuild parent map", 1024, &hashCtl,
! 										HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
  }
  
  static void
diff --git a/src/backend/access/gist/gistbuildbuffers.c b/src/backend/access/gist/gistbuildbuffers.c
index 577ea61..4937c38 100644
*** a/src/backend/access/gist/gistbuildbuffers.c
--- b/src/backend/access/gist/gistbuildbuffers.c
*************** gistInitBuildBuffers(int pagesPerBuffer,
*** 76,91 ****
  	 * nodeBuffersTab hash is association between index blocks and it's
  	 * buffers.
  	 */
  	hashCtl.keysize = sizeof(BlockNumber);
  	hashCtl.entrysize = sizeof(GISTNodeBuffer);
  	hashCtl.hcxt = CurrentMemoryContext;
- 	hashCtl.hash = tag_hash;
- 	hashCtl.match = memcmp;
  	gfbb->nodeBuffersTab = hash_create("gistbuildbuffers", 1024, &hashCtl,
! 									   HASH_ELEM | HASH_CONTEXT
! 									   | HASH_FUNCTION | HASH_COMPARE);
  
  	gfbb->bufferEmptyingQueue = NIL;
--- 76,89 ----
  	 * nodeBuffersTab hash is association between index blocks and it's
  	 * buffers.
  	 */
+ 	memset(&hashCtl, 0, sizeof(hashCtl));
  	hashCtl.keysize = sizeof(BlockNumber);
  	hashCtl.entrysize = sizeof(GISTNodeBuffer);
  	hashCtl.hcxt = CurrentMemoryContext;
  	gfbb->nodeBuffersTab = hash_create("gistbuildbuffers", 1024, &hashCtl,
! 									   HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
  
  	gfbb->bufferEmptyingQueue = NIL;
diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c
index 4b132b7..9738865 100644
*** a/src/backend/access/heap/rewriteheap.c
--- b/src/backend/access/heap/rewriteheap.c
*************** begin_heap_rewrite(Relation old_heap, Re
*** 283,295 ****
  	hash_ctl.keysize = sizeof(TidHashKey);
  	hash_ctl.entrysize = sizeof(UnresolvedTupData);
  	hash_ctl.hcxt = state->rs_cxt;
- 	hash_ctl.hash = tag_hash;
  
  	state->rs_unresolved_tups =
  		hash_create("Rewrite / Unresolved ctids",
  					128,		/* arbitrary initial size */
  					&hash_ctl,
! 					HASH_ELEM | HASH_FUNCTION | HASH_CONTEXT);
  
  	hash_ctl.entrysize = sizeof(OldToNewMappingData);
--- 283,294 ----
  	hash_ctl.keysize = sizeof(TidHashKey);
  	hash_ctl.entrysize = sizeof(UnresolvedTupData);
  	hash_ctl.hcxt = state->rs_cxt;
  
  	state->rs_unresolved_tups =
  		hash_create("Rewrite / Unresolved ctids",
  					128,		/* arbitrary initial size */
  					&hash_ctl,
! 					HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
  
  	hash_ctl.entrysize = sizeof(OldToNewMappingData);
*************** begin_heap_rewrite(Relation old_heap, Re
*** 297,303 ****
  		hash_create("Rewrite / Old to new tid map",
  					128,		/* arbitrary initial size */
  					&hash_ctl,
! 					HASH_ELEM | HASH_FUNCTION | HASH_CONTEXT);
  
  	MemoryContextSwitchTo(old_cxt);
--- 296,302 ----
  		hash_create("Rewrite / Old to new tid map",
  					128,		/* arbitrary initial size */
  					&hash_ctl,
! 					HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
  
  	MemoryContextSwitchTo(old_cxt);
*************** logical_begin_heap_rewrite(RewriteState
*** 834,846 ****
  	hash_ctl.keysize = sizeof(TransactionId);
  	hash_ctl.entrysize = sizeof(RewriteMappingFile);
  	hash_ctl.hcxt = state->rs_cxt;
- 	hash_ctl.hash = tag_hash;
  
  	state->rs_logical_mappings =
  		hash_create("Logical rewrite mapping",
  					128,		/* arbitrary initial size */
  					&hash_ctl,
! 					HASH_ELEM | HASH_FUNCTION | HASH_CONTEXT);
  }
  
  /*
--- 833,844 ----
  	hash_ctl.keysize = sizeof(TransactionId);
  	hash_ctl.entrysize = sizeof(RewriteMappingFile);
  	hash_ctl.hcxt = state->rs_cxt;
  
  	state->rs_logical_mappings =
  		hash_create("Logical rewrite mapping",
  					128,		/* arbitrary initial size */
  					&hash_ctl,
! 					HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
  }
  
  /*
diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index ae323a0..89b2ba5 100644
*** a/src/backend/access/transam/xlogutils.c
--- b/src/backend/access/transam/xlogutils.c
*************** log_invalid_page(RelFileNode node, ForkN
*** 107,118 ****
  		memset(&ctl, 0, sizeof(ctl));
  		ctl.keysize = sizeof(xl_invalid_page_key);
  		ctl.entrysize = sizeof(xl_invalid_page);
- 		ctl.hash = tag_hash;
  
  		invalid_page_tab = hash_create("XLOG invalid-page table", 100, &ctl,
! 									   HASH_ELEM | HASH_FUNCTION);
  	}
  
  	/* we currently assume xl_invalid_page_key contains no padding */
--- 107,117 ----
  		memset(&ctl, 0, sizeof(ctl));
  		ctl.keysize = sizeof(xl_invalid_page_key);
  		ctl.entrysize = sizeof(xl_invalid_page);
  
  		invalid_page_tab = hash_create("XLOG invalid-page table", 100, &ctl,
! 									   HASH_ELEM | HASH_BLOBS);
  	}
  
  	/* we currently assume xl_invalid_page_key contains no padding */
diff --git a/src/backend/commands/sequence.c b/src/backend/commands/sequence.c
index ba5b938..811e1d4 100644
*** a/src/backend/commands/sequence.c
--- b/src/backend/commands/sequence.c
*************** create_seq_hashtable(void)
*** 986,995 ****
  	memset(&ctl, 0, sizeof(ctl));
  	ctl.keysize = sizeof(Oid);
  	ctl.entrysize = sizeof(SeqTableData);
- 	ctl.hash = oid_hash;
  
  	seqhashtab = hash_create("Sequence values", 16, &ctl,
! 							 HASH_ELEM | HASH_FUNCTION);
  }
  
  /*
--- 986,994 ----
  	memset(&ctl, 0, sizeof(ctl));
  	ctl.keysize = sizeof(Oid);
  	ctl.entrysize = sizeof(SeqTableData);
  
  	seqhashtab = hash_create("Sequence values", 16, &ctl,
! 							 HASH_ELEM | HASH_BLOBS);
  }
  
  /*
diff --git a/src/backend/nodes/tidbitmap.c b/src/backend/nodes/tidbitmap.c
index a880c81..3b07438 100644
*** a/src/backend/nodes/tidbitmap.c
--- b/src/backend/nodes/tidbitmap.c
*************** tbm_create_pagetable(TIDBitmap *tbm)
*** 221,232 ****
  	MemSet(&hash_ctl, 0, sizeof(hash_ctl));
  	hash_ctl.keysize = sizeof(BlockNumber);
  	hash_ctl.entrysize = sizeof(PagetableEntry);
- 	hash_ctl.hash = tag_hash;
  	hash_ctl.hcxt = tbm->mcxt;
  	tbm->pagetable = hash_create("TIDBitmap",
  								 128,	/* start small and extend */
  								 &hash_ctl,
! 								 HASH_ELEM | HASH_FUNCTION | HASH_CONTEXT);
  
  	/* If entry1 is valid, push it into the hashtable */
  	if (tbm->status == TBM_ONE_PAGE)
--- 221,231 ----
  	MemSet(&hash_ctl, 0, sizeof(hash_ctl));
  	hash_ctl.keysize = sizeof(BlockNumber);
  	hash_ctl.entrysize = sizeof(PagetableEntry);
  	hash_ctl.hcxt = tbm->mcxt;
  	tbm->pagetable = hash_create("TIDBitmap",
  								 128,	/* start small and extend */
  								 &hash_ctl,
! 								 HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
  
  	/* If entry1 is valid, push it into the hashtable */
  	if (tbm->status == TBM_ONE_PAGE)
diff --git a/src/backend/optimizer/util/predtest.c b/src/backend/optimizer/util/predtest.c
index 9f9a208..6daa058 100644
*** a/src/backend/optimizer/util/predtest.c
--- b/src/backend/optimizer/util/predtest.c
*************** lookup_proof_cache(Oid pred_op, Oid clau
*** 1711,1719 ****
  		MemSet(&ctl, 0, sizeof(ctl));
  		ctl.keysize = sizeof(OprProofCacheKey);
  		ctl.entrysize = sizeof(OprProofCacheEntry);
- 		ctl.hash = tag_hash;
  		OprProofCacheHash = hash_create("Btree proof lookup cache", 256,
! 										&ctl, HASH_ELEM | HASH_FUNCTION);
  
  		/* Arrange to flush cache on pg_amop changes */
  		CacheRegisterSyscacheCallback(AMOPOPID,
--- 1711,1718 ----
  		MemSet(&ctl, 0, sizeof(ctl));
  		ctl.keysize = sizeof(OprProofCacheKey);
  		ctl.entrysize = sizeof(OprProofCacheEntry);
  		OprProofCacheHash = hash_create("Btree proof lookup cache", 256,
! 										&ctl, HASH_ELEM | HASH_BLOBS);
  
  		/* Arrange to flush cache on pg_amop changes */
  		CacheRegisterSyscacheCallback(AMOPOPID,
diff --git a/src/backend/parser/parse_oper.c b/src/backend/parser/parse_oper.c
index b65b632..11a6397 100644
*** a/src/backend/parser/parse_oper.c
--- b/src/backend/parser/parse_oper.c
*************** find_oper_cache_entry(OprCacheKey *key)
*** 1059,1067 ****
  		MemSet(&ctl, 0, sizeof(ctl));
  		ctl.keysize = sizeof(OprCacheKey);
  		ctl.entrysize = sizeof(OprCacheEntry);
- 		ctl.hash = tag_hash;
  		OprCacheHash = hash_create("Operator lookup cache", 256,
! 								   &ctl, HASH_ELEM | HASH_FUNCTION);
  
  		/* Arrange to flush cache on pg_operator and pg_cast changes */
  		CacheRegisterSyscacheCallback(OPERNAMENSP,
--- 1059,1066 ----
  		MemSet(&ctl, 0, sizeof(ctl));
  		ctl.keysize = sizeof(OprCacheKey);
  		ctl.entrysize = sizeof(OprCacheEntry);
  		OprCacheHash = hash_create("Operator lookup cache", 256,
! 								   &ctl, HASH_ELEM | HASH_BLOBS);
  
  		/* Arrange to flush cache on pg_operator and pg_cast changes */
  		CacheRegisterSyscacheCallback(OPERNAMENSP,
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 1d6e3f3..675f985 100644
*** a/src/backend/postmaster/autovacuum.c
--- b/src/backend/postmaster/autovacuum.c
*************** rebuild_database_list(Oid newdb)
*** 922,931 ****
  	 */
  	hctl.keysize = sizeof(Oid);
  	hctl.entrysize = sizeof(avl_dbase);
- 	hctl.hash = oid_hash;
  	hctl.hcxt = tmpcxt;
  	dbhash = hash_create("db hash", 20, &hctl,	/* magic number here FIXME */
! 						 HASH_ELEM | HASH_FUNCTION | HASH_CONTEXT);
  
  	/* start by inserting the new database */
  	score = 0;
--- 922,930 ----
  	 */
  	hctl.keysize = sizeof(Oid);
  	hctl.entrysize = sizeof(avl_dbase);
  	hctl.hcxt = tmpcxt;
  	dbhash = hash_create("db hash", 20, &hctl,	/* magic number here FIXME */
! 						 HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
  
  	/* start by inserting the new database */
  	score = 0;
*************** do_autovacuum(void)
*** 1997,2008 ****
  	MemSet(&ctl, 0, sizeof(ctl));
  	ctl.keysize = sizeof(Oid);
  	ctl.entrysize = sizeof(av_relation);
- 	ctl.hash = oid_hash;
  
  	table_toast_map = hash_create("TOAST to main relid map",
  								  100,
  								  &ctl,
! 								  HASH_ELEM | HASH_FUNCTION);
  
  	/*
  	 * Scan pg_class to determine which tables to vacuum.
--- 1996,2006 ----
  	MemSet(&ctl, 0, sizeof(ctl));
  	ctl.keysize = sizeof(Oid);
  	ctl.entrysize = sizeof(av_relation);
  
  	table_toast_map = hash_create("TOAST to main relid map",
  								  100,
  								  &ctl,
! 								  HASH_ELEM | HASH_BLOBS);
  
  	/*
  	 * Scan pg_class to determine which tables to vacuum.
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 6c814ba..8a79d9b 100644
*** a/src/backend/postmaster/checkpointer.c
--- b/src/backend/postmaster/checkpointer.c
*************** CompactCheckpointerRequestQueue(void)
*** 1212,1224 ****
  	MemSet(&ctl, 0, sizeof(ctl));
  	ctl.keysize = sizeof(CheckpointerRequest);
  	ctl.entrysize = sizeof(struct CheckpointerSlotMapping);
- 	ctl.hash = tag_hash;
  	ctl.hcxt = CurrentMemoryContext;
  
  	htab = hash_create("CompactCheckpointerRequestQueue",
  					   CheckpointerShmem->num_requests,
  					   &ctl,
! 					   HASH_ELEM | HASH_FUNCTION | HASH_CONTEXT);
  
  	/*
  	 * The basic idea here is that a request can be skipped if it's followed
--- 1212,1223 ----
  	MemSet(&ctl, 0, sizeof(ctl));
  	ctl.keysize = sizeof(CheckpointerRequest);
  	ctl.entrysize = sizeof(struct CheckpointerSlotMapping);
  	ctl.hcxt = CurrentMemoryContext;
  
  	htab = hash_create("CompactCheckpointerRequestQueue",
  					   CheckpointerShmem->num_requests,
  					   &ctl,
! 					   HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
  
  	/*
  	 * The basic idea here is that a request can be skipped if it's followed
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index f71fdeb..938fc29 100644
*** a/src/backend/postmaster/pgstat.c
--- b/src/backend/postmaster/pgstat.c
*************** pgstat_collect_oids(Oid catalogid)
*** 1132,1143 ****
  	memset(&hash_ctl, 0, sizeof(hash_ctl));
  	hash_ctl.keysize = sizeof(Oid);
  	hash_ctl.entrysize = sizeof(Oid);
- 	hash_ctl.hash = oid_hash;
  	hash_ctl.hcxt = CurrentMemoryContext;
  	htab = hash_create("Temporary table of OIDs",
  					   PGSTAT_TAB_HASH_SIZE,
  					   &hash_ctl,
! 					   HASH_ELEM | HASH_FUNCTION | HASH_CONTEXT);
  
  	rel = heap_open(catalogid, AccessShareLock);
  	snapshot = RegisterSnapshot(GetLatestSnapshot());
--- 1132,1142 ----
  	memset(&hash_ctl, 0, sizeof(hash_ctl));
  	hash_ctl.keysize = sizeof(Oid);
  	hash_ctl.entrysize = sizeof(Oid);
  	hash_ctl.hcxt = CurrentMemoryContext;
  	htab = hash_create("Temporary table of OIDs",
  					   PGSTAT_TAB_HASH_SIZE,
  					   &hash_ctl,
! 					   HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
  
  	rel = heap_open(catalogid, AccessShareLock);
  	snapshot = RegisterSnapshot(GetLatestSnapshot());
*************** pgstat_init_function_usage(FunctionCallI
*** 1520,1530 ****
  		memset(&hash_ctl, 0, sizeof(hash_ctl));
  		hash_ctl.keysize = sizeof(Oid);
  		hash_ctl.entrysize = sizeof(PgStat_BackendFunctionEntry);
- 		hash_ctl.hash = oid_hash;
  		pgStatFunctions = hash_create("Function stat entries",
  									  PGSTAT_FUNCTION_HASH_SIZE,
  									  &hash_ctl,
! 									  HASH_ELEM | HASH_FUNCTION);
  	}
  
  	/* Get the stats entry for this function, create if necessary */
--- 1519,1528 ----
  		memset(&hash_ctl, 0, sizeof(hash_ctl));
  		hash_ctl.keysize = sizeof(Oid);
  		hash_ctl.entrysize = sizeof(PgStat_BackendFunctionEntry);
  		pgStatFunctions = hash_create("Function stat entries",
  									  PGSTAT_FUNCTION_HASH_SIZE,
  									  &hash_ctl,
! 									  HASH_ELEM | HASH_BLOBS);
  	}
  
  	/* Get the stats entry for this function, create if necessary */
*************** reset_dbentry_counters(PgStat_StatDBEntr
*** 3480,3498 ****
  	memset(&hash_ctl, 0, sizeof(hash_ctl));
  	hash_ctl.keysize = sizeof(Oid);
  	hash_ctl.entrysize = sizeof(PgStat_StatTabEntry);
- 	hash_ctl.hash = oid_hash;
  	dbentry->tables = hash_create("Per-database table",
  								  PGSTAT_TAB_HASH_SIZE,
  								  &hash_ctl,
! 								  HASH_ELEM | HASH_FUNCTION);
  
  	hash_ctl.keysize = sizeof(Oid);
  	hash_ctl.entrysize = sizeof(PgStat_StatFuncEntry);
- 	hash_ctl.hash = oid_hash;
  	dbentry->functions = hash_create("Per-database function",
  									 PGSTAT_FUNCTION_HASH_SIZE,
  									 &hash_ctl,
! 									 HASH_ELEM | HASH_FUNCTION);
  }
  
  /*
--- 3478,3494 ----
  	memset(&hash_ctl, 0, sizeof(hash_ctl));
  	hash_ctl.keysize = sizeof(Oid);
  	hash_ctl.entrysize = sizeof(PgStat_StatTabEntry);
  	dbentry->tables = hash_create("Per-database table",
  								  PGSTAT_TAB_HASH_SIZE,
  								  &hash_ctl,
! 								  HASH_ELEM | HASH_BLOBS);
  
  	hash_ctl.keysize = sizeof(Oid);
  	hash_ctl.entrysize = sizeof(PgStat_StatFuncEntry);
  	dbentry->functions = hash_create("Per-database function",
  									 PGSTAT_FUNCTION_HASH_SIZE,
  									 &hash_ctl,
! 									 HASH_ELEM | HASH_BLOBS);
  }
  
  /*
*************** pgstat_read_statsfiles(Oid onlydb, bool
*** 3899,3908 ****
  	memset(&hash_ctl, 0, sizeof(hash_ctl));
  	hash_ctl.keysize = sizeof(Oid);
  	hash_ctl.entrysize = sizeof(PgStat_StatDBEntry);
- 	hash_ctl.hash = oid_hash;
  	hash_ctl.hcxt = pgStatLocalContext;
  	dbhash = hash_create("Databases hash", PGSTAT_DB_HASH_SIZE, &hash_ctl,
! 						 HASH_ELEM | HASH_FUNCTION | HASH_CONTEXT);
  
  	/*
  	 * Clear out global and archiver statistics so they start from zero in
--- 3895,3903 ----
  	memset(&hash_ctl, 0, sizeof(hash_ctl));
  	hash_ctl.keysize = sizeof(Oid);
  	hash_ctl.entrysize = sizeof(PgStat_StatDBEntry);
  	hash_ctl.hcxt = pgStatLocalContext;
  	dbhash = hash_create("Databases hash", PGSTAT_DB_HASH_SIZE, &hash_ctl,
! 						 HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
  
  	/*
  	 * Clear out global and archiver statistics so they start from zero in
*************** pgstat_read_statsfiles(Oid onlydb, bool
*** 4023,4043 ****
  				memset(&hash_ctl, 0, sizeof(hash_ctl));
  				hash_ctl.keysize = sizeof(Oid);
  				hash_ctl.entrysize = sizeof(PgStat_StatTabEntry);
- 				hash_ctl.hash = oid_hash;
  				hash_ctl.hcxt = pgStatLocalContext;
  				dbentry->tables = hash_create("Per-database table",
  											  PGSTAT_TAB_HASH_SIZE,
  											  &hash_ctl,
! 											  HASH_ELEM | HASH_FUNCTION | HASH_CONTEXT);
  
  				hash_ctl.keysize = sizeof(Oid);
  				hash_ctl.entrysize = sizeof(PgStat_StatFuncEntry);
- 				hash_ctl.hash = oid_hash;
  				hash_ctl.hcxt = pgStatLocalContext;
  				dbentry->functions = hash_create("Per-database function",
  												 PGSTAT_FUNCTION_HASH_SIZE,
  												 &hash_ctl,
! 												 HASH_ELEM | HASH_FUNCTION | HASH_CONTEXT);
  
  				/*
  				 * If requested, read the data from the database-specific
--- 4018,4036 ----
  				memset(&hash_ctl, 0, sizeof(hash_ctl));
  				hash_ctl.keysize = sizeof(Oid);
  				hash_ctl.entrysize = sizeof(PgStat_StatTabEntry);
  				hash_ctl.hcxt = pgStatLocalContext;
  				dbentry->tables = hash_create("Per-database table",
  											  PGSTAT_TAB_HASH_SIZE,
  											  &hash_ctl,
! 											  HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
  
  				hash_ctl.keysize = sizeof(Oid);
  				hash_ctl.entrysize = sizeof(PgStat_StatFuncEntry);
  				hash_ctl.hcxt = pgStatLocalContext;
  				dbentry->functions = hash_create("Per-database function",
  												 PGSTAT_FUNCTION_HASH_SIZE,
  												 &hash_ctl,
! 												 HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
  
  				/*
  				 * If requested, read the data from the database-specific
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index cd132c1..799ec84 100644
*** a/src/backend/replication/logical/reorderbuffer.c
--- b/src/backend/replication/logical/reorderbuffer.c
*************** ReorderBufferAllocate(void)
*** 245,255 ****
  	hash_ctl.keysize = sizeof(TransactionId);
  	hash_ctl.entrysize = sizeof(ReorderBufferTXNByIdEnt);
- 	hash_ctl.hash = tag_hash;
  	hash_ctl.hcxt = buffer->context;
  
  	buffer->by_txn = hash_create("ReorderBufferByXid", 1000, &hash_ctl,
! 								 HASH_ELEM | HASH_FUNCTION | HASH_CONTEXT);
  
  	buffer->by_txn_last_xid = InvalidTransactionId;
  	buffer->by_txn_last_txn = NULL;
--- 245,254 ----
  	hash_ctl.keysize = sizeof(TransactionId);
  	hash_ctl.entrysize = sizeof(ReorderBufferTXNByIdEnt);
  	hash_ctl.hcxt = buffer->context;
  
  	buffer->by_txn = hash_create("ReorderBufferByXid", 1000, &hash_ctl,
! 								 HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
  
  	buffer->by_txn_last_xid = InvalidTransactionId;
  	buffer->by_txn_last_txn = NULL;
*************** ReorderBufferBuildTupleCidHash(ReorderBu
*** 1111,1117 ****
  	hash_ctl.keysize = sizeof(ReorderBufferTupleCidKey);
  	hash_ctl.entrysize = sizeof(ReorderBufferTupleCidEnt);
- 	hash_ctl.hash = tag_hash;
  	hash_ctl.hcxt = rb->context;
  
  	/*
--- 1110,1115 ----
  	hash_ctl.keysize = sizeof(ReorderBufferTupleCidKey);
  	hash_ctl.entrysize = sizeof(ReorderBufferTupleCidEnt);
  	hash_ctl.hcxt = rb->context;
  
  	/*
*************** ReorderBufferBuildTupleCidHash(ReorderBu
*** 1120,1126 ****
  	 */
  	txn->tuplecid_hash =
  		hash_create("ReorderBufferTupleCid", txn->ntuplecids, &hash_ctl,
! 					HASH_ELEM | HASH_FUNCTION | HASH_CONTEXT);
  
  	dlist_foreach(iter, &txn->tuplecids)
  	{
--- 1118,1124 ----
  	 */
  	txn->tuplecid_hash =
  		hash_create("ReorderBufferTupleCid", txn->ntuplecids, &hash_ctl,
! 					HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
  
  	dlist_foreach(iter, &txn->tuplecids)
  	{
*************** ReorderBufferToastInitHash(ReorderBuffer
*** 2434,2443 ****
  	memset(&hash_ctl, 0, sizeof(hash_ctl));
  	hash_ctl.keysize = sizeof(Oid);
  	hash_ctl.entrysize = sizeof(ReorderBufferToastEnt);
- 	hash_ctl.hash = tag_hash;
  	hash_ctl.hcxt = rb->context;
  	txn->toast_hash = hash_create("ReorderBufferToastHash", 5, &hash_ctl,
! 								  HASH_ELEM | HASH_FUNCTION | HASH_CONTEXT);
  }
  
  /*
--- 2432,2440 ----
  	memset(&hash_ctl, 0, sizeof(hash_ctl));
  	hash_ctl.keysize = sizeof(Oid);
  	hash_ctl.entrysize = sizeof(ReorderBufferToastEnt);
  	hash_ctl.hcxt = rb->context;
  	txn->toast_hash = hash_create("ReorderBufferToastHash", 5, &hash_ctl,
! 								  HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
  }
  
  /*
diff --git a/src/backend/storage/buffer/buf_table.c b/src/backend/storage/buffer/buf_table.c
index 7a38f2f..0750b63 100644
*** a/src/backend/storage/buffer/buf_table.c
--- b/src/backend/storage/buffer/buf_table.c
*************** InitBufTable(int size)
*** 59,71 ****
  	/* BufferTag maps to Buffer */
  	info.keysize = sizeof(BufferTag);
  	info.entrysize = sizeof(BufferLookupEnt);
- 	info.hash = tag_hash;
  	info.num_partitions = NUM_BUFFER_PARTITIONS;
  
  	SharedBufHash = ShmemInitHash("Shared Buffer Lookup Table",
  								  size, size,
  								  &info,
! 								  HASH_ELEM | HASH_FUNCTION | HASH_PARTITION);
  }
  
  /*
--- 59,70 ----
  	/* BufferTag maps to Buffer */
  	info.keysize = sizeof(BufferTag);
  	info.entrysize = sizeof(BufferLookupEnt);
  	info.num_partitions = NUM_BUFFER_PARTITIONS;
  
  	SharedBufHash = ShmemInitHash("Shared Buffer Lookup Table",
  								  size, size,
  								  &info,
! 								  HASH_ELEM | HASH_BLOBS | HASH_PARTITION);
  }
  
  /*
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 9b62aec..1b99e21 100644
*** a/src/backend/storage/buffer/bufmgr.c
--- b/src/backend/storage/buffer/bufmgr.c
*************** InitBufferPoolAccess(void)
*** 2063,2072 ****
  	MemSet(&hash_ctl, 0, sizeof(hash_ctl));
  	hash_ctl.keysize = sizeof(int32);
  	hash_ctl.entrysize = sizeof(PrivateRefCountArray);
- 	hash_ctl.hash = oid_hash;	/* a bit more efficient than tag_hash */
  
  	PrivateRefCountHash = hash_create("PrivateRefCount", 100, &hash_ctl,
! 									  HASH_ELEM | HASH_FUNCTION);
  }
  
  /*
--- 2063,2071 ----
  	MemSet(&hash_ctl, 0, sizeof(hash_ctl));
  	hash_ctl.keysize = sizeof(int32);
  	hash_ctl.entrysize = sizeof(PrivateRefCountArray);
  
  	PrivateRefCountHash = hash_create("PrivateRefCount", 100, &hash_ctl,
! 									  HASH_ELEM | HASH_BLOBS);
  }
  
  /*
diff --git a/src/backend/storage/buffer/localbuf.c b/src/backend/storage/buffer/localbuf.c
index 6c81be4..3768b02 100644
*** a/src/backend/storage/buffer/localbuf.c
--- b/src/backend/storage/buffer/localbuf.c
*************** InitLocalBuffers(void)
*** 415,426 ****
  	MemSet(&info, 0, sizeof(info));
  	info.keysize = sizeof(BufferTag);
  	info.entrysize = sizeof(LocalBufferLookupEnt);
- 	info.hash = tag_hash;
  
  	LocalBufHash = hash_create("Local Buffer Lookup Table",
  							   nbufs,
  							   &info,
! 							   HASH_ELEM | HASH_FUNCTION);
  
  	if (!LocalBufHash)
  		elog(ERROR, "could not initialize local buffer hash table");
--- 415,425 ----
  	MemSet(&info, 0, sizeof(info));
  	info.keysize = sizeof(BufferTag);
  	info.entrysize = sizeof(LocalBufferLookupEnt);
  
  	LocalBufHash = hash_create("Local Buffer Lookup Table",
  							   nbufs,
  							   &info,
! 							   HASH_ELEM | HASH_BLOBS);
  
  	if (!LocalBufHash)
  		elog(ERROR, "could not initialize local buffer hash table");
diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index cbe9574..173c9db 100644
*** a/src/backend/storage/lmgr/lock.c
--- b/src/backend/storage/lmgr/lock.c
*************** void
*** 373,379 ****
  InitLocks(void)
  {
  	HASHCTL		info;
- 	int			hash_flags;
  	long		init_table_size,
  				max_table_size;
  	bool		found;
--- 373,378 ----
  InitLocks(void)
  {
  	HASHCTL		info;
  	long		init_table_size,
  				max_table_size;
  	bool		found;
*************** InitLocks(void)
*** 392,406 ****
  	MemSet(&info, 0, sizeof(info));
  	info.keysize = sizeof(LOCKTAG);
  	info.entrysize = sizeof(LOCK);
- 	info.hash = tag_hash;
  	info.num_partitions = NUM_LOCK_PARTITIONS;
- 	hash_flags = (HASH_ELEM | HASH_FUNCTION | HASH_PARTITION);
  
  	LockMethodLockHash = ShmemInitHash("LOCK hash",
  									   init_table_size,
  									   max_table_size,
  									   &info,
! 									   hash_flags);
  
  	/* Assume an average of 2 holders per lock */
  	max_table_size *= 2;
--- 391,403 ----
  	MemSet(&info, 0, sizeof(info));
  	info.keysize = sizeof(LOCKTAG);
  	info.entrysize = sizeof(LOCK);
  	info.num_partitions = NUM_LOCK_PARTITIONS;
  
  	LockMethodLockHash = ShmemInitHash("LOCK hash",
  									   init_table_size,
  									   max_table_size,
  									   &info,
! 									   HASH_ELEM | HASH_BLOBS | HASH_PARTITION);
  
  	/* Assume an average of 2 holders per lock */
  	max_table_size *= 2;
*************** InitLocks(void)
*** 414,426 ****
  	info.entrysize = sizeof(PROCLOCK);
  	info.hash = proclock_hash;
  	info.num_partitions = NUM_LOCK_PARTITIONS;
- 	hash_flags = (HASH_ELEM | HASH_FUNCTION | HASH_PARTITION);
  
  	LockMethodProcLockHash = ShmemInitHash("PROCLOCK hash",
  										   init_table_size,
  										   max_table_size,
  										   &info,
! 										   hash_flags);
  
  	/*
  	 * Allocate fast-path structures.
--- 411,422 ----
  	info.entrysize = sizeof(PROCLOCK);
  	info.hash = proclock_hash;
  	info.num_partitions = NUM_LOCK_PARTITIONS;
  
  	LockMethodProcLockHash = ShmemInitHash("PROCLOCK hash",
  										   init_table_size,
  										   max_table_size,
  										   &info,
! 										   HASH_ELEM | HASH_FUNCTION | HASH_PARTITION);
  
  	/*
  	 * Allocate fast-path structures.
*************** InitLocks(void)
*** 445,457 ****
  	info.keysize = sizeof(LOCALLOCKTAG);
  	info.entrysize = sizeof(LOCALLOCK);
- 	info.hash = tag_hash;
- 	hash_flags = (HASH_ELEM | HASH_FUNCTION);
  
  	LockMethodLocalHash = hash_create("LOCALLOCK hash",
  									  16,
  									  &info,
! 									  hash_flags);
  }
--- 441,451 ----
  	info.keysize = sizeof(LOCALLOCKTAG);
  	info.entrysize = sizeof(LOCALLOCK);
  
  	LockMethodLocalHash = hash_create("LOCALLOCK hash",
  									  16,
  									  &info,
! 									  HASH_ELEM | HASH_BLOBS);
  }
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index c9f8657..26a550f 100644
*** a/src/backend/storage/lmgr/lwlock.c
--- b/src/backend/storage/lmgr/lwlock.c
*************** init_lwlock_stats(void)
*** 167,176 ****
  	MemSet(&ctl, 0, sizeof(ctl));
  	ctl.keysize = sizeof(lwlock_stats_key);
  	ctl.entrysize = sizeof(lwlock_stats);
- 	ctl.hash = tag_hash;
  	ctl.hcxt = lwlock_stats_cxt;
  	lwlock_stats_htab = hash_create("lwlock stats", 16384, &ctl,
! 									HASH_ELEM | HASH_FUNCTION | HASH_CONTEXT);
  	if (!exit_registered)
  	{
  		on_shmem_exit(print_lwlock_stats, 0);
--- 167,175 ----
  	MemSet(&ctl, 0, sizeof(ctl));
  	ctl.keysize = sizeof(lwlock_stats_key);
  	ctl.entrysize = sizeof(lwlock_stats);
  	ctl.hcxt = lwlock_stats_cxt;
  	lwlock_stats_htab = hash_create("lwlock stats", 16384, &ctl,
! 									HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
  	if (!exit_registered)
  	{
  		on_shmem_exit(print_lwlock_stats, 0);
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index f126118..e783955 100644
*** a/src/backend/storage/lmgr/predicate.c
--- b/src/backend/storage/lmgr/predicate.c
***************
*** 286,292 ****
   * the lock partition number from the hashcode.
   */
  #define PredicateLockTargetTagHashCode(predicatelocktargettag) \
! 	(tag_hash((predicatelocktargettag), sizeof(PREDICATELOCKTARGETTAG)))
  
  /*
   * Given a predicate lock tag, and the hash for its target,
--- 286,292 ----
   * the lock partition number from the hashcode.
   */
  #define PredicateLockTargetTagHashCode(predicatelocktargettag) \
! 	get_hash_value(PredicateLockTargetHash, predicatelocktargettag)
  
  /*
   * Given a predicate lock tag, and the hash for its target,
*************** void
*** 1095,1101 ****
  InitPredicateLocks(void)
  {
  	HASHCTL		info;
- 	int			hash_flags;
  	long		max_table_size;
  	Size		requestSize;
  	bool		found;
--- 1095,1100 ----
  InitPredicateLocks(void)
  {
  	HASHCTL		info;
  	long		max_table_size;
  	Size		requestSize;
  	bool		found;
*************** InitPredicateLocks(void)
*** 1113,1127 ****
  	MemSet(&info, 0, sizeof(info));
  	info.keysize = sizeof(PREDICATELOCKTARGETTAG);
  	info.entrysize = sizeof(PREDICATELOCKTARGET);
- 	info.hash = tag_hash;
  	info.num_partitions = NUM_PREDICATELOCK_PARTITIONS;
- 	hash_flags = (HASH_ELEM | HASH_FUNCTION | HASH_PARTITION | HASH_FIXED_SIZE);
  
  	PredicateLockTargetHash = ShmemInitHash("PREDICATELOCKTARGET hash",
  											max_table_size,
  											max_table_size,
  											&info,
! 											hash_flags);
  
  	/* Assume an average of 2 xacts per target */
  	max_table_size *= 2;
--- 1112,1125 ----
  	MemSet(&info, 0, sizeof(info));
  	info.keysize = sizeof(PREDICATELOCKTARGETTAG);
  	info.entrysize = sizeof(PREDICATELOCKTARGET);
  	info.num_partitions = NUM_PREDICATELOCK_PARTITIONS;
  
  	PredicateLockTargetHash = ShmemInitHash("PREDICATELOCKTARGET hash",
  											max_table_size,
  											max_table_size,
  											&info,
! 											HASH_ELEM | HASH_BLOBS |
! 											HASH_PARTITION | HASH_FIXED_SIZE);
  
  	/* Assume an average of 2 xacts per target */
  	max_table_size *= 2;
*************** InitPredicateLocks(void)
*** 1143,1155 ****
  	info.entrysize = sizeof(PREDICATELOCK);
  	info.hash = predicatelock_hash;
  	info.num_partitions = NUM_PREDICATELOCK_PARTITIONS;
- 	hash_flags = (HASH_ELEM | HASH_FUNCTION | HASH_PARTITION | HASH_FIXED_SIZE);
  
  	PredicateLockHash = ShmemInitHash("PREDICATELOCK hash",
  									  max_table_size,
  									  max_table_size,
  									  &info,
! 									  hash_flags);
  
  	/*
  	 * Compute size for serializable transaction hashtable.  Note these
--- 1141,1153 ----
  	info.entrysize = sizeof(PREDICATELOCK);
  	info.hash = predicatelock_hash;
  	info.num_partitions = NUM_PREDICATELOCK_PARTITIONS;
  
  	PredicateLockHash = ShmemInitHash("PREDICATELOCK hash",
  									  max_table_size,
  									  max_table_size,
  									  &info,
! 									  HASH_ELEM | HASH_FUNCTION |
! 									  HASH_PARTITION | HASH_FIXED_SIZE);
  
  	/*
  	 * Compute size for serializable transaction hashtable.  Note these
*************** InitPredicateLocks(void)
*** 1224,1237 ****
  	MemSet(&info, 0, sizeof(info));
  	info.keysize = sizeof(SERIALIZABLEXIDTAG);
  	info.entrysize = sizeof(SERIALIZABLEXID);
- 	info.hash = tag_hash;
- 	hash_flags = (HASH_ELEM | HASH_FUNCTION | HASH_FIXED_SIZE);
  
  	SerializableXidHash = ShmemInitHash("SERIALIZABLEXID hash",
  										max_table_size,
  										max_table_size,
  										&info,
! 										hash_flags);
  
  	/*
  	 * Allocate space for tracking rw-conflicts in lists attached to the
--- 1222,1234 ----
  	MemSet(&info, 0, sizeof(info));
  	info.keysize = sizeof(SERIALIZABLEXIDTAG);
  	info.entrysize = sizeof(SERIALIZABLEXID);
  
  	SerializableXidHash = ShmemInitHash("SERIALIZABLEXID hash",
  										max_table_size,
  										max_table_size,
  										&info,
! 										HASH_ELEM | HASH_BLOBS |
! 										HASH_FIXED_SIZE);
  
  	/*
  	 * Allocate space for tracking rw-conflicts in lists attached to the
*************** GetSerializableTransactionSnapshotInt(Sn
*** 1793,1803 ****
  	MemSet(&hash_ctl, 0, sizeof(hash_ctl));
  	hash_ctl.keysize = sizeof(PREDICATELOCKTARGETTAG);
  	hash_ctl.entrysize = sizeof(LOCALPREDICATELOCK);
- 	hash_ctl.hash = tag_hash;
  	LocalPredicateLockHash = hash_create("Local predicate lock",
  										 max_predicate_locks_per_xact,
  										 &hash_ctl,
! 										 HASH_ELEM | HASH_FUNCTION);
  
  	return snapshot;
  }
--- 1790,1799 ----
  	MemSet(&hash_ctl, 0, sizeof(hash_ctl));
  	hash_ctl.keysize = sizeof(PREDICATELOCKTARGETTAG);
  	hash_ctl.entrysize = sizeof(LOCALPREDICATELOCK);
  	LocalPredicateLockHash = hash_create("Local predicate lock",
  										 max_predicate_locks_per_xact,
  										 &hash_ctl,
! 										 HASH_ELEM | HASH_BLOBS);
  
  	return snapshot;
  }
diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
index 167d61c..53d9188 100644
*** a/src/backend/storage/smgr/md.c
--- b/src/backend/storage/smgr/md.c
*************** mdinit(void)
*** 229,240 ****
  		MemSet(&hash_ctl, 0, sizeof(hash_ctl));
  		hash_ctl.keysize = sizeof(RelFileNode);
  		hash_ctl.entrysize = sizeof(PendingOperationEntry);
- 		hash_ctl.hash = tag_hash;
  		hash_ctl.hcxt = pendingOpsCxt;
  		pendingOpsTable = hash_create("Pending Ops Table",
  									  100L,
  									  &hash_ctl,
! 									  HASH_ELEM | HASH_FUNCTION | HASH_CONTEXT);
  		pendingUnlinks = NIL;
  	}
  }
--- 229,239 ----
  		MemSet(&hash_ctl, 0, sizeof(hash_ctl));
  		hash_ctl.keysize = sizeof(RelFileNode);
  		hash_ctl.entrysize = sizeof(PendingOperationEntry);
  		hash_ctl.hcxt = pendingOpsCxt;
  		pendingOpsTable = hash_create("Pending Ops Table",
  									  100L,
  									  &hash_ctl,
! 									  HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
  		pendingUnlinks = NIL;
  	}
  }
diff --git a/src/backend/storage/smgr/smgr.c b/src/backend/storage/smgr/smgr.c
index d16f559..87be477 100644
*** a/src/backend/storage/smgr/smgr.c
--- b/src/backend/storage/smgr/smgr.c
*************** smgropen(RelFileNode rnode, BackendId ba
*** 146,154 ****
  		MemSet(&ctl, 0, sizeof(ctl));
  		ctl.keysize = sizeof(RelFileNodeBackend);
  		ctl.entrysize = sizeof(SMgrRelationData);
- 		ctl.hash = tag_hash;
  		SMgrRelationHash = hash_create("smgr relation table", 400,
! 									   &ctl, HASH_ELEM | HASH_FUNCTION);
  		first_unowned_reln = NULL;
  	}
--- 146,153 ----
  		MemSet(&ctl, 0, sizeof(ctl));
  		ctl.keysize = sizeof(RelFileNodeBackend);
  		ctl.entrysize = sizeof(SMgrRelationData);
  		SMgrRelationHash = hash_create("smgr relation table", 400,
! 									   &ctl, HASH_ELEM | HASH_BLOBS);
  		first_unowned_reln = NULL;
  	}
diff --git a/src/backend/utils/adt/array_typanalyze.c b/src/backend/utils/adt/array_typanalyze.c
index 4d7e9c3..3653888 100644
*** a/src/backend/utils/adt/array_typanalyze.c
--- b/src/backend/utils/adt/array_typanalyze.c
*************** compute_array_stats(VacAttrStats *stats,
*** 290,301 ****
  	MemSet(&count_hash_ctl, 0, sizeof(count_hash_ctl));
  	count_hash_ctl.keysize = sizeof(int);
  	count_hash_ctl.entrysize = sizeof(DECountItem);
- 	count_hash_ctl.hash = tag_hash;
  	count_hash_ctl.hcxt = CurrentMemoryContext;
  	count_tab = hash_create("Array distinct element count table",
  							64,
  							&count_hash_ctl,
! 							HASH_ELEM | HASH_FUNCTION | HASH_CONTEXT);
  
  	/* Initialize counters. */
  	b_current = 1;
--- 290,300 ----
  	MemSet(&count_hash_ctl, 0, sizeof(count_hash_ctl));
  	count_hash_ctl.keysize = sizeof(int);
  	count_hash_ctl.entrysize = sizeof(DECountItem);
  	count_hash_ctl.hcxt = CurrentMemoryContext;
  	count_tab = hash_create("Array distinct element count table",
  							64,
  							&count_hash_ctl,
! 							HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
  
  	/* Initialize counters. */
  	b_current = 1;
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 94bb5a4..1b68743 100644
*** a/src/backend/utils/adt/pg_locale.c
--- b/src/backend/utils/adt/pg_locale.c
*************** lookup_collation_cache(Oid collation, bo
*** 875,883 ****
  		memset(&ctl, 0, sizeof(ctl));
  		ctl.keysize = sizeof(Oid);
  		ctl.entrysize = sizeof(collation_cache_entry);
- 		ctl.hash = oid_hash;
  		collation_cache = hash_create("Collation cache", 100, &ctl,
! 									  HASH_ELEM | HASH_FUNCTION);
  	}
  
  	cache_entry = hash_search(collation_cache, &collation, HASH_ENTER, &found);
--- 875,882 ----
  		memset(&ctl, 0, sizeof(ctl));
  		ctl.keysize = sizeof(Oid);
  		ctl.entrysize = sizeof(collation_cache_entry);
  		collation_cache = hash_create("Collation cache", 100, &ctl,
! 									  HASH_ELEM | HASH_BLOBS);
  	}
  
  	cache_entry = hash_search(collation_cache, &collation, HASH_ENTER, &found);
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 2f02303..5c75390 100644
*** a/src/backend/utils/adt/ri_triggers.c
--- b/src/backend/utils/adt/ri_triggers.c
*************** ri_InitHashTables(void)
*** 3326,3335 ****
  	memset(&ctl, 0, sizeof(ctl));
  	ctl.keysize = sizeof(Oid);
  	ctl.entrysize = sizeof(RI_ConstraintInfo);
- 	ctl.hash = oid_hash;
  	ri_constraint_cache = hash_create("RI constraint cache",
  									  RI_INIT_CONSTRAINTHASHSIZE,
! 									  &ctl, HASH_ELEM | HASH_FUNCTION);
  
  	/* Arrange to flush cache on pg_constraint changes */
  	CacheRegisterSyscacheCallback(CONSTROID,
--- 3326,3334 ----
  	memset(&ctl, 0, sizeof(ctl));
  	ctl.keysize = sizeof(Oid);
  	ctl.entrysize = sizeof(RI_ConstraintInfo);
  	ri_constraint_cache = hash_create("RI constraint cache",
  									  RI_INIT_CONSTRAINTHASHSIZE,
! 									  &ctl, HASH_ELEM | HASH_BLOBS);
  
  	/* Arrange to flush cache on pg_constraint changes */
  	CacheRegisterSyscacheCallback(CONSTROID,
*************** ri_InitHashTables(void)
*** 3339,3356 ****
  	memset(&ctl, 0, sizeof(ctl));
  	ctl.keysize = sizeof(RI_QueryKey);
  	ctl.entrysize = sizeof(RI_QueryHashEntry);
- 	ctl.hash = tag_hash;
  	ri_query_cache = hash_create("RI query cache",
  								 RI_INIT_QUERYHASHSIZE,
! 								 &ctl, HASH_ELEM | HASH_FUNCTION);
  
  	memset(&ctl, 0, sizeof(ctl));
  	ctl.keysize = sizeof(RI_CompareKey);
  	ctl.entrysize = sizeof(RI_CompareHashEntry);
- 	ctl.hash = tag_hash;
  	ri_compare_cache = hash_create("RI compare cache",
  								   RI_INIT_QUERYHASHSIZE,
! 								   &ctl, HASH_ELEM | HASH_FUNCTION);
  }
--- 3338,3353 ----
  	memset(&ctl, 0, sizeof(ctl));
  	ctl.keysize = sizeof(RI_QueryKey);
  	ctl.entrysize = sizeof(RI_QueryHashEntry);
  	ri_query_cache = hash_create("RI query cache",
  								 RI_INIT_QUERYHASHSIZE,
! 								 &ctl, HASH_ELEM | HASH_BLOBS);
  
  	memset(&ctl, 0, sizeof(ctl));
  	ctl.keysize = sizeof(RI_CompareKey);
  	ctl.entrysize = sizeof(RI_CompareHashEntry);
  	ri_compare_cache = hash_create("RI compare cache",
  								   RI_INIT_QUERYHASHSIZE,
! 								   &ctl, HASH_ELEM | HASH_BLOBS);
  }
diff --git a/src/backend/utils/cache/attoptcache.c b/src/backend/utils/cache/attoptcache.c
index 5fcf0dd..75e7ae6 100644
*** a/src/backend/utils/cache/attoptcache.c
--- b/src/backend/utils/cache/attoptcache.c
*************** InitializeAttoptCache(void)
*** 82,91 ****
  	MemSet(&ctl, 0, sizeof(ctl));
  	ctl.keysize = sizeof(AttoptCacheKey);
  	ctl.entrysize = sizeof(AttoptCacheEntry);
- 	ctl.hash = tag_hash;
  	AttoptCacheHash = hash_create("Attopt cache", 256, &ctl,
! 								  HASH_ELEM | HASH_FUNCTION);
  
  	/* Make sure we've initialized CacheMemoryContext. */
  	if (!CacheMemoryContext)
--- 82,90 ----
  	MemSet(&ctl, 0, sizeof(ctl));
  	ctl.keysize = sizeof(AttoptCacheKey);
  	ctl.entrysize = sizeof(AttoptCacheEntry);
  	AttoptCacheHash = hash_create("Attopt cache", 256, &ctl,
! 								  HASH_ELEM | HASH_BLOBS);
  
  	/* Make sure we've initialized CacheMemoryContext. */
  	if (!CacheMemoryContext)
diff --git a/src/backend/utils/cache/evtcache.c b/src/backend/utils/cache/evtcache.c
index b9d442c..01051b0 100644
*** a/src/backend/utils/cache/evtcache.c
--- b/src/backend/utils/cache/evtcache.c
*************** BuildEventTriggerCache(void)
*** 123,132 ****
  	MemSet(&ctl, 0, sizeof(ctl));
  	ctl.keysize = sizeof(EventTriggerEvent);
  	ctl.entrysize = sizeof(EventTriggerCacheEntry);
- 	ctl.hash = tag_hash;
  	ctl.hcxt = EventTriggerCacheContext;
  	cache = hash_create("Event Trigger Cache", 32, &ctl,
! 						HASH_ELEM | HASH_FUNCTION | HASH_CONTEXT);
  
  	/*
  	 * Prepare to scan pg_event_trigger in name order.
--- 123,131 ----
  	MemSet(&ctl, 0, sizeof(ctl));
  	ctl.keysize = sizeof(EventTriggerEvent);
  	ctl.entrysize = sizeof(EventTriggerCacheEntry);
  	ctl.hcxt = EventTriggerCacheContext;
  	cache = hash_create("Event Trigger Cache", 32, &ctl,
! 						HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
  
  	/*
  	 * Prepare to scan pg_event_trigger in name order.
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 79244e5..c2e574d 100644
*** a/src/backend/utils/cache/relcache.c
--- b/src/backend/utils/cache/relcache.c
*************** LookupOpclassInfo(Oid operatorClassOid,
*** 1409,1417 ****
  		MemSet(&ctl, 0, sizeof(ctl));
  		ctl.keysize = sizeof(Oid);
  		ctl.entrysize = sizeof(OpClassCacheEnt);
- 		ctl.hash = oid_hash;
  		OpClassCache = hash_create("Operator class cache", 64,
! 								   &ctl, HASH_ELEM | HASH_FUNCTION);
  
  		/* Also make sure CacheMemoryContext exists */
  		if (!CacheMemoryContext)
--- 1409,1416 ----
  		MemSet(&ctl, 0, sizeof(ctl));
  		ctl.keysize = sizeof(Oid);
  		ctl.entrysize = sizeof(OpClassCacheEnt);
  		OpClassCache = hash_create("Operator class cache", 64,
! 								   &ctl, HASH_ELEM | HASH_BLOBS);
  
  		/* Also make sure CacheMemoryContext exists */
  		if (!CacheMemoryContext)
*************** RelationCacheInitialize(void)
*** 3140,3148 ****
  	MemSet(&ctl, 0, sizeof(ctl));
  	ctl.keysize = sizeof(Oid);
  	ctl.entrysize = sizeof(RelIdCacheEnt);
- 	ctl.hash = oid_hash;
  	RelationIdCache = hash_create("Relcache by OID", INITRELCACHESIZE,
! 								  &ctl, HASH_ELEM | HASH_FUNCTION);
  
  	/*
  	 * relation mapper needs to be initialized too
--- 3139,3146 ----
  	MemSet(&ctl, 0, sizeof(ctl));
  	ctl.keysize = sizeof(Oid);
  	ctl.entrysize = sizeof(RelIdCacheEnt);
  	RelationIdCache = hash_create("Relcache by OID", INITRELCACHESIZE,
! 								  &ctl, HASH_ELEM | HASH_BLOBS);
  
  	/*
  	 * relation mapper needs to be initialized too
diff --git a/src/backend/utils/cache/relfilenodemap.c b/src/backend/utils/cache/relfilenodemap.c
index 1e8429c..87813fa 100644
*** a/src/backend/utils/cache/relfilenodemap.c
--- b/src/backend/utils/cache/relfilenodemap.c
*************** InitializeRelfilenodeMap(void)
*** 115,121 ****
  	MemSet(&ctl, 0, sizeof(ctl));
  	ctl.keysize = sizeof(RelfilenodeMapKey);
  	ctl.entrysize = sizeof(RelfilenodeMapEntry);
- 	ctl.hash = tag_hash;
  	ctl.hcxt = CacheMemoryContext;
  
  	/*
--- 115,120 ----
  	MemSet(&ctl, 0, sizeof(ctl));
  	ctl.keysize = sizeof(RelfilenodeMapKey);
  	ctl.entrysize = sizeof(RelfilenodeMapEntry);
  	ctl.hcxt = CacheMemoryContext;
  
  	/*
*************** InitializeRelfilenodeMap(void)
*** 125,131 ****
  	 */
  	RelfilenodeMapHash = hash_create("RelfilenodeMap cache", 1024, &ctl,
! 									 HASH_ELEM | HASH_FUNCTION | HASH_CONTEXT);
  
  	/* Watch for invalidation events. */
  	CacheRegisterRelcacheCallback(RelfilenodeMapInvalidateCallback,
--- 124,130 ----
  	 */
  	RelfilenodeMapHash = hash_create("RelfilenodeMap cache", 1024, &ctl,
! 									 HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
  
  	/* Watch for invalidation events. */
  	CacheRegisterRelcacheCallback(RelfilenodeMapInvalidateCallback,
diff --git a/src/backend/utils/cache/spccache.c b/src/backend/utils/cache/spccache.c
index 24e8679..6cbafaf 100644
*** a/src/backend/utils/cache/spccache.c
--- b/src/backend/utils/cache/spccache.c
*************** InitializeTableSpaceCache(void)
*** 81,90 ****
  	MemSet(&ctl, 0, sizeof(ctl));
  	ctl.keysize = sizeof(Oid);
  	ctl.entrysize = sizeof(TableSpaceCacheEntry);
- 	ctl.hash = oid_hash;
  	TableSpaceCacheHash = hash_create("TableSpace cache", 16, &ctl,
! 									  HASH_ELEM | HASH_FUNCTION);
  
  	/* Make sure we've initialized CacheMemoryContext. */
  	if (!CacheMemoryContext)
--- 81,89 ----
  	MemSet(&ctl, 0, sizeof(ctl));
  	ctl.keysize = sizeof(Oid);
  	ctl.entrysize = sizeof(TableSpaceCacheEntry);
  	TableSpaceCacheHash = hash_create("TableSpace cache", 16, &ctl,
! 									  HASH_ELEM | HASH_BLOBS);
  
  	/* Make sure we've initialized CacheMemoryContext. */
  	if (!CacheMemoryContext)
diff --git a/src/backend/utils/cache/ts_cache.c b/src/backend/utils/cache/ts_cache.c
index 5ff1461..46da303 100644
*** a/src/backend/utils/cache/ts_cache.c
--- b/src/backend/utils/cache/ts_cache.c
*************** lookup_ts_parser_cache(Oid prsId)
*** 120,128 ****
  		MemSet(&ctl, 0, sizeof(ctl));
  		ctl.keysize = sizeof(Oid);
  		ctl.entrysize = sizeof(TSParserCacheEntry);
- 		ctl.hash = oid_hash;
  		TSParserCacheHash = hash_create("Tsearch parser cache", 4,
! 										&ctl, HASH_ELEM | HASH_FUNCTION);
  		/* Flush cache on pg_ts_parser changes */
  		CacheRegisterSyscacheCallback(TSPARSEROID, InvalidateTSCacheCallBack,
  									  PointerGetDatum(TSParserCacheHash));
--- 120,127 ----
  		MemSet(&ctl, 0, sizeof(ctl));
  		ctl.keysize = sizeof(Oid);
  		ctl.entrysize = sizeof(TSParserCacheEntry);
  		TSParserCacheHash = hash_create("Tsearch parser cache", 4,
! 										&ctl, HASH_ELEM | HASH_BLOBS);
  		/* Flush cache on pg_ts_parser changes */
  		CacheRegisterSyscacheCallback(TSPARSEROID, InvalidateTSCacheCallBack,
  									  PointerGetDatum(TSParserCacheHash));
*************** lookup_ts_dictionary_cache(Oid dictId)
*** 219,227 ****
  		MemSet(&ctl, 0, sizeof(ctl));
  		ctl.keysize = sizeof(Oid);
  		ctl.entrysize = sizeof(TSDictionaryCacheEntry);
- 		ctl.hash = oid_hash;
  		TSDictionaryCacheHash = hash_create("Tsearch dictionary cache", 8,
! 											&ctl, HASH_ELEM | HASH_FUNCTION);
  		/* Flush cache on pg_ts_dict and pg_ts_template changes */
  		CacheRegisterSyscacheCallback(TSDICTOID, InvalidateTSCacheCallBack,
  									  PointerGetDatum(TSDictionaryCacheHash));
--- 218,225 ----
  		MemSet(&ctl, 0, sizeof(ctl));
  		ctl.keysize = sizeof(Oid);
  		ctl.entrysize = sizeof(TSDictionaryCacheEntry);
  		TSDictionaryCacheHash = hash_create("Tsearch dictionary cache", 8,
! 											&ctl, HASH_ELEM | HASH_BLOBS);
  		/* Flush cache on pg_ts_dict and pg_ts_template changes */
  		CacheRegisterSyscacheCallback(TSDICTOID, InvalidateTSCacheCallBack,
  									  PointerGetDatum(TSDictionaryCacheHash));
*************** init_ts_config_cache(void)
*** 368,376 ****
  	MemSet(&ctl, 0, sizeof(ctl));
  	ctl.keysize = sizeof(Oid);
  	ctl.entrysize = sizeof(TSConfigCacheEntry);
- 	ctl.hash = oid_hash;
  	TSConfigCacheHash = hash_create("Tsearch configuration cache", 16,
! 									&ctl, HASH_ELEM | HASH_FUNCTION);
  	/* Flush cache on pg_ts_config and pg_ts_config_map changes */
  	CacheRegisterSyscacheCallback(TSCONFIGOID, InvalidateTSCacheCallBack,
  								  PointerGetDatum(TSConfigCacheHash));
--- 366,373 ----
  	MemSet(&ctl, 0, sizeof(ctl));
  	ctl.keysize = sizeof(Oid);
  	ctl.entrysize = sizeof(TSConfigCacheEntry);
  	TSConfigCacheHash = hash_create("Tsearch configuration cache", 16,
! 									&ctl, HASH_ELEM | HASH_BLOBS);
  	/* Flush cache on pg_ts_config and pg_ts_config_map changes */
  	CacheRegisterSyscacheCallback(TSCONFIGOID, InvalidateTSCacheCallBack,
  								  PointerGetDatum(TSConfigCacheHash));
diff --git a/src/backend/utils/cache/typcache.c b/src/backend/utils/cache/typcache.c
index 29aec3c..08e17b9 100644
*** a/src/backend/utils/cache/typcache.c
--- b/src/backend/utils/cache/typcache.c
*************** lookup_type_cache(Oid type_id, int flags
*** 166,174 ****
  		MemSet(&ctl, 0, sizeof(ctl));
  		ctl.keysize = sizeof(Oid);
  		ctl.entrysize = sizeof(TypeCacheEntry);
- 		ctl.hash = oid_hash;
  		TypeCacheHash = hash_create("Type information cache", 64,
! 									&ctl, HASH_ELEM | HASH_FUNCTION);
  
  		/* Also set up callbacks for SI invalidations */
  		CacheRegisterRelcacheCallback(TypeCacheRelCallback, (Datum) 0);
--- 166,173 ----
  		MemSet(&ctl, 0, sizeof(ctl));
  		ctl.keysize = sizeof(Oid);
  		ctl.entrysize = sizeof(TypeCacheEntry);
  		TypeCacheHash = hash_create("Type information cache", 64,
! 									&ctl, HASH_ELEM | HASH_BLOBS);
  
  		/* Also set up callbacks for SI invalidations */
  		CacheRegisterRelcacheCallback(TypeCacheRelCallback, (Datum) 0);
*************** assign_record_type_typmod(TupleDesc tupD
*** 846,854 ****
  		MemSet(&ctl, 0, sizeof(ctl));
  		ctl.keysize = REC_HASH_KEYS * sizeof(Oid);
  		ctl.entrysize = sizeof(RecordCacheEntry);
- 		ctl.hash = tag_hash;
  		RecordCacheHash = hash_create("Record information cache", 64,
! 									  &ctl, HASH_ELEM | HASH_FUNCTION);
  
  		/* Also make sure CacheMemoryContext exists */
  		if (!CacheMemoryContext)
--- 845,852 ----
  		MemSet(&ctl, 0, sizeof(ctl));
  		ctl.keysize = REC_HASH_KEYS * sizeof(Oid);
  		ctl.entrysize = sizeof(RecordCacheEntry);
  		RecordCacheHash = hash_create("Record information cache", 64,
! 									  &ctl, HASH_ELEM | HASH_BLOBS);
  
  		/* Also make sure CacheMemoryContext exists */
  		if (!CacheMemoryContext)
diff --git a/src/backend/utils/fmgr/fmgr.c b/src/backend/utils/fmgr/fmgr.c
index a95be05..e81c855 100644
*** a/src/backend/utils/fmgr/fmgr.c
--- b/src/backend/utils/fmgr/fmgr.c
*************** record_C_func(HeapTuple procedureTuple,
*** 540,550 ****
  		MemSet(&hash_ctl, 0, sizeof(hash_ctl));
  		hash_ctl.keysize = sizeof(Oid);
  		hash_ctl.entrysize = sizeof(CFuncHashTabEntry);
- 		hash_ctl.hash = oid_hash;
  		CFuncHash = hash_create("CFuncHash",
  								100,
  								&hash_ctl,
! 								HASH_ELEM | HASH_FUNCTION);
  	}
  
  	entry = (CFuncHashTabEntry *)
--- 540,549 ----
  		MemSet(&hash_ctl, 0, sizeof(hash_ctl));
  		hash_ctl.keysize = sizeof(Oid);
  		hash_ctl.entrysize = sizeof(CFuncHashTabEntry);
  		CFuncHash = hash_create("CFuncHash",
  								100,
  								&hash_ctl,
! 								HASH_ELEM | HASH_BLOBS);
  	}
  
  	entry = (CFuncHashTabEntry *)
diff --git a/src/backend/utils/hash/dynahash.c b/src/backend/utils/hash/dynahash.c
index 2b99e4b..798cfb2 100644
*** a/src/backend/utils/hash/dynahash.c
--- b/src/backend/utils/hash/dynahash.c
*************** hash_create(const char *tabname, long ne
*** 305,312 ****
--- 305,326 ----
  	hashp->tabname = (char *) (hashp + 1);
  	strcpy(hashp->tabname, tabname);
  
+ 	/*
+ 	 * Select the appropriate hash function.  Caller can specify a custom hash
+ 	 * function, or can set the HASH_BLOBS flag to select an appropriate hash
+ 	 * function for fixed-size binary keys; the default is to select a hash
+ 	 * function suitable for null-terminated string keys.
+ 	 */
  	if (flags & HASH_FUNCTION)
  		hashp->hash = info->hash;
+ 	else if (flags & HASH_BLOBS)
+ 	{
+ 		Assert(flags & HASH_ELEM);
+ 		if (info->keysize == sizeof(uint32))
+ 			hashp->hash = uint32_hash;
+ 		else
+ 			hashp->hash = tag_hash;
+ 	}
  	else
  		hashp->hash = string_hash;	/* default hash function */
diff --git a/src/backend/utils/hash/hashfn.c b/src/backend/utils/hash/hashfn.c
index a12f98f..adb0bd9 100644
*** a/src/backend/utils/hash/hashfn.c
--- b/src/backend/utils/hash/hashfn.c
*************** tag_hash(const void *key, Size keysize)
*** 55,69 ****
  }
  
  /*
!  * oid_hash: hash function for keys that are OIDs
   *
   * (tag_hash works for this case too, but is slower)
   */
  uint32
! oid_hash(const void *key, Size keysize)
  {
! 	Assert(keysize == sizeof(Oid));
! 	return DatumGetUInt32(hash_uint32((uint32) *((const Oid *) key)));
  }
  
  /*
--- 55,69 ----
  }
  
  /*
!  * uint32_hash: hash function for keys that are uint32 or int32
   *
   * (tag_hash works for this case too, but is slower)
   */
  uint32
! uint32_hash(const void *key, Size keysize)
  {
! 	Assert(keysize == sizeof(uint32));
! 	return DatumGetUInt32(hash_uint32(*((const uint32 *) key)));
  }
  
  /*
diff --git a/src/backend/utils/time/combocid.c b/src/backend/utils/time/combocid.c
index a70c754..ea7a905 100644
*** a/src/backend/utils/time/combocid.c
--- b/src/backend/utils/time/combocid.c
*************** GetComboCommandId(CommandId cmin, Comman
*** 218,230 ****
  		memset(&hash_ctl, 0, sizeof(hash_ctl));
  		hash_ctl.keysize = sizeof(ComboCidKeyData);
  		hash_ctl.entrysize = sizeof(ComboCidEntryData);
- 		hash_ctl.hash = tag_hash;
  		hash_ctl.hcxt = TopTransactionContext;
  
  		comboHash = hash_create("Combo CIDs",
  								CCID_HASH_SIZE,
  								&hash_ctl,
! 								HASH_ELEM | HASH_FUNCTION | HASH_CONTEXT);
  
  		comboCids = (ComboCidKeyData *)
  			MemoryContextAlloc(TopTransactionContext,
--- 218,229 ----
  		memset(&hash_ctl, 0, sizeof(hash_ctl));
  		hash_ctl.keysize = sizeof(ComboCidKeyData);
  		hash_ctl.entrysize = sizeof(ComboCidEntryData);
  		hash_ctl.hcxt = TopTransactionContext;
  
  		comboHash = hash_create("Combo CIDs",
  								CCID_HASH_SIZE,
  								&hash_ctl,
! 								HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
  
  		comboCids = (ComboCidKeyData *)
  			MemoryContextAlloc(TopTransactionContext,
diff --git a/src/include/utils/hsearch.h b/src/include/utils/hsearch.h
index 77974a1..a2eb7bc 100644
*** a/src/include/utils/hsearch.h
--- b/src/include/utils/hsearch.h
*************** typedef struct HASHCTL
*** 93,98 ****
--- 93,99 ----
  #define HASH_COMPARE	0x400	/* Set user defined comparison function */
  #define HASH_KEYCOPY	0x800	/* Set user defined key-copying function */
  #define HASH_FIXED_SIZE 0x1000	/* Initial size is a hard limit */
+ #define HASH_BLOBS		0x2000	/* Use support functions for binary keys */
  
  /* max_dsize value to indicate expansible directory */
*************** extern void AtEOSubXact_HashTables(bool
*** 143,153 ****
  /*
   * prototypes for functions in hashfn.c
   */
  extern uint32 string_hash(const void *key, Size keysize);
  extern uint32 tag_hash(const void *key, Size keysize);
! extern uint32 oid_hash(const void *key, Size keysize);
  extern uint32 bitmap_hash(const void *key, Size keysize);
  extern int	bitmap_match(const void *key1, const void *key2, Size keysize);
  
  #endif   /* HSEARCH_H */
--- 144,160 ----
  /*
   * prototypes for functions in hashfn.c
+  *
+  * Note: It is deprecated for callers of hash_create to explicitly specify
+  * string_hash, tag_hash, uint32_hash, or oid_hash.  Just set HASH_BLOBS or
+  * not.  Use HASH_FUNCTION only when you want something other than those.
   */
  extern uint32 string_hash(const void *key, Size keysize);
  extern uint32 tag_hash(const void *key, Size keysize);
! extern uint32 uint32_hash(const void *key, Size keysize);
  extern uint32 bitmap_hash(const void *key, Size keysize);
  extern int	bitmap_match(const void *key1, const void *key2, Size keysize);
  
+ #define oid_hash uint32_hash	/* Remove me eventually */
+ 
  #endif   /* HSEARCH_H */
diff --git a/src/pl/plperl/plperl.c b/src/pl/plperl/plperl.c
index 5629571..492c1ef 100644
*** a/src/pl/plperl/plperl.c
--- b/src/pl/plperl/plperl.c
*************** _PG_init(void)
*** 456,475 ****
  	memset(&hash_ctl, 0, sizeof(hash_ctl));
  	hash_ctl.keysize = sizeof(Oid);
  	hash_ctl.entrysize = sizeof(plperl_interp_desc);
- 	hash_ctl.hash = oid_hash;
  	plperl_interp_hash = hash_create("PL/Perl interpreters",
  									 8,
  									 &hash_ctl,
! 									 HASH_ELEM | HASH_FUNCTION);
  
  	memset(&hash_ctl, 0, sizeof(hash_ctl));
  	hash_ctl.keysize = sizeof(plperl_proc_key);
  	hash_ctl.entrysize = sizeof(plperl_proc_ptr);
- 	hash_ctl.hash = tag_hash;
  	plperl_proc_hash = hash_create("PL/Perl procedures",
  								   32,
  								   &hash_ctl,
! 								   HASH_ELEM | HASH_FUNCTION);
  
  	/*
  	 * Save the default opmask.
--- 456,473 ----
  	memset(&hash_ctl, 0, sizeof(hash_ctl));
  	hash_ctl.keysize = sizeof(Oid);
  	hash_ctl.entrysize = sizeof(plperl_interp_desc);
  	plperl_interp_hash = hash_create("PL/Perl interpreters",
  									 8,
  									 &hash_ctl,
! 									 HASH_ELEM | HASH_BLOBS);
  
  	memset(&hash_ctl, 0, sizeof(hash_ctl));
  	hash_ctl.keysize = sizeof(plperl_proc_key);
  	hash_ctl.entrysize = sizeof(plperl_proc_ptr);
  	plperl_proc_hash = hash_create("PL/Perl procedures",
  								   32,
  								   &hash_ctl,
! 								   HASH_ELEM | HASH_BLOBS);
  
  	/*
  	 * Save the default opmask.
diff --git a/src/pl/plpgsql/src/pl_comp.c b/src/pl/plpgsql/src/pl_comp.c
index d2fd685..24dc371 100644
*** a/src/pl/plpgsql/src/pl_comp.c
--- b/src/pl/plpgsql/src/pl_comp.c
*************** plpgsql_HashTableInit(void)
*** 2519,2529 ****
  	memset(&ctl, 0, sizeof(ctl));
  	ctl.keysize = sizeof(PLpgSQL_func_hashkey);
  	ctl.entrysize = sizeof(plpgsql_HashEnt);
- 	ctl.hash = tag_hash;
  	plpgsql_HashTable = hash_create("PLpgSQL function cache",
  									FUNCS_PER_USER,
  									&ctl,
! 									HASH_ELEM | HASH_FUNCTION);
  }
  
  static PLpgSQL_function *
--- 2519,2528 ----
  	memset(&ctl, 0, sizeof(ctl));
  	ctl.keysize = sizeof(PLpgSQL_func_hashkey);
  	ctl.entrysize = sizeof(plpgsql_HashEnt);
  	plpgsql_HashTable = hash_create("PLpgSQL function cache",
  									FUNCS_PER_USER,
  									&ctl,
! 									HASH_ELEM | HASH_BLOBS);
  }
  
  static PLpgSQL_function *
diff --git a/src/pl/plpython/plpy_plpymodule.c b/src/pl/plpython/plpy_plpymodule.c
index 37ea2a4..b8dc891 100644
*** a/src/pl/plpython/plpy_plpymodule.c
--- b/src/pl/plpython/plpy_plpymodule.c
*************** PLy_add_exceptions(PyObject *plpy)
*** 226,234 ****
  	memset(&hash_ctl, 0, sizeof(hash_ctl));
  	hash_ctl.keysize = sizeof(int);
  	hash_ctl.entrysize = sizeof(PLyExceptionEntry);
- 	hash_ctl.hash = tag_hash;
  	PLy_spi_exceptions = hash_create("SPI exceptions", 256,
! 									 &hash_ctl, HASH_ELEM | HASH_FUNCTION);
  
  	PLy_generate_spi_exceptions(excmod, PLy_exc_spi_error);
  }
--- 226,233 ----
  	memset(&hash_ctl, 0, sizeof(hash_ctl));
  	hash_ctl.keysize = sizeof(int);
  	hash_ctl.entrysize = sizeof(PLyExceptionEntry);
  	PLy_spi_exceptions = hash_create("SPI exceptions", 256,
! 									 &hash_ctl, HASH_ELEM | HASH_BLOBS);
  
  	PLy_generate_spi_exceptions(excmod, PLy_exc_spi_error);
  }
diff --git a/src/pl/plpython/plpy_procedure.c b/src/pl/plpython/plpy_procedure.c
index fad80b2..bd48301 100644
*** a/src/pl/plpython/plpy_procedure.c
--- b/src/pl/plpython/plpy_procedure.c
*************** init_procedure_caches(void)
*** 39,47 ****
  	memset(&hash_ctl, 0, sizeof(hash_ctl));
  	hash_ctl.keysize = sizeof(PLyProcedureKey);
  	hash_ctl.entrysize = sizeof(PLyProcedureEntry);
- 	hash_ctl.hash = tag_hash;
  	PLy_procedure_cache = hash_create("PL/Python procedures", 32, &hash_ctl,
! 									  HASH_ELEM | HASH_FUNCTION);
  }
  
  /*
--- 39,46 ----
  	memset(&hash_ctl, 0, sizeof(hash_ctl));
  	hash_ctl.keysize = sizeof(PLyProcedureKey);
  	hash_ctl.entrysize = sizeof(PLyProcedureEntry);
  	PLy_procedure_cache = hash_create("PL/Python procedures", 32, &hash_ctl,
! 									  HASH_ELEM | HASH_BLOBS);
  }
  
  /*
diff --git a/src/pl/tcl/pltcl.c b/src/pl/tcl/pltcl.c
index a53cca4..8f98046 100644
*** a/src/pl/tcl/pltcl.c
--- b/src/pl/tcl/pltcl.c
*************** _PG_init(void)
*** 378,388 ****
  	memset(&hash_ctl, 0, sizeof(hash_ctl));
  	hash_ctl.keysize = sizeof(Oid);
  	hash_ctl.entrysize = sizeof(pltcl_interp_desc);
- 	hash_ctl.hash = oid_hash;
  	pltcl_interp_htab = hash_create("PL/Tcl interpreters",
  									8,
  									&hash_ctl,
! 									HASH_ELEM | HASH_FUNCTION);
  
  	/************************************************************
  	 * Create the hash table for function lookup
--- 378,387 ----
  	memset(&hash_ctl, 0, sizeof(hash_ctl));
  	hash_ctl.keysize = sizeof(Oid);
  	hash_ctl.entrysize = sizeof(pltcl_interp_desc);
  	pltcl_interp_htab = hash_create("PL/Tcl interpreters",
  									8,
  									&hash_ctl,
! 									HASH_ELEM | HASH_BLOBS);
  
  	/************************************************************
  	 * Create the hash table for function lookup
*************** _PG_init(void)
*** 390,400 ****
  	memset(&hash_ctl, 0, sizeof(hash_ctl));
  	hash_ctl.keysize = sizeof(pltcl_proc_key);
  	hash_ctl.entrysize = sizeof(pltcl_proc_ptr);
- 	hash_ctl.hash = tag_hash;
  	pltcl_proc_htab = hash_create("PL/Tcl functions",
  								  100,
  								  &hash_ctl,
! 								  HASH_ELEM | HASH_FUNCTION);
  
  	pltcl_pm_init_done = true;
  }
--- 389,398 ----
  	memset(&hash_ctl, 0, sizeof(hash_ctl));
  	hash_ctl.keysize = sizeof(pltcl_proc_key);
  	hash_ctl.entrysize = sizeof(pltcl_proc_ptr);
  	pltcl_proc_htab = hash_create("PL/Tcl functions",
  								  100,
  								  &hash_ctl,
! 								  HASH_ELEM | HASH_BLOBS);
  
  	pltcl_pm_init_done = true;
  }
I wrote: > Here's a proposed patch along this line. I left in oid_hash (in the > form of a macro) so that this does not cause any API break for existing > third-party modules. However, no callers in our own code directly > refer to tag_hash or oid_hash anymore. Committed that version after some further comment wordsmithing. On Teodor's original test cases, I see about 8% speedup compared to the 4%-ish numbers he originally reported. This may be random variation or it might mean that we got a win someplace else besides tidbitmap.c. I've not tried to sleuth it down exactly. I am wondering though if this suggests that it'd be worth our while to add a similar fast path for 8-byte hash keys. That would be quite painless to add now (with the exception of actually coding the fast hash function, perhaps). regards, tom lane
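For reference, the committed calling convention reduces a typical call site to something like the sketch below ("My cache" and MyEntry are illustrative names, not code from the patch); per the dynahash.c hunk above, keys of exactly sizeof(uint32) get uint32_hash, and any other fixed-size key falls through to tag_hash:

    HASHCTL     ctl;
    HTAB       *htab;

    MemSet(&ctl, 0, sizeof(ctl));
    ctl.keysize = sizeof(Oid);          /* 4-byte key: dynahash picks uint32_hash */
    ctl.entrysize = sizeof(MyEntry);    /* hypothetical entry struct */
    htab = hash_create("My cache", 64, &ctl, HASH_ELEM | HASH_BLOBS);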
Re: hash_create API changes (was Re: speedup tidbitmap patch: hash BlockNumber)
From: Jim Nasby
On 12/18/14, 12:48 PM, Tom Lane wrote:
> I wrote:
>> Here's a proposed patch along this line. I left in oid_hash (in the
>> form of a macro) so that this does not cause any API break for existing
>> third-party modules. However, no callers in our own code directly
>> refer to tag_hash or oid_hash anymore.
>
> Committed that version after some further comment wordsmithing.
>
> On Teodor's original test cases, I see about 8% speedup compared to
> the 4%-ish numbers he originally reported. This may be random variation
> or it might mean that we got a win someplace else besides tidbitmap.c.
> I've not tried to sleuth it down exactly. I am wondering though if
> this suggests that it'd be worth our while to add a similar fast path
> for 8-byte hash keys. That would be quite painless to add now (with
> the exception of actually coding the fast hash function, perhaps).

I stuck an elog in to report when a hash wasn't found, and came up with these results (first number is how many times a hash wasn't found, second is keysize). I don't see a lot of 8's in there...

Second set of numbers is the same thing, except counting every time we looked for the hash value, regardless of result. (Both cases generated by running make check.)

BTW, as part of this I found some hash values were hit over 30k times in the first case. I don't know if that means a boatload of collisions or something else. Command to generate that data is:

$ egrep '^LOG: Hashtable ' src/test/regress/log/postmaster.log|cut -d\| -f2,4|sort|uniq -c|cut -d' ' -f1|sort -n|tail

Before hitting the raw data, here is a summary of hash searches by key size (generated by patch 1):

      43  48
     327  256
    1411  416
    9321  136
   12436  12
   31270  64
  107017  8
  203127  16
  628107  4
 2201582  20  -- Mostly LOCALLOCK and Shared Buffer

egrep '^LOG: Hashtable ' src/test/regress/log/postmaster.log |cut -d\| -f2,6|sort|uniq -c (patch 2)

    581 Analyzed elements table|8
   1141 Analyzed lexemes table|16
     46 Array distinct element count table|4
    167 Attopt cache|8
     26 Btree proof lookup cache|8
     95 CFuncHash|4
   2387 Combo CIDs|8
     99 Databases hash|4
     22 Event Trigger Cache|4
     85 JoinRelHashTable|8
 460690 LOCALLOCK hash|20
      1 LOCALLOCK hash|LOCALLOCK hash
  49608 LOCK hash|16
    951 Local Buffer Lookup Table|20
      5 Local predicate lock|16
    581 Operator class cache|4
   1658 Operator lookup cache|136
    301 PLpgSQL function cache|416
      5 PREDICATELOCK hash|16
     14 PREDICATELOCKTARGET hash|16
  52278 PROCLOCK hash|16
   1064 Pending Ops Table|12
  14790 Per-database table|4
  15297 Portal hash|64
     21 Prepared Queries|64
    126 PrivateRefCount|4
      4 RI compare cache|8
     55 RI constraint cache|4
     90 RI query cache|8
     72 Record information cache|64
  24040 Relcache by OID|4
    742 RelfilenodeMap cache|8
     21 Rendezvous variable hash|64
      4 Rewrite / Old to new tid map|12
     49 Rewrite / Unresolved ctids|12
      5 SERIALIZABLEXID hash|4
     60 Sequence values|4
   9603 Shared Buffer Lookup Table|20
     43 ShmemIndex|48
   5335 TIDBitmap|4
    138 TableSpace cache|4
  16516 Temporary table of OIDs|4
    169 Timezones|256
      6 Tsearch configuration cache|4
      4 Tsearch dictionary cache|4
      1 Tsearch parser cache|4
  11550 TupleHashTable|8
    327 Type information cache|4
     59 json object hashtable|64
  17430 smgr relation table|16

egrep '^LOG: Hashtable ' src/test/regress/log/postmaster.log |cut -d\| -f2,6|sort|uniq -c (patch 1)

   2250 Analyzed elements table|8
  28817 Analyzed lexemes table|16
    428 Array distinct element count table|4
    447 Attopt cache|8
    224 Btree proof lookup cache|8
   1363 CFuncHash|4
   5219 Combo CIDs|8
   2415 Databases hash|4
  12063 Event Trigger Cache|4
    178 JoinRelHashTable|8
 990753 LOCALLOCK hash|20
  72129 LOCK hash|16
  19858 Local Buffer Lookup Table|20
      2 Local Buffer Lookup Table|LOG: Hashtable
      7 Local predicate lock|16
   4557 Operator class cache|4
   9321 Operator lookup cache|136
      1 Operator lookup cache|136LOG: Hashtable
   1411 PLpgSQL function cache|416
      8 PREDICATELOCK hash|16
    106 PREDICATELOCKTARGET hash|16
  73102 PROCLOCK hash|16
  10929 Pending Ops Table|12
  23534 Per-database table|4
  29502 Portal hash|64
    233 Prepared Queries|64
    689 PrivateRefCount|4
     91 RI compare cache|8
    310 RI constraint cache|4
    289 RI query cache|8
   1471 Record information cache|64
      1 Record information cache|64LOG: Hashtable
      1 Record information cache|6LOG: Hashtable
 426050 Relcache by OID|4
   1308 RelfilenodeMap cache|8
     12 Rendezvous variable hash|64
     24 Rewrite / Old to new tid map|12
   1483 Rewrite / Unresolved ctids|12
     12 SERIALIZABLEXID hash|4
    263 Sequence values|4
1190971 Shared Buffer Lookup Table|20
     30 Shared Buffer Lookup Table|20LOG: Hashtable
     20 Shared Buffer Lookup Table|2LOG: Hashtable
      1 Shared Buffer Lookup Table|Event Trigger Cache
     27 Shared Buffer Lookup Table|LOCALLOCK hash
      2 Shared Buffer Lookup Table|LOCK hash
     14 Shared Buffer Lookup Table|LOG: Hashtable
      9 Shared Buffer Lookup Table|PROCLOCK hash
     10 Shared Buffer Lookup Table|Relcache by OID
     20 Shared Buffer Lookup Table|Shared Buffer Lookup Table
      3 Shared Buffer Lookup Table|TIDBitmap
      1 Shared Buffer Lookup Table|smgr relation table
     43 ShmemIndex|48
 104683 TIDBitmap|4
     17 TOAST to main relid map|4
  19542 TableSpace cache|4
  14058 Temporary table of OIDs|4
    327 Timezones|256
      5 Tsearch configuration cache|4
     68 Tsearch dictionary cache|4
      3 Tsearch parser cache|4
      1 TupleHashTable| hash
  97011 TupleHashTable|8
  18045 Type information cache|4
      2 db hash|4
     52 json object hashtable|64
  28958 smgr relation table|16

-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
Attachment
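A rough sketch of the kind of counting elog described above; the placement and message format here are guesses inferred from the egrep commands, not taken from the attached patches:

    /* in dynahash.c's hash_search(), when the bucket scan comes up empty */
    if (currBucket == NULL && action == HASH_FIND)
        elog(LOG, "Hashtable |%s|not found|%lu|",
             hashp->tabname, (unsigned long) hashp->keysize);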
Re: hash_create API changes (was Re: speedup tidbitmap patch: hash BlockNumber)
From: Jim Nasby
On 12/18/14, 5:00 PM, Jim Nasby wrote:
> 2201582 20 -- Mostly LOCALLOCK and Shared Buffer

Started looking into this; perhaps https://code.google.com/p/fast-hash/ would be worth looking at, though it requires uint64.

It also occurs to me that we're needlessly shoving a lot of 0's into the hash input by using RelFileNode and ForkNumber. RelFileNode includes the tablespace Oid, which is pointless here because relid is unique per-database. We also have very few forks and typically care about very few databases. If we crammed dbid and ForkNum together that gets us down to 12 bytes, which at minimum saves us the trip through the case logic. I suspect it also means we could eliminate one of the mix() calls.

But I wonder if we could still do better, because we typically also won't have that many relations. Is there some fast way we could combine dbid, forkNum and relid into a uint32? That gets us down to 8 bytes, which means we could use fast-hash, or a stripped-down mix().

Unfortunately I don't know much about hash algorithms, so I don't know how practical any of this actually is, or what a good method for combining those fields would be. My current idea is something like (rot(forkNum, 2) | dbid) ^ relid, but if you got unlucky with your oid values you could end up with a lot of collisions from that.

I can put some effort into this, but I'd like some guidance.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
Jim Nasby <Jim.Nasby@BlueTreble.com> writes:
> On 12/18/14, 5:00 PM, Jim Nasby wrote:
>> 2201582 20 -- Mostly LOCALLOCK and Shared Buffer

> Started looking into this; perhaps https://code.google.com/p/fast-hash/ would be worth looking at, though it requires uint64.

> It also occurs to me that we're needlessly shoving a lot of 0's into the hash input by using RelFileNode and ForkNumber. RelFileNode includes the tablespace Oid, which is pointless here because relid is unique per-database. We also have very few forks and typically care about very few databases. If we crammed dbid and ForkNum together that gets us down to 12 bytes, which at minimum saves us the trip through the case logic. I suspect it also means we could eliminate one of the mix() calls.

I don't see this working. The lock table in shared memory can surely take no such shortcuts. We could make a backend's locallock table omit fields that are predictable within the set of objects that backend could ever lock, but (1) this doesn't help unless we can reduce the tag size for all LockTagTypes, which we probably can't, and (2) having the locallock's tag be different from the corresponding shared tag would be a mess too. I think dealing with (2) might easily eat all the cycles we could hope to save from a smaller hash tag ... and that's not even considering the added logical complexity and potential for bugs.

Switching to a different hash algorithm could be feasible, perhaps. I think we're likely stuck with Jenkins hashing for hashes that go to disk, but hashes for dynahash tables don't do that.

regards, tom lane
Re: hash_create API changes (was Re: speedup tidbitmap patch: hash BlockNumber)
From: "ktm@rice.edu"
On Fri, Dec 19, 2014 at 04:41:51PM -0600, Jim Nasby wrote:
> On 12/18/14, 5:00 PM, Jim Nasby wrote:
> >2201582 20 -- Mostly LOCALLOCK and Shared Buffer
>
> Started looking into this; perhaps https://code.google.com/p/fast-hash/ would be worth looking at, though it requires uint64.
>
> It also occurs to me that we're needlessly shoving a lot of 0's into the hash input by using RelFileNode and ForkNumber. RelFileNode includes the tablespace Oid, which is pointless here because relid is unique per-database. We also have very few forks and typically care about very few databases. If we crammed dbid and ForkNum together that gets us down to 12 bytes, which at minimum saves us the trip through the case logic. I suspect it also means we could eliminate one of the mix() calls.
>
> But I wonder if we could still do better, because we typically also won't have that many relations. Is there some fast way we could combine dbid, forkNum and relid into a uint32? That gets us down to 8 bytes, which means we could use fast-hash, or a stripped-down mix().
>
> Unfortunately I don't know much about hash algorithms, so I don't know how practical any of this actually is, or what a good method for combining those fields would be. My current idea is something like (rot(forkNum, 2) | dbid) ^ relid, but if you got unlucky with your oid values you could end up with a lot of collisions from that.
>
> I can put some effort into this, but I'd like some guidance.
> --
> Jim Nasby, Data Architect, Blue Treble Consulting
> Data in Trouble? Get it in Treble! http://BlueTreble.com

Hi,

If we are going to consider changing the hash function, we should consider something like xxhash, which runs at 13.8GB/s on a 2.7GHz x86_64 for the XXH64 variant and 6.8GB/s for the XXH32 variant, which is double the speed of fast-hash according to the project page (running on a 3GHz x86_64). In addition, something like that could be used as a checksum instead of the current CRC32, although some work has already gone into speeding that up as is.

Otherwise, it probably makes sense to just stick with creating the 8-byte fastpath analogously to the 4-byte fastpath that was just added. Is calculating the hash the bottleneck?

Regards,
Ken
"ktm@rice.edu" <ktm@rice.edu> writes: > If we are going to consider changing the hash function, we should > consider something like xxhash which runs at 13.8GB/s on a 2.7GHz > x86_64 for the XXH64 variant and 6.8GB/s for the XXH32 variant which > is double the speed of fast-hash according to the page running on a > 3GHz x86_64. Well, as the google page points out, raw speed is not the only figure of merit; otherwise we'd just xor all the bytes and call it good. We need the hash function to spread out the hash values well, or else we lose more cycles chasing inordinately-long hash chains than we saved with a cheap hash function. Google claims their fast-hash is actually better on this point than Jenkins, which if true would be very nice indeed. Keep in mind also that very very few of our hash keys are longer than about 20 bytes, so that speed-per-byte is not that exciting anyway. Longer setup/finish times could easily swamp any per-byte advantages, for example. regards, tom lane
Re: hash_create API changes (was Re: speedup tidbitmap patch: hash BlockNumber)
From: Jim Nasby
On 12/19/14, 5:13 PM, Tom Lane wrote:
> Jim Nasby <Jim.Nasby@BlueTreble.com> writes:
>> On 12/18/14, 5:00 PM, Jim Nasby wrote:
>>> 2201582 20 -- Mostly LOCALLOCK and Shared Buffer
>
>> Started looking into this; perhaps https://code.google.com/p/fast-hash/ would be worth looking at, though it requires uint64.
>
>> It also occurs to me that we're needlessly shoving a lot of 0's into the hash input by using RelFileNode and ForkNumber. RelFileNode includes the tablespace Oid, which is pointless here because relid is unique per-database. We also have very few forks and typically care about very few databases. If we crammed dbid and ForkNum together that gets us down to 12 bytes, which at minimum saves us the trip through the case logic. I suspect it also means we could eliminate one of the mix() calls.
>
> I don't see this working. The lock table in shared memory can surely take
> no such shortcuts. We could make a backend's locallock table omit fields
> that are predictable within the set of objects that backend could ever
> lock, but (1) this doesn't help unless we can reduce the tag size for all
> LockTagTypes, which we probably can't, and (2) having the locallock's tag
> be different from the corresponding shared tag would be a mess too.
> I think dealing with (2) might easily eat all the cycles we could hope to
> save from a smaller hash tag ... and that's not even considering the added
> logical complexity and potential for bugs.

I think we may be thinking different things here...

I'm not suggesting we change BufferTag or BufferLookupEnt; clearly we can't simply throw away any of the fields I was talking about (well, except possibly tablespace ID. AFAICT that's completely redundant for searching because relid is UNIQUE).

What I am thinking is not using all of those fields in their raw form to calculate the hash value. IE: something analogous to:

hash_any(SharedBufHash, ((rot(forkNum, 2) | dbNode) ^ relNode) << 32 | blockNum)

perhaps that actual code wouldn't work, but I don't see why we couldn't do something similar... am I missing something?

> Switching to a different hash algorithm could be feasible, perhaps.
> I think we're likely stuck with Jenkins hashing for hashes that go to
> disk, but hashes for dynahash tables don't do that.

Yeah, I plan on testing the performance of fast-hash for HASH_BLOBS just to see how it compares.

Why would we be stuck with Jenkins hashing for on-disk data? pg_upgrade, or something else?
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
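Spelled out, that folding idea looks roughly like the sketch below, reusing the rot() macro from hashfn.c; this is illustration only (and, as the next message points out, dropping spcNode and treating relNode as unique is not actually safe):

    /* Fold the 20-byte BufferTag down to 8 bytes before hashing (sketch). */
    uint64      folded;
    uint32      hashcode;

    folded = ((uint64) ((rot(tag->forkNum, 2) | tag->rnode.dbNode)
                        ^ tag->rnode.relNode) << 32) | tag->blockNum;
    hashcode = DatumGetUInt32(hash_any((unsigned char *) &folded,
                                       sizeof(folded)));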
Re: hash_create API changes (was Re: speedup tidbitmap patch: hash BlockNumber)
From: Andres Freund
On 2014-12-19 22:03:55 -0600, Jim Nasby wrote:
> I'm not suggesting we change BufferTag or BufferLookupEnt; clearly we
> can't simply throw away any of the fields I was talking about (well,
> except possibly tablespace ID. AFAICT that's completely redundant for
> searching because relid is UNIQUE).

It's actually not. BufferTags contain relnodes via RelFileNode - that's not the relation's oid, but the filenode. And that's *not* guaranteed unique across databases, unfortunately.

> What I am thinking is not using all of those fields in their raw form to calculate the hash value. IE: something analogous to:
>
> hash_any(SharedBufHash, ((rot(forkNum, 2) | dbNode) ^ relNode) << 32 | blockNum)
>
> perhaps that actual code wouldn't work, but I don't see why we couldn't do something similar... am I missing something?

I don't think that'd improve anything. Jenkins' hash does have quite good mixing properties; I don't believe that the above would improve the quality of the hash.

Greetings,

Andres Freund

-- 
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes:
> On 2014-12-19 22:03:55 -0600, Jim Nasby wrote:
>> What I am thinking is not using all of those fields in their raw form to calculate the hash value. IE: something analogous to:
>> hash_any(SharedBufHash, ((rot(forkNum, 2) | dbNode) ^ relNode) << 32 | blockNum)
>>
>> perhaps that actual code wouldn't work, but I don't see why we couldn't do something similar... am I missing something?

> I don't think that'd improve anything. Jenkins' hash does have quite
> good mixing properties; I don't believe that the above would improve
> the quality of the hash.

I think what Jim is suggesting is to intentionally degrade the quality of the hash in order to let it be calculated a tad faster. We could do that but I doubt it would be a win, especially in systems with lots of buffers. IIRC, when we put in Jenkins hashing to replace the older homebrew hash function, it improved performance even though the hash itself was slower.

regards, tom lane
Re: hash_create API changes (was Re: speedup tidbitmap patch: hash BlockNumber)
From: Jim Nasby
On 12/20/14, 11:51 AM, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
>> On 2014-12-19 22:03:55 -0600, Jim Nasby wrote:
>>> What I am thinking is not using all of those fields in their raw form to calculate the hash value. IE: something analogous to:
>>> hash_any(SharedBufHash, ((rot(forkNum, 2) | dbNode) ^ relNode) << 32 | blockNum)
>>>
>>> perhaps that actual code wouldn't work, but I don't see why we couldn't do something similar... am I missing something?
>
>> I don't think that'd improve anything. Jenkins' hash does have quite
>> good mixing properties; I don't believe that the above would improve
>> the quality of the hash.
>
> I think what Jim is suggesting is to intentionally degrade the quality of
> the hash in order to let it be calculated a tad faster. We could do that
> but I doubt it would be a win, especially in systems with lots of buffers.
> IIRC, when we put in Jenkins hashing to replace the older homebrew hash
> function, it improved performance even though the hash itself was slower.

Right. Now that you mention it, I vaguely recall the discussions about changing the hash function to reduce collisions.

I'll still take a look at fast-hash, but it's looking like there may not be anything we can do here unless we change how we identify relation files (combining dbid, tablespace id, fork number and file id, at least for searching). If we had 64-bit hash support then maybe that'd be a significant win, since you wouldn't need to hash at all. But that certainly doesn't seem to be low-hanging fruit to me...
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: hash_create API changes (was Re: speedup tidbitmap patch: hash BlockNumber)
From: Jim Nasby
On 12/20/14, 2:13 PM, Jim Nasby wrote:
> On 12/20/14, 11:51 AM, Tom Lane wrote:
>> Andres Freund <andres@2ndquadrant.com> writes:
>>> On 2014-12-19 22:03:55 -0600, Jim Nasby wrote:
>>>> What I am thinking is not using all of those fields in their raw form to calculate the hash value. IE: something analogous to:
>>>> hash_any(SharedBufHash, ((rot(forkNum, 2) | dbNode) ^ relNode) << 32 | blockNum)
>>>>
>>>> perhaps that actual code wouldn't work, but I don't see why we couldn't do something similar... am I missing something?
>>
>>> I don't think that'd improve anything. Jenkins' hash does have quite
>>> good mixing properties; I don't believe that the above would improve
>>> the quality of the hash.
>>
>> I think what Jim is suggesting is to intentionally degrade the quality of
>> the hash in order to let it be calculated a tad faster. We could do that
>> but I doubt it would be a win, especially in systems with lots of buffers.
>> IIRC, when we put in Jenkins hashing to replace the older homebrew hash
>> function, it improved performance even though the hash itself was slower.
>
> Right. Now that you mention it, I vaguely recall the discussions about changing the hash function to reduce collisions.
>
> I'll still take a look at fast-hash, but it's looking like there may not be anything we can do here unless we change how we identify relation files (combining dbid, tablespace id, fork number and file id, at least for searching). If we had 64-bit hash support then maybe that'd be a significant win, since you wouldn't need to hash at all. But that certainly doesn't seem to be low-hanging fruit to me...

I've taken a look at a number of different hash algorithms, testing them initially with SMHasher [1] and then with pgbench. It's worth noting that a lot of work has been done in the area of hash algorithms since we last updated the hash_any algorithm in 2009 [2]. It's probably worth revisiting this every other release or so.

Most of my SMHasher results are at [3]. I suspect SpookyHash is close to what we currently have, so that's what I used for a baseline. I determined that fast-hash (called superfast in the results) was faster than Spooky, but not as fast as CityHash64 [4] or xxhash [5]. CityHash and xxhash had similar performance, but xxhash seems to have slightly better characteristics according to SMHasher, and more importantly, it's in C, not C++. However, CityHash has been replaced by farmhash [7], which might be faster than xxhash.

I did a quick hack at using xxhash ONLY for shared buffer lookup [6]. I've attached the full patch, as well as a version that omits the new files. Note that currently xxhash is set up in such a way that we'd get different results depending on endianness, so we couldn't just drop this in as-is across the board. Of course, there's also the question of whether we'd want to force a REINDEX on hash indexes.

pgbench results are below. Select-only testing was done first, then read-write. There were several select-only runs on both to warm shared_buffers (set to 512MB for this test, and fsync is off). Database initialized with bin/pgbench -i -F 100 -s 10.

pgbench -S -T10 -c 4 -j 4

master:
tps = 9556.356145 (excluding connections establishing)
tps = 9897.324917 (excluding connections establishing)
tps = 9287.286907 (excluding connections establishing)
tps = 10210.130833 (excluding connections establishing)

XXH32:
tps = 32462.754437 (excluding connections establishing)
tps = 33232.144511 (excluding connections establishing)
tps = 33082.436760 (excluding connections establishing)
tps = 33597.904532 (excluding connections establishing)

pgbench -T10 -c 4 -j 4

master:
tps = 2299.314145 (excluding connections establishing)
tps = 2029.956749 (excluding connections establishing)
tps = 2067.462362 (excluding connections establishing)

XXH32:
tps = 2653.919321 (excluding connections establishing)
tps = 2615.764370 (excluding connections establishing)
tps = 2952.144461 (excluding connections establishing)

Questions:

Do we want to do what we've previously done and cut/paste/modify this code into our repo? Given how rapidly hash algorithms seem to be changing, I'm inclined to minimize the amount of effort required to pull a new one in...

Can someone with a big-endian CPU see what the impact of XXH_FORCE_NATIVE_FORMAT 1 vs 0 is? If there's a notable difference, we might want to do different things for on-disk vs in-memory hashes.

For that matter, assuming we adopt this, do we want to replace the index hash functions too? SMHasher shows that CityHash is ~50% faster than XXHash for 262144-byte keys. Even SpookyHash is about 17% faster on that key size. So if we want to do something with hash indexes, we'd probably be better off with City or Farm hash than XXHash.

[1] https://code.google.com/p/smhasher/
[2] https://github.com/postgres/postgres/commit/8205258fa675115439017b626c4932d5fefe2ea8
[3] https://github.com/decibel/hash_testing/tree/master/results
[4] https://code.google.com/p/cityhash/
[5] https://code.google.com/p/xxhash/
[6] https://github.com/decibel/postgres/commit/b82e7a3b936b3950b296514f6ee0542132f11e4a
[7] https://code.google.com/p/farmhash/
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
Attachment
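The quick hack itself is in the linked commit [6]; its shape is roughly the sketch below, assuming the XXH32() entry point from xxhash.h and dynahash's HashValueFunc signature (the seed of 0 is arbitrary):

    #include "xxhash.h"

    /* dynahash-compatible wrapper around XXH32 */
    static uint32
    xxh32_hash(const void *key, Size keysize)
    {
        return (uint32) XXH32(key, (int) keysize, 0);
    }

    /* then, in buf_table.c's InitBufTable(), instead of HASH_BLOBS: */
    info.keysize = sizeof(BufferTag);
    info.entrysize = sizeof(BufferLookupEnt);
    info.hash = xxh32_hash;
    info.num_partitions = NUM_BUFFER_PARTITIONS;
    SharedBufHash = ShmemInitHash("Shared Buffer Lookup Table",
                                  size, size, &info,
                                  HASH_ELEM | HASH_FUNCTION | HASH_PARTITION);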
Re: hash_create API changes (was Re: speedup tidbitmap patch: hash BlockNumber)
From: Jim Nasby
On 12/24/14, 12:27 AM, Jim Nasby wrote:
> There were several select-only runs on both to warm shared_buffers (set to 512MB for this test, and fsync is off).

BTW, presumably this ~380M database isn't big enough to show any problems with hash collisions, and I'm guessing you'd need way more memory than what I have on this laptop. Might be worth looking into that on a machine with a serious amount of memory.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: hash_create API changes (was Re: speedup tidbitmap patch: hash BlockNumber)
From: Andres Freund
On 2014-12-24 00:27:39 -0600, Jim Nasby wrote:
> pgbench -S -T10 -c 4 -j 4
> master:
> tps = 9556.356145 (excluding connections establishing)
> tps = 9897.324917 (excluding connections establishing)
> tps = 9287.286907 (excluding connections establishing)
> tps = 10210.130833 (excluding connections establishing)
>
> XXH32:
> tps = 32462.754437 (excluding connections establishing)
> tps = 33232.144511 (excluding connections establishing)
> tps = 33082.436760 (excluding connections establishing)
> tps = 33597.904532 (excluding connections establishing)

FWIW, I don't believe these results for one second. It's quite plausible that there's a noticeable performance benefit, but a factor of three is completely unrealistic... Could you please recheck?

Greetings,

Andres Freund

-- 
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes:
> On 2014-12-24 00:27:39 -0600, Jim Nasby wrote:
>> pgbench -S -T10 -c 4 -j 4
>> master:
>> tps = 9556.356145 (excluding connections establishing)
>> tps = 9897.324917 (excluding connections establishing)
>> tps = 9287.286907 (excluding connections establishing)
>> tps = 10210.130833 (excluding connections establishing)
>>
>> XXH32:
>> tps = 32462.754437 (excluding connections establishing)
>> tps = 33232.144511 (excluding connections establishing)
>> tps = 33082.436760 (excluding connections establishing)
>> tps = 33597.904532 (excluding connections establishing)

> FWIW, I don't believe these results for one second. It's quite plausible
> that there's a noticeable performance benefit, but a factor of three is
> completely unrealistic... Could you please recheck?

A possible theory is that the hash change moved some locks into different partitions causing a large reduction in contention, but even then 3X seems unlikely. And of course if that was the mechanism, the result is still pure luck; other examples might get worse by the same amount.

regards, tom lane
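For context on that theory: partition assignment really is a direct function of the tag's hash value, so changing the hash function reshuffles which partition (and hence which LWLock) each lock or buffer maps to. From lock.h and buf_internals.h:

    #define LockHashPartition(hashcode) \
        ((hashcode) % NUM_LOCK_PARTITIONS)
    #define BufTableHashPartition(hashcode) \
        ((hashcode) % NUM_BUFFER_PARTITIONS)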
Re: hash_create API changes (was Re: speedup tidbitmap patch: hash BlockNumber)
From: Jim Nasby
On 12/24/14, 10:58 AM, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
>> On 2014-12-24 00:27:39 -0600, Jim Nasby wrote:
>>> pgbench -S -T10 -c 4 -j 4
>>> master:
>>> tps = 9556.356145 (excluding connections establishing)
>>> tps = 9897.324917 (excluding connections establishing)
>>> tps = 9287.286907 (excluding connections establishing)
>>> tps = 10210.130833 (excluding connections establishing)
>>>
>>> XXH32:
>>> tps = 32462.754437 (excluding connections establishing)
>>> tps = 33232.144511 (excluding connections establishing)
>>> tps = 33082.436760 (excluding connections establishing)
>>> tps = 33597.904532 (excluding connections establishing)
>
>> FWIW, I don't believe these results for one second. It's quite plausible
>> that there's a noticeable performance benefit, but a factor of three is
>> completely unrealistic... Could you please recheck?
>
> A possible theory is that the hash change moved some locks into
> different partitions causing a large reduction in contention,
> but even then 3X seems unlikely. And of course if that was
> the mechanism, the result is still pure luck; other examples
> might get worse by the same amount.

I must have screwed something up, because if anything I see a small loss for XXH now (but my laptop isn't consistent enough to be sure). This surprises me given that SMHasher shows XXH to be 50% faster than Spooky for 20-byte keys.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
Jim Nasby <Jim.Nasby@BlueTreble.com> writes:
> On 12/24/14, 10:58 AM, Tom Lane wrote:
>> Andres Freund <andres@2ndquadrant.com> writes:
>>> FWIW, I don't believe these results for one second. It's quite plausible
>>> that there's a noticeable performance benefit, but a factor of three is
>>> completely unrealistic... Could you please recheck?

>> A possible theory is that the hash change moved some locks into
>> different partitions causing a large reduction in contention,
>> but even then 3X seems unlikely. And of course if that was
>> the mechanism, the result is still pure luck; other examples
>> might get worse by the same amount.

> I must have screwed something up, because if anything I see a small loss for XXH now (but my laptop isn't consistent enough to be sure).
> This surprises me given that SMHasher shows XXH to be 50% faster than
> Spooky for 20-byte keys.

The speed of the hash calculation itself is unlikely to move the needle as much as 1% in either direction, because I've seldom seen any profile in which hash_any accounted for more than a percent or so of total runtime. What is far more likely to be causing visible performance changes is downstream effects, ie changes in the distribution of entries across hash buckets, causing changes in bucket list search times and/or changes in contention (for shared memory tables that are locked on a hash-partition basis). But even then it's hard to credit 3X.

regards, tom lane
Re: hash_create API changes (was Re: speedup tidbitmap patch: hash BlockNumber)
From: Andres Freund
On 2014-12-24 13:58:41 -0600, Jim Nasby wrote:
> This surprises me given that SMHasher shows XXH to be 50% faster than Spooky for 20-byte keys.

Note that there are quite some differences in how smhasher and postgres use the hash functions. The former nearly exclusively exercises the hash function, while postgres also executes a lot of other code. That means the instruction cache and branch prediction are much less likely to contain relevant entries, which can change the performance characteristics noticeably. Additionally, there are differences in how much pipelining can be employed - the likelihood the CPU can pipeline the hash computation in a tight loop is much higher.

Also note that in many cases where you can see tag_hash/hash_any in workloads, a part of the cost won't actually be the computation of the hashes themselves but the cache misses triggered by the hash computation accessing the data.

Greetings,

Andres Freund

-- 
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services