Re: [PATCH]-hash index improving - Mailing list pgsql-hackers

From:           Tom Lane
Subject:        Re: [PATCH]-hash index improving
Msg-id:         23649.1216398205@sss.pgh.pa.us
In response to: Re: [PATCH]-hash index improving ("Jonah H. Harris" <jonah.harris@gmail.com>)
Responses:      Re: [PATCH]-hash index improving
List:           pgsql-hackers
"Jonah H. Harris" <jonah.harris@gmail.com> writes:
> Agreed.  My thinking is that there's either something inherently wrong
> with the implementation, or we're performing so many disk I/Os that
> it's nearly equivalent to b-tree.  Tom has a couple suggestions which
> Xiao and I will explore.

I finally got a chance to look through the patch in some detail.  If I
haven't missed something, there are just two improvement ideas embodied
in it:

1. Store just the hash value, and not the index key value, in hash index
entries.  (This means that all hash indexscans become lossy and have to
be rechecked at the heap.)

2. Keep the contents of each index page ordered by hash value, and use
binary instead of linear search to find the matching item(s) during an
indexscan.  (This depends on #1 because recomputing the hash values
during the binary search would be too expensive --- although you could
also make it work by storing *both* the hash value and the original
key.)

I suppose that the main point of #1 is to reduce index size by allowing
more tuples to fit in each bucket.  However, the patch neglects to
adjust the target-fillfactor calculation in _hash_metapinit, which means
that the code won't actually put any more tuples per bucket (on average)
than it did before.  So the index isn't actually any smaller and you get
no I/O savings --- you just have more unused space on a typical page.
Fixing that might help.

FWIW, I had always assumed that part of the solution to hash's woes
would involve decoupling the bucket size from the page size, so that you
could have multiple buckets per page.  But maybe the binary-search idea
makes that unnecessary.  I'm not sure.  A whole lot depends on how
evenly the buckets get filled.  You should probably investigate how many
tuples actually end up in each bucket with and without the patch.
In the realm of micro-optimizations that might be significant, I think
you really need to get rid of all those _create_hash_desc calls,
particularly the one in _hash_checkqual which is part of the inner loop
of an indexscan.  Not only are they slow, but they probably represent a
serious memory leak in a scan that returns many tuples.

For reading the hash value out of an existing index tuple, I don't think
you should be bothering with a tupdesc at all --- don't use
index_getattr, just map a C struct onto the known layout of an index
tuple with a single never-null int field.  This would particularly make
for a noticeable improvement in the speed of _hash_binsearch.

The tupdescs used in storing an index entry are probably needed, but you
could just use a single statically declared tupdesc (look at the ones in
relcache.c for examples of building one as a constant).

			regards, tom lane