Thread: Comment simplehash/dynahash trade-offs
In another thread [1] I'd mused that "there might be some value in a README or comments addition that would be a guide to what the various hash implementations are useful for...so that we have something to make the code base a bit more discoverable." I'd solicited feedback from Andres (as the author of the simplehash implementation) and gotten further explanation from Tomas (both cc'd here) and have tried to condense that into the comment changes in this patch series. v1-0001-Summarize-trade-offs-between-simplehash-and-dynah.patch Contains the summaries mentioned above. v1-0002-Improve-simplehash-usage-notes.patch I'd noticed while adding a simplehash implementation in that other thread that the facts that: - The element type requires a status member. - Why storing a hash in the element type is useful (i.e., when to define SH_STORE_HASH/SH_GET_HASH). - The availability of private_data member for metadata access from callbacks. are either undocumented or hard to discover, and so I've added the information directly to the usage notes section. v1-0003-Show-sample-simplehash-method-signatures.patch I find it hard to read the macro code "templating" particularly for seeing what the available API is and so added sample method signatures in comments to the macro generated method signature defines. James [1]: https://www.postgresql.org/message-id/CAAaqYe956a-zbm1qR8pqz%3DiLbF8LW5vBrQGrzXVHXdLk3at5_Q%40mail.gmail.com
Attachment
On Fri, May 1, 2020 at 1:53 PM James Coleman <jtc331@gmail.com> wrote: > In another thread [1] I'd mused that "there might be some value in a > README or comments > addition that would be a guide to what the various hash > implementations are useful for...so that we have something to make the > code base a bit more discoverable." +1 > I'd solicited feedback from Andres (as the author of the simplehash > implementation) and gotten further explanation from Tomas (both cc'd > here) and have tried to condense that into the comment changes in this > patch series. > > v1-0001-Summarize-trade-offs-between-simplehash-and-dynah.patch > Contains the summaries mentioned above. + * - It supports partitioning, which is useful for shared memory access using I wonder if we should say a bit more about the shared memory mode. Shared memory dynahash tables are allocated in a fixed size area at startup, and are discoverable by name in other other processes that need to get access to them, while simplehash assumes that it can get memory from a MemoryContext or an allocator with a malloc/free-style interface, which isn't very well suited for use in shared memory. (I'm sure you can convince it to work in shared memory with some work.) > v1-0002-Improve-simplehash-usage-notes.patch + * For convenience the hash table create functions accept a void pointer + * will be stored in the hash table type's member private_data. *that* will be stored? > v1-0003-Show-sample-simplehash-method-signatures.patch > I find it hard to read the macro code "templating" particularly for > seeing what the available API is and so added sample method signatures > in comments to the macro generated method signature defines. I didn't double-check all the expansions of the macros but +1 for this idea, it's very useful.
On Mon, Jul 20, 2020 at 1:29 AM Thomas Munro <thomas.munro@gmail.com> wrote: > > On Fri, May 1, 2020 at 1:53 PM James Coleman <jtc331@gmail.com> wrote: > > In another thread [1] I'd mused that "there might be some value in a > > README or comments > > addition that would be a guide to what the various hash > > implementations are useful for...so that we have something to make the > > code base a bit more discoverable." > > +1 > > > I'd solicited feedback from Andres (as the author of the simplehash > > implementation) and gotten further explanation from Tomas (both cc'd > > here) and have tried to condense that into the comment changes in this > > patch series. > > > > v1-0001-Summarize-trade-offs-between-simplehash-and-dynah.patch > > Contains the summaries mentioned above. > > + * - It supports partitioning, which is useful for shared memory access using > > I wonder if we should say a bit more about the shared memory mode. > Shared memory dynahash tables are allocated in a fixed size area at > startup, and are discoverable by name in other other processes that > need to get access to them, while simplehash assumes that it can get > memory from a MemoryContext or an allocator with a malloc/free-style > interface, which isn't very well suited for use in shared memory. > (I'm sure you can convince it to work in shared memory with some > work.) Added. > > v1-0002-Improve-simplehash-usage-notes.patch > > + * For convenience the hash table create functions accept a void pointer > + * will be stored in the hash table type's member private_data. > > *that* will be stored? Fixed. > > v1-0003-Show-sample-simplehash-method-signatures.patch > > I find it hard to read the macro code "templating" particularly for > > seeing what the available API is and so added sample method signatures > > in comments to the macro generated method signature defines. > > I didn't double-check all the expansions of the macros but +1 for this > idea, it's very useful. James
Attachment
On Sat, Aug 1, 2020 at 7:22 AM James Coleman <jtc331@gmail.com> wrote: > [v2 patch set] I ran it through pgindent which insisted on adding some newlines, I manually replaced some spaces with tabs to match nearby lines, I added some asterisks in your example function prototypes where <element> is returned because they seemed to be missing, and I pushed this. Thanks!
On Fri, Jul 31, 2020 at 8:17 PM Thomas Munro <thomas.munro@gmail.com> wrote: > > On Sat, Aug 1, 2020 at 7:22 AM James Coleman <jtc331@gmail.com> wrote: > > [v2 patch set] > > I ran it through pgindent which insisted on adding some newlines, I > manually replaced some spaces with tabs to match nearby lines, I added > some asterisks in your example function prototypes where <element> is > returned because they seemed to be missing, and I pushed this. > Thanks! Thanks for reviewing and committing! James
On Sat, 1 Aug 2020 at 12:17, Thomas Munro <thomas.munro@gmail.com> wrote: > > On Sat, Aug 1, 2020 at 7:22 AM James Coleman <jtc331@gmail.com> wrote: > > [v2 patch set] > > I ran it through pgindent which insisted on adding some newlines, I > manually replaced some spaces with tabs to match nearby lines, I added > some asterisks in your example function prototypes where <element> is > returned because they seemed to be missing, and I pushed this. > Thanks! I was just reading over this and wondered about the following: + * The element type is required to contain a "uint32 status" member. I see that PagetableEntry does not follow this and I also didn't follow it when writing the Result Cache patch in [1]. I managed to shrink the struct I was using for the hash table by 4 bytes by using a char instead of an int. That sounds like a small amount of memory, but it did result in much better cache hit ratios in the patch Maybe it would be better just to get rid of the enum and just #define the values. It seems unlikely that we're every going to need many more states than what are there already, let along more than, say 127 of them. It does look like manifest_file could be shrunk down a bit too by making the status field a char. [1] https://www.postgresql.org/message-id/flat/CAApHDvrPcQyQdWERGYWx8J+2DLUNgXu+fOSbQ1UscxrunyXyrQ@mail.gmail.com
On Sun, 2 Aug 2020 at 11:16, David Rowley <dgrowleyml@gmail.com> wrote: > Maybe it would be better just to get rid of the enum and just #define > the values. It seems unlikely that we're every going to need many more > states than what are there already, let along more than, say 127 of > them. It does look like manifest_file could be shrunk down a bit too > by making the status field a char. This didn't turn out quite as pretty as I had imagined. I needed to leave the two statuses defined in simplehash.h so that callers could make use of them. (Result Cache will do this). The change here would be callers would need to use SH_STATUS_IN_USE rather than <prefix>_SH_IN_USE. I'm not really that sold on doing things this way. I really just don't want to have to make my status field a uint32 in Result Cache per what the new comment states we must do. If there's a nicer way, then perhaps that's worth considering. David
Attachment
On Sun, Aug 2, 2020 at 1:54 PM David Rowley <dgrowleyml@gmail.com> wrote: > On Sun, 2 Aug 2020 at 11:16, David Rowley <dgrowleyml@gmail.com> wrote: > > Maybe it would be better just to get rid of the enum and just #define > > the values. It seems unlikely that we're every going to need many more > > states than what are there already, let along more than, say 127 of > > them. It does look like manifest_file could be shrunk down a bit too > > by making the status field a char. > > This didn't turn out quite as pretty as I had imagined. I needed to > leave the two statuses defined in simplehash.h so that callers could > make use of them. (Result Cache will do this). > > The change here would be callers would need to use SH_STATUS_IN_USE > rather than <prefix>_SH_IN_USE. > > I'm not really that sold on doing things this way. I really just don't > want to have to make my status field a uint32 in Result Cache per what > the new comment states we must do. If there's a nicer way, then > perhaps that's worth considering. Agreed that the new comment is wrong and should be changed. I think you can probably go further, though, and make it require no storage at all by making it optionally "intrusive", by using a special value in an existing member, and supplying an expression to set and test for that value. For example, maybe some users have a pointer but never want to use NULL, and maybe some users already have a field holding various flags that are one bit wide and can spare a bit.
On Mon, 3 Aug 2020 at 11:10, Thomas Munro <thomas.munro@gmail.com> wrote: > I think you can probably go further, though, and make it require no > storage at all by making it optionally "intrusive", by using a special > value in an existing member, and supplying an expression to set and > test for that value. For example, maybe some users have a pointer but > never want to use NULL, and maybe some users already have a field > holding various flags that are one bit wide and can spare a bit. I agree that it would be good to allow users of simplehash.h that additional flexibility. It may allow additional memory savings. However, it would mean we'd need to do some additional work when we create and grow the hash table to ensure that we mark new buckets as empty. For now, we get that for free with the zeroing of the newly allocated memory, but we couldn't rely on that if we allowed users to define their own macro. It looks like none of the current callers could gain from this 1. TupleHashEntryData does not have any reusable fields. The status should fit in the padding on a 64-bit machine anyway. 2. PagetableEntry already has a status that fits into the padding. 3. manifest_file could have its status moved to the end of the struct and made into a char and the struct would be the same size as if the field did not exist. So, with the current users, we'd stand to lose more than we'd gain from doing it that way. David
On Mon, 3 Aug 2020 at 11:36, David Rowley <dgrowleyml@gmail.com> wrote: > So, with the current users, we'd stand to lose more than we'd gain > from doing it that way. FWIW, I'd be ok with just: - * The element type is required to contain a "uint32 status" member. + * The element type is required to contain an integer-based "status" member + * which can store the range of values defined in the SH_STATUS enum. David
On Mon, Aug 3, 2020 at 11:42 AM David Rowley <dgrowleyml@gmail.com> wrote: > On Mon, 3 Aug 2020 at 11:36, David Rowley <dgrowleyml@gmail.com> wrote: > > So, with the current users, we'd stand to lose more than we'd gain > > from doing it that way. > > FWIW, I'd be ok with just: > > - * The element type is required to contain a "uint32 status" member. > + * The element type is required to contain an integer-based > "status" member > + * which can store the range of values defined in the SH_STATUS enum. Thanks for the correction. Pushed.