Re: Safer hash table initialization macro - Mailing list pgsql-hackers

From Jelte Fennema-Nio
Subject Re: Safer hash table initialization macro
Date
Msg-id DFO6J99E4HOY.29NFF8O6W6TFH@jeltef.nl
Whole thread Raw
In response to Re: Safer hash table initialization macro  (Bertrand Drouvot <bertranddrouvot.pg@gmail.com>)
Responses Re: Safer hash table initialization macro
List pgsql-hackers
On Wed, 14 Jan 2026 at 08:57, Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote:
> + * WARNING: If you use char[N] to store binary data that is not null-terminated,
> + * the automatic detection will incorrectly treat it as a string and use string
> + * comparison. In such cases, use hash_make_ext() with .force_blobs = true to
> + * override the automatic detection
>
> maybe s/is not null-terminated/may contain null bytes/?

Changed wording and changed to NOTE.

> > Especially, because to make this macro nice to
> > use in the two cases that it would apply to we'd have to make it treat 0
> > as a special value.
>
> Not necessary, we could also just add a foreach_hash_with_hash_value() to make
> things more consistent?

It would be more consistent, but the calling code would actually be more
verbose in the two places where we use seq_init_hash_with_hash_value.
Because in both places the code calls it conditionally like this:

    if (hashvalue == 0)
        hash_seq_init(&status, TypeCacheHash);
    else
        hash_seq_init_with_hash_value(&status, TypeCacheHash, hashvalue);

    while ((typentry = (TypeCacheEntry *) hash_seq_search(&status)) != NULL)
    {

So if we don't have the foreach_hash_with_hash_value macro treat 0
special, then we'd need to replace this single while with two for
foreach loops. A foreach_hash for the 0 case and a
foreach_hash_with_hash_value for the non-zero one.


> -static void
> -cfunc_hashtable_init(void)
> -{
> -       /* don't allow double-initialization */
> -       Assert(cfunc_hashtable == NULL);
>
> Most of the hash_make_fn_cxt() callers check that the destination is already
> NULL so that removing the Assert() is not that worrying for them. There are 2
> places where it's not the case: InitializeAttoptCache() and build_guc_variables()
> , should we add an Assert (or an if check) in them? I think that an Assert is
> more appropriate for those 2.

I'm a bit confused about this comment. I didn't change anything for
those two places about their checks for NULL. Did you mean this as a
separate but related improvement to existing code?

(To be clear, the removed Assert that you quoted doesn't matter when
inlining, because it's the only item in an if (!cfunc_hashtable) block)

> === 2
>
> "    At the very least we should choose a few places where we use the new
>     macros to make sure they have coverage."
>
> I do agree that the refactoring is quite large. I do think there is no rush
> to do all the changes at once. We could update a subset of files at a time per
> month so that, that would:
>
> - keep changes consistent within each file
> - ease the review(s)
> - avoid "large" rebases for patches waiting in the commitfest
>
> Thoughts?

As long as the new macro definitions get in, any way a committer thinks
that's sensible is fine by me. e.g. The List foreach macros also haven't
been replaced in bulk, but I'm still very happy that I can use them for
new code.

Attachment

pgsql-hackers by date:

Previous
From: Marco Matteucci
Date:
Subject: Re: [RFC] SLIM Data Type - Compact JSON Alternative (17-62% smaller)
Next
From: Chao Li
Date:
Subject: Re: pg_recvlogical: Prevent flushed data from being re-sent after restarting replication