Re: Fix overflow of nbatch - Mailing list pgsql-hackers

From: Tomas Vondra
Subject: Re: Fix overflow of nbatch
Msg-id: 72a99668-07f3-4163-9302-5d53e3bbff58@vondra.me
In response to: Re: Fix overflow of nbatch (Tomas Vondra <tomas@vondra.me>)
List: pgsql-hackers
On 10/9/25 16:30, Tomas Vondra wrote:
> On 10/9/25 16:16, Melanie Plageman wrote:
>> On Wed, Oct 8, 2025 at 4:46 PM Tomas Vondra <tomas@vondra.me> wrote:
>>>
>>> On 10/8/25 19:37, Melanie Plageman wrote:
>>>
>>>> Yes, this was wrong. I forgot an extra / sizeof(HashJoinTuple). I meant:
>>>>
>>>> hash_table_bytes / sizeof(HashJoinTuple) > MaxAllocSize /
>>>> sizeof(HashJoinTuple) / 2
>>>
>>> But the hash table is not allocated as a single chunk of memory, so I
>>> think MaxAllocSize would be the wrong thing to use here, no? Hash table
>>> is a collection of tuples (allocated one by one), that's why
>>> get_hash_memory_limit() uses SIZE_MAX. MaxAllocSize matters for
>>> nbuckets, because that indeed is allocated as a contiguous array, ofc.
>>>
>>> Also, why bother with /sizeof(HashJoinTuple) on both sides? Without it
>>> we get
>>>
>>>     hash_table_bytes > MaxAllocSize / 2
>>>
>>> but again, that doesn't make much sense - the hash table can be larger,
>>> it's not a single palloc.
>>
>> It came from the earlier clamping of nbuckets:
>>
>>     max_pointers = hash_table_bytes / sizeof(HashJoinTuple);
>>     max_pointers = Min(max_pointers, MaxAllocSize / sizeof(HashJoinTuple));
>>     dbuckets = Min(dbuckets, max_pointers);
>>
>> I don't really get why this divides hash_table_bytes by
>> sizeof(HashJoinTuple) since, as you say, hash_table_bytes is supposed
>> to accommodate both the bucket_bytes and inner_rel_bytes.
>>
> 
> I think that's simply to allocate enough buckets for the expected
> number of tuples, not more. And cap it at MaxAllocSize. Some of this
> may be redundant, I suppose.
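
To put numbers on it (my arithmetic, assuming a 64-bit build where
HashJoinTuple is an 8-byte pointer and MaxAllocSize is just under 1GB),
with a 4GB budget:

    hash_table_bytes / sizeof(HashJoinTuple)  = 4GB / 8  = ~512M
    MaxAllocSize / sizeof(HashJoinTuple)      = ~1GB / 8 = ~128M

so max_pointers ends up clamped to ~128M bucket pointers, i.e. the
contiguous-array cap wins.
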
> 
>>> And I think it should be just
>>>
>>>     if (hash_table_bytes > SIZE_MAX / 2)
>>>         break;
>>>
>>> as a protection against hash_table_bytes overflowing SIZE_MAX (which on
>>> 64-bits seems implausible, but could happen on 32-bit builds).
>>
>> That's roughly the check I ended up with -- except I used
>> space_allowed because it will be larger than hash_table_bytes if there
>> is a skew hashtable. I've updated the comment and such in attached v3.
>>
> 
> Ah, thanks. I was just hacking on this too, but I'll switch to your v3.
> 

Here's a v4, which is your v3 with a few minor adjustments:

1) Adjusted a couple of comments. It feels a bit too audacious to
correct comments written by a native speaker, but it seems cleaner to
me this way.

2) I think the (nbatch < hash_table_bytes / BLCKSZ) condition really
needs to check space_allowed, because that's the whole space, before
subtracting the skew buckets. And we also check the overflow for
space_allowed earlier, not hash_table_bytes. (See the sketch after
this list.)

3) Removed the hash_table_bytes doubling, because the loop (and the
code after it) no longer needs it.

4) Added the doubling of num_skew_mcvs, and also the overflow
protection for it.
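
Putting 2) through 4) together, the loop ends up looking roughly like
this (a sketch, not the literal patch; the exact guard expressions are
my guesses at the shape, not copied from v4):

    for (;;)
    {
        /* stop once halving nbatch no longer reduces total memory;
         * this has to look at the whole allowance, per 2) */
        if (nbatch < (*space_allowed) / BLCKSZ)
            break;

        /* don't let any of the doublings below overflow */
        if (*space_allowed > SIZE_MAX / 2)
            break;
        if (nbuckets > MaxAllocSize / sizeof(HashJoinTuple) / 2)
            break;
        if (*num_skew_mcvs > INT_MAX / 2)
            break;

        /* note: no hash_table_bytes doubling here anymore, per 3) */
        nbatch /= 2;
        nbuckets *= 2;
        *space_allowed = (*space_allowed) * 2;
        *num_skew_mcvs = (*num_skew_mcvs) * 2;
    }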

You suggested this in another message:

> We do need to recalculate num_skew_mcvs once at the end before
> returning.

But I think the doubling has the same effect, right? I don't want to
redo the whole "if (useskew) { ... }" block at the end.
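
For the record, the reason the doubling works: num_skew_mcvs is linear
in the memory budget. Heavily simplified from the useskew block (with
bytes_per_mcv standing in for the per-MCV cost it spells out):

    skew_table_bytes = hash_table_bytes * SKEW_HASH_MEM_PERCENT / 100;
    *num_skew_mcvs = skew_table_bytes / bytes_per_mcv;

So doubling the budget doubles the estimate, up to integer truncation.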

I wonder if it'd be better to just merge all the overflow checks into
a single "if (a || b || c)" check, along the lines of the sketch below;
the separate checks seem quite verbose. I'll see tomorrow.
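
That is (same guards as in the sketch above, just combined):

    if (*space_allowed > SIZE_MAX / 2 ||
        nbuckets > MaxAllocSize / sizeof(HashJoinTuple) / 2 ||
        *num_skew_mcvs > INT_MAX / 2)
        break;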

I've done a bit more testing today. I went a bit overboard and created a
table with 20B rows, which actually pushes nbatch high enough to hit the
initial issue (overflow in the size calculation). And it works sanely
with the v4 patch (and v3 too).

I guess I could have tweaked the table stats, or just manipulated the
values at the beginning of the function. But that wouldn't have been as
much fun.


regards

-- 
Tomas Vondra
