On 10/8/25 21:16, Melanie Plageman wrote:
> On Wed, Oct 8, 2025 at 1:37 PM Melanie Plageman
> <melanieplageman@gmail.com> wrote:
>>
>> I have updated my patch to fix the mistakes above. I also noticed then
>> that I wasn't doubling space_allowed in the loop but instead setting
>> it to hash_table_bytes at the end. This doesn't produce a power of 2
>> because we subtract skew_mcvs from the hash_table_bytes. So, we have
>> to keep using space_allowed if we want a power of 2 in the end.
>>
>> I've changed my patch to do this, but this made me wonder if we want
>> to be doing this or instead take hash_table_bytes at the end and round
>> it up to a power of 2 and set space_allowed to that. If the skew
>> hashtable is large, we may be allocating way more space_allowed than
>> we need for new hash_table_bytes + skew hashtable buckets.
>
I don't think there's any promise of hash_table_bytes being a power of 2.
You can make hash_table_bytes an almost arbitrary value by setting
work_mem and hash_mem_multiplier. Or am I missing something?
But you're right that hash_table_bytes and space_allowed may differ if
useskew=true. So setting space_allowed to hash_table_bytes at the end
does not seem right. I think we don't actually need hash_table_bytes at
this point; we can just ignore it and use/double *space_allowed.
I kept using hash_table_bytes mostly because it didn't require
dereferencing the pointer, but I failed to consider the useskew=true case.
However, this means there's probably a bug: the loop should probably
double num_skew_mcvs too. We simply reserve SKEW_HASH_MEM_PERCENT of
space_allowed for the skew hashtable, so should we adjust it the same way?
> Oh wait, that doesn't make sense because each batch could have a skew hashtable.
>
Not sure I understand. Is this the same issue I just described?
regards
--
Tomas Vondra