Re: Use merge-based matching for MCVs in eqjoinsel - Mailing list pgsql-hackers

From David Geier
Subject Re: Use merge-based matching for MCVs in eqjoinsel
Date
Msg-id 1f6e18c6-9149-4b00-9837-fb8ce5304cf4@gmail.com
Whole thread Raw
In response to Re: Use merge-based matching for MCVs in eqjoinsel  (Ilia Evdokimov <ilya.evdokimov@tantorlabs.com>)
List pgsql-hackers
Hi!

Thanks for the review.

On 09.09.2025 09:29, Ilia Evdokimov wrote:
> From an architectural perspective, I think the patch is already in good
> shape. However, I have some suggestions regarding code style:
> 
> 1. I would move McvHashEntry, McvHashContext, he new hash table
>    definition, hash_mcv and are_mcvs_equal to the top.
Done. I've also moved up try_eqjoinsel_with_hashtable().

> 2. I’m not sure get_hash_func_oid() is needed at all – it seems we
>    could do without it.
Removed.

> 3. It would be better to name the parameters matchProductFrequencies ->
>    matchprodfreq, nMatches -> nmatches, hasMatch1 -> hasmatch1,
>    hasMatch2 -> hasmatch2 in eqjoinsel_inner_with_hashtable().
Done.

> 4. As I wrote earlier, since we now have a dedicated function
>    eqjoinsel_inner_with_hashtable(), perhaps it could be used in both
>    eqjoinsel_inner() and eqjoinsel_semi(). And since the hash-based
Done.

The gains for SEMI join are even bigger because the function is called 3
times for e.g. EXPLAIN SELECT * FROM foo f WHERE EXISTS (SELECT 1 FROM
bar b where f.col = b.col); For that query the planning time for me goes
from ~1300 ms -> 12 ms.

>    search was factored out, maybe it would make sense to also factor
>    out the O(N^2) nested loop implementation?
Generally agreed and while tempting, I've refrained from doing the
refactoring. Let's better do this in a separate patch to keep things simple.

> 5. I think it would be helpful to add a comment explaining that using a
>    hash table is not efficient when the MCV array length equals 1:
> 
> if (Min(statsSlot1->nvalues, statsSlot2->nvalues) == 1)
>     return false;
Done.
>> Did you have a
>> chance to check tables with just few MCVs or are there any queries in
>> the JOB which regress with very small default_statistics_target?
> 
> 
> Sure. I need some time to check this.
> 
Could you please do that with the latest attached patch so that we test
it once more?

Could you also run once more the JOB benchmark to get some test coverage
on the SEMI join code (assuming it also uses SEMI joins)?

Once we've concluded on above and there are no objections, I will
register this patch in the commit fest.

--
David Geier
Attachment

pgsql-hackers by date:

Previous
From: Richard Guo
Date:
Subject: Re: Eager aggregation, take 3
Next
From: shveta malik
Date:
Subject: Re: Issue with logical replication slot during switchover