Re: Use merge-based matching for MCVs in eqjoinsel - Mailing list pgsql-hackers

From David Geier
Subject Re: Use merge-based matching for MCVs in eqjoinsel
Date
Msg-id 9f328076-423f-40d3-8012-2d2acc05cbff@gmail.com
Whole thread Raw
In response to Re: Use merge-based matching for MCVs in eqjoinsel  (Ilia Evdokimov <ilya.evdokimov@tantorlabs.com>)
Responses Re: Use merge-based matching for MCVs in eqjoinsel
List pgsql-hackers
Hi!

On 08.09.2025 15:45, Ilia Evdokimov wrote:
> I reran the benchmark on a clean cluster and collected the top slowest
> JOB queries — now the effect is clearly visible.
> 
> Merge (sum of all JOB queries)
> ==================
> default_statistics_target | Planner Speedup (×) | Planner Before (ms) |
> Planner After (ms)
> --------------------------------------------------------------------------------
> 100                       | *1.00*                | 1888.105           
> | 1879.431
> 1000                      | *1.14*                | 2282.239           
> | 2009.114
> 2500                      | *2.10*                | 5595.030           
> | 2668.530
> 5000                      | *5.56*                | 18544.933          
> | 3333.252
> 7500                      | *9.17*                | 37390.956          
> | 4076.390
> 10000                     | *16.10*               | 69319.479          
> | 4306.417
> 
> HashMap (sum of all JOB queries)
> ==================
> default_statistics_target | Planner Speedup (×) | Planner Before (ms) |
> Planner After (ms)
> --------------------------------------------------------------------------------
> 100                     | *1.03*                | 1888.105            |
> 1828.088
> 1000                    | *1.18*                | 2282.239            |
> 1939.884
> 2500                    | *2.64*                | 5595.030            |
> 2117.872
> 5000                    | *7.80*                | 18544.933           |
> 2377.206
> 7500                    | *13.80*               | 37390.956           |
> 2709.973
> 10000                   | *23.32*               | 69319.479           |
> 2973.073
> 
> Top 10 slowest JOB queries (default_statistics_target = 10000)
> Query | master (ms) | merge (ms) | Hash (ms)
> ------+-------------+------------+-----------
> 29c   | 1904.586    | 144.135    | 100.473
> 29b   | 1881.392    | 117.891    | 89.028
> 29a   | 1868.805    | 112.242    | 83.913
> 31c   | 1867.234    | 76.498     | 56.140
> 30c   | 1646.630    | 88.494     | 62.549
> 30b   | 1608.820    | 84.821     | 64.603
> 31a   | 1573.964    | 75.978     | 56.140
> 28a   | 1457.738    | 95.939     | 77.309
> 28b   | 1455.052    | 99.383     | 73.065
> 30a   | 1416.699    | 91.057     | 62.549

This looks much better. Very nice!

> 
> BTW, the hashmap from your patch could also be applied to
> eqjoinsel_semi() function.
> 

Yep. The inner loop only runs until clamped_nvalues2 and it doesn't
compute matchprodfreq. I'll try to modify the patch such that it
accounts for these differences without being too hard to read.

Do you think anything else needs changes in the patch? Did you have a
chance to check tables with just few MCVs or are there any queries in
the JOB which regress with very small default_statistics_target?

--
David Geier



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: plan shape work
Next
From: "Matheus Alcantara"
Date:
Subject: Re: Only one version can be installed when using extension_control_path