Re: Use merge-based matching for MCVs in eqjoinsel - Mailing list pgsql-hackers

From Ilia Evdokimov
Subject Re: Use merge-based matching for MCVs in eqjoinsel
Date
Msg-id 16ce0876-cd6b-4136-9ce9-21381de9c916@tantorlabs.com
Whole thread Raw
In response to Re: Use merge-based matching for MCVs in eqjoinsel  (David Geier <geidav.pg@gmail.com>)
Responses Re: Use merge-based matching for MCVs in eqjoinsel
List pgsql-hackers


On 08.09.2025 13:56, David Geier wrote:
To evaluate it, I ran benchmarks on JOB with three variants:

$ ./benchmark.sh master
$ ./benchmark.sh merge
$ ./benchmark.sh hash

I compared total planning time across all 113 queries.
Was this running with optimizations? How did you extract the planning time?


I save all query plans using EXPLAIN SUMMARY, then go through all the plans, read the 'Planning Time' for each, and sum them up.

I would have expected the delta between the "merge" and "hash" variant
to be bigger, especially for default_statistics_target=10000. My small
test also showed that. Any idea why this is not showing in your results?


So would I. With default_statistics_target = 10000 and the selectivity in the JOB queries being close to zero, the difference should be noticeable. I can only explain the previous results by cache-related effects on my machine.

I reran the benchmark on a clean cluster and collected the top slowest JOB queries — now the effect is clearly visible.

Merge (sum of all JOB queries)
==================
default_statistics_target | Planner Speedup (×) | Planner Before (ms) | Planner After (ms)
--------------------------------------------------------------------------------
100                       | 1.00                | 1888.105            | 1879.431
1000                      | 1.14                | 2282.239            | 2009.114
2500                      | 2.10                | 5595.030            | 2668.530
5000                      | 5.56                | 18544.933           | 3333.252
7500                      | 9.17                | 37390.956           | 4076.390
10000                     | 16.10               | 69319.479           | 4306.417

HashMap (sum of all JOB queries)
==================
default_statistics_target | Planner Speedup (×) | Planner Before (ms) | Planner After (ms)
--------------------------------------------------------------------------------
100                     | 1.03                | 1888.105            | 1828.088
1000                    | 1.18                | 2282.239            | 1939.884
2500                    | 2.64                | 5595.030            | 2117.872
5000                    | 7.80                | 18544.933           | 2377.206
7500                    | 13.80               | 37390.956           | 2709.973
10000                   | 23.32               | 69319.479           | 2973.073

Top 10 slowest JOB queries (default_statistics_target = 10000)
Query | master (ms) | merge (ms) | Hash (ms)
------+-------------+------------+-----------
29c   | 1904.586    | 144.135    | 100.473
29b   | 1881.392    | 117.891    | 89.028
29a   | 1868.805    | 112.242    | 83.913
31c   | 1867.234    | 76.498     | 56.140
30c   | 1646.630    | 88.494     | 62.549
30b   | 1608.820    | 84.821     | 64.603
31a   | 1573.964    | 75.978     | 56.140
28a   | 1457.738    | 95.939     | 77.309
28b   | 1455.052    | 99.383     | 73.065
30a   | 1416.699    | 91.057     | 62.549


BTW, the hashmap from your patch could also be applied to eqjoinsel_semi() function.

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com

pgsql-hackers by date:

Previous
From: Robert Treat
Date:
Subject: Re: [PATCH] Proposal to Enable/Disable Index using ALTER INDEX
Next
From: Robert Haas
Date:
Subject: Re: plan shape work