Re: Use merge-based matching for MCVs in eqjoinsel - Mailing list pgsql-hackers

From:           David Geier
Subject:        Re: Use merge-based matching for MCVs in eqjoinsel
Date:
Msg-id:         412007b6-a324-44f5-91ed-c869004854ec@gmail.com
In response to: Re: Use merge-based matching for MCVs in eqjoinsel (Ilia Evdokimov <ilya.evdokimov@tantorlabs.com>)
Responses:      Re: Use merge-based matching for MCVs in eqjoinsel
List:           pgsql-hackers
Hi Ilia!

> I have read all the previous messages - and yes, you are right. I don't
> know why I didn't consider using a hash table approach initially. Your
> idea makes a lot of sense.

Your solution would be beneficial on top of that, for cases where the data
type is not hashable. But I think that's overkill for a v1. I would start
with the hash-based version.

> To evaluate it, I ran benchmarks on JOB with three variants:
>
> $ ./benchmark.sh master
> $ ./benchmark.sh merge
> $ ./benchmark.sh hash
>
> I compared total planning time across all 113 queries.

Was this run with an optimized build? How did you extract the planning time?

> $ python3 planning_time.py master hash
> default_statistics_target | Planner Speedup (×) | Planner Before (ms) | Planner After (ms)
> ------------------------------------------------------------------------------------------
>                        100 |               1.00 |            1892.627 |           1886.969
>                       1000 |               1.09 |            2286.922 |           2100.099
>                       2500 |               1.94 |            4647.167 |           2400.711
>                       5000 |               6.15 |           17964.779 |           2919.914
>                       7500 |              10.58 |           38622.443 |           3650.375
>                      10000 |              16.33 |           69538.085 |           4257.864
>
> $ python3 planning_time.py master merge
> default_statistics_target | Planner Speedup (×) | Planner Before (ms) | Planner After (ms)
> ------------------------------------------------------------------------------------------
>                        100 |               1.00 |            1892.627 |           1898.622
>                       1000 |               1.12 |            2286.922 |           2033.553
>                       2500 |               1.92 |            4647.167 |           2423.552
>                       5000 |               5.94 |           17964.779 |           3025.739
>                       7500 |              10.48 |           38622.443 |           3684.262
>                      10000 |              16.72 |           69538.085 |           4159.418

I would have expected the delta between the "merge" and "hash" variants to be
bigger, especially for default_statistics_target=10000. My own small test
showed a larger difference. Any idea why that doesn't show up in your results?

> Based on these results, I'd prefer the hash lookup implementation, so I
> think it makes sense to improve your patch further and bring it into
> good shape. Shall I take care of that, or would you prefer to do it
> yourself?

I think process-wise it's best if you review my code and I do the changes.

Could you, as part of your review, also test tables with just a few MCVs to
make sure we're not regressing "small" cases? For now I only bail out if one
of the two MCV lists has just a single value. I expect the gains from
fine-tuning this threshold to not be measurable, but let's double-check.

--
David Geier
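PS: To make the hash-based idea concrete, here is a rough standalone sketch of
the algorithm shape. It is not the actual patch and does not use PostgreSQL's
internal APIs: plain C types and a toy hash function stand in for the
datum-level machinery (the type's hash opclass and equality operator) that the
real code in eqjoinsel would have to use.

/*
 * Sketch only: replace eqjoinsel's O(n1 * n2) nested loop over the two
 * MCV lists with a hash table built on one list and probed with the other.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct McvEntry
{
    int64_t value;              /* stand-in for a Datum */
    double  freq;               /* MCV frequency */
    bool    matched;            /* the real code needs this for the non-MCV part */
    struct McvEntry *next;      /* hash-bucket chain */
} McvEntry;

/* Toy hash; the real code would call the type's hash support function. */
static uint32_t
hash_value(int64_t v, uint32_t nbuckets)
{
    return (uint32_t) ((uint64_t) v * 2654435761u) % nbuckets;
}

/*
 * Sum the products of frequencies of MCV values present in both lists,
 * i.e. the MCV-vs-MCV contribution to the join selectivity estimate.
 */
static double
mcv_match_freq(McvEntry *mcv1, int n1, McvEntry *mcv2, int n2)
{
    uint32_t    nbuckets = 2 * (uint32_t) n1 + 1;
    McvEntry  **buckets = calloc(nbuckets, sizeof(McvEntry *));
    double      matchprodfreq = 0.0;

    /* Build phase: insert the first MCV list into the hash table. */
    for (int i = 0; i < n1; i++)
    {
        uint32_t    b = hash_value(mcv1[i].value, nbuckets);

        mcv1[i].next = buckets[b];
        buckets[b] = &mcv1[i];
    }

    /* Probe phase: look up each value of the second list. */
    for (int j = 0; j < n2; j++)
    {
        uint32_t    b = hash_value(mcv2[j].value, nbuckets);

        for (McvEntry *e = buckets[b]; e != NULL; e = e->next)
        {
            if (e->value == mcv2[j].value)
            {
                matchprodfreq += e->freq * mcv2[j].freq;
                e->matched = true;
                mcv2[j].matched = true;
                break;
            }
        }
    }

    free(buckets);
    return matchprodfreq;
}

int
main(void)
{
    McvEntry    a[] = {{1, 0.30}, {2, 0.20}, {3, 0.10}};
    McvEntry    b[] = {{2, 0.25}, {4, 0.15}, {3, 0.05}};

    /* 0.20*0.25 + 0.10*0.05 = 0.0550 */
    printf("matched frequency: %.4f\n", mcv_match_freq(a, 3, b, 3));
    return 0;
}

Building the table on one list and probing with the other turns the
MCV-vs-MCV work from O(n1 * n2) equality-operator calls into O(n1 + n2)
expected hash operations; building on the shorter of the two lists would
additionally bound the table size by min(n1, n2).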