Thread: Useless memoize path generated for unique join on primary keys
Hello, I have come across a plan that should never get generated IMHO: SELECT 1 FROM extdataregular e1 INNER JOIN extdataempty e2 ON e1.field = e2.field AND e1.index = e2.index generates the following plan: Nested Loop (cost=1.13..528540.89 rows=607604 width=4) (actual time=9298.504..9298.506 rows=0 loops=1) -> Index Only Scan using pk_extdataempty on extdataempty e2 (cost=0.56..157969.52 rows=4078988 width=16) (actual time=0.026..641.248rows=4067215 loops=1) Heap Fetches: 268828 -> Memoize (cost=0.58..0.67 rows=1 width=16) (actual time=0.002..0.002 rows=0 loops=4067215) Cache Key: e2.field, e2.index Cache Mode: logical Hits: 0 Misses: 4067215 Evictions: 3228355 Overflows: 0 Memory Usage: 65537kB Buffers: shared hit=16268863 -> Index Only Scan using pk_extdataregular on extdataregular e1 (cost=0.57..0.66 rows=1 width=16) (actual time=0.001..0.001rows=0 loops=4067215) Index Cond: ((field = e2.field) AND (index = e2.index)) Heap Fetches: 2 Please note that the memoize node has no cache hits, which is not surprising given that we are joining on two primary keysthat are unique by definition ("field" and "index" make up the primary key of both tables). Why would it ever make sense to generate a memoize plan for a unique join? I think this issue might tie in with the current discussion over on the hackers mailing list [1] Cheers, Ben [1] https://www.postgresql.org/message-id/flat/CAApHDvpFsSJAThNLtqaWvA7axQd-VOFct%3DFYQN5muJV-sYtXjw%40mail.gmail.com -- Bejamin Coutu ben.coutu@zeyos.com ZeyOS GmbH & Co. KG http://www.zeyos.com
On Tue, 3 May 2022 at 23:05, Benjamin Coutu <ben.coutu@zeyos.com> wrote: > -> Memoize (cost=0.58..0.67 rows=1 width=16) (actual time=0.002..0.002 rows=0 loops=4067215) > Cache Key: e2.field, e2.index > Cache Mode: logical > Hits: 0 Misses: 4067215 Evictions: 3228355 Overflows: 0 Memory Usage: 65537kB > Buffers: shared hit=16268863 > -> Index Only Scan using pk_extdataregular on extdataregular e1 (cost=0.57..0.66 rows=1 width=16) (actual time=0.001..0.001rows=0 loops=4067215) > Index Cond: ((field = e2.field) AND (index = e2.index)) > Why would it ever make sense to generate a memoize plan for a unique join? It wouldn't ever make sense. The problem is that estimate_num_groups() is used to estimate the number of distinct values and that function does not know about primary keys. There's no way the costing of Memoize would allow a Memoize plan to be used if it thought all values were unique, so the only possibility here is that ndistinct is being underestimated by some amount that makes Memoize look like the most favourable plan. You could see what the planner thinks about the ndistinct estimate on field, index by doing: EXPLAIN SELECT field,index FROM extdataregular GROUP BY 1,2; Whatever you see in the final row estimate for that plan is what's being fed into the Memoize costing code. > I think this issue might tie in with the current discussion over on the hackers mailing list [1] I'd say it's a pretty different problem. The cache hit ratio discussion on that thread talks about underestimating the hit ratio. That particular problem could only lead to Memoize plans *not* being chosen when they maybe should be. Not the other way around, which is your case. create statistics extdataregular_field_index_stats (ndistinct) on field, index from extdataregular; analyze extdataregular; would likely put that right. David
> I'd say it's a pretty different problem. The cache hit ratio > discussion on that thread talks about underestimating the hit ratio. > That particular problem could only lead to Memoize plans *not* being > chosen when they maybe should be. Not the other way around, which is > your case. > > create statistics extdataregular_field_index_stats (ndistinct) on > field, index from extdataregular; > analyze extdataregular; > > would likely put that right. Thanks David, using extended statistics for both (and only for both) tables solved this problem. BTW, thank you for all your work on performance in recent releases.
On Wed, 4 May 2022 at 00:21, Benjamin Coutu <ben.coutu@zeyos.com> wrote: > Thanks David, using extended statistics for both (and only for both) tables solved this problem. Oh, whoops. I did get that backwards. The estimate used by the Memoize costing code is from the outer side of the join, which is the extdataempty in this case. I don't think the extdataregular_field_index_stats will do anything. It'll be the ones you added on extdataempty that are making it work. > BTW, thank you for all your work on performance in recent releases. Thanks for the feedback :) David