Re: Postgres 16.1 - Bug: cache entry already complete - Mailing list pgsql-bugs
From | David Rowley |
---|---|
Subject | Re: Postgres 16.1 - Bug: cache entry already complete |
Date | |
Msg-id | CAApHDvoBTZzooXsH4hqtQ4cNJxuYZe=hakPcLSPZP9-NkfaSjQ@mail.gmail.com Whole thread Raw |
In response to | Re: Postgres 16.1 - Bug: cache entry already complete (David Rowley <dgrowleyml@gmail.com>) |
Responses |
Re: Postgres 16.1 - Bug: cache entry already complete
|
List | pgsql-bugs |
On Wed, 3 Jan 2024 at 13:40, David Rowley <dgrowleyml@gmail.com> wrote: > The problem is that singlerow is set entirely based on innerunique. > That's wrong because in this case, the unique properties include the > Join Filter, not just the parameterized qual. > > I think the fix should be in get_memoize_path(). What we pass to > create_memoize_path() for the "singlerow" parameter needs to be more > than just extra->inner_unique. The Join Filter must also be empty. It looks like we only figure out what will become the "Join Filter" in create_nestloop_path(), which is slightly too late as we create the MemoizePath before creating the NestLoopPath. In the attached, I drafted up some code that duplicates the logic in create_nestloop_path() to determine if there's going to be a joinrestrictinfo (Join Filter) and to set MemoizePath.singlerow to false if there is going to be one. My concern with that is that the Unique Join optimisation will cause execution to skip to the next outer row and that will leave us no means of marking the Memoize cache entry as complete. In singlerow==false Memoize nodes, we only mark the cache as complete when we read the inner node to completion. The unique join optimisation means we often don't do that due to skipping to the next outer row on finding the first inner match. Basically, what I'm saying is that Memoize is going to result in many more cache misses due to incomplete cache entries. Maybe we should have get_memoize_path() return NULL for this case so that we don't Memoize when there's a Join Filter and extra->inner_unique is set to true. With the attached, I see: QUERY PLAN ------------------------------------------------------------------------------------------ Nested Loop Left Join (actual rows=315 loops=1) Join Filter: (t3.t2_id = t2.id) Rows Removed by Join Filter: 1007 -> Nested Loop Left Join (actual rows=315 loops=1) -> Nested Loop Left Join (actual rows=315 loops=1) -> Seq Scan on t4 (actual rows=315 loops=1) -> Memoize (actual rows=1 loops=315) Cache Key: t4.t2_id Cache Mode: logical SingleRow: true Hits: 296 Misses: 19 Evictions: 0 Overflows: 0 Memory Usage: 3kB -> Index Only Scan using t2_pkey on t2 (actual rows=1 loops=19) Index Cond: (id = t4.t2_id) Heap Fetches: 18 -> Memoize (actual rows=1 loops=315) Cache Key: t4.t1_id Cache Mode: logical SingleRow: true Hits: 276 Misses: 39 Evictions: 0 Overflows: 0 Memory Usage: 5kB -> Index Only Scan using t1_pkey on t1 (actual rows=1 loops=39) Index Cond: (id = t4.t1_id) Heap Fetches: 38 -> Memoize (actual rows=4 loops=315) Cache Key: t1.id, t1.id Cache Mode: logical SingleRow: false Hits: 199 Misses: 116 Evictions: 0 Overflows: 0 Memory Usage: 15kB -> Index Scan using t3_t1_id_index on t3 (actual rows=3 loops=116) Index Cond: (t1_id = t1.id) Planning Time: 0.322 ms Execution Time: 3654.894 ms (31 rows) (I just added SingleRow to explain to make it easier to see what's going on) Notice the Cache Misses is 116 for the problem Memoize node. There are only 77 distinct values for t1_id in t3. So we're certainly classing cache entries as invalid due to them being complete==false. David
Attachment
pgsql-bugs by date: