Re: A performance issue with Memoize - Mailing list pgsql-hackers
From | Andrei Lepikhov |
---|---|
Subject | Re: A performance issue with Memoize |
Date | |
Msg-id | 1d55df6d-92cb-4bbd-9a4c-0d05bb6bc91d@postgrespro.ru Whole thread Raw |
In response to | A performance issue with Memoize (Richard Guo <guofenglinux@gmail.com>) |
Responses |
Re: A performance issue with Memoize
|
List | pgsql-hackers |
On 20/10/2023 17:40, Richard Guo wrote: > I noticed $subject with the query below. > > set enable_memoize to off; > > explain (analyze, costs off) > select * from tenk1 t1 left join lateral > (select t1.two as t1two, * from tenk1 t2 offset 0) s > on t1.two = s.two; > QUERY PLAN > ------------------------------------------------------------------------------------ > Nested Loop Left Join (actual time=0.050..59578.053 rows=50000000 loops=1) > -> Seq Scan on tenk1 t1 (actual time=0.027..2.703 rows=10000 loops=1) > -> Subquery Scan on s (actual time=0.004..4.819 rows=5000 loops=10000) > Filter: (t1.two = s.two) > Rows Removed by Filter: 5000 > -> Seq Scan on tenk1 t2 (actual time=0.002..3.834 rows=10000 > loops=10000) > Planning Time: 0.666 ms > Execution Time: 60937.899 ms > (8 rows) > > set enable_memoize to on; > > explain (analyze, costs off) > select * from tenk1 t1 left join lateral > (select t1.two as t1two, * from tenk1 t2 offset 0) s > on t1.two = s.two; > QUERY PLAN > ------------------------------------------------------------------------------------------ > Nested Loop Left Join (actual time=0.061..122684.607 rows=50000000 > loops=1) > -> Seq Scan on tenk1 t1 (actual time=0.026..3.367 rows=10000 loops=1) > -> Memoize (actual time=0.011..9.821 rows=5000 loops=10000) > Cache Key: t1.two, t1.two > Cache Mode: binary > Hits: 0 Misses: 10000 Evictions: 9999 Overflows: 0 Memory > Usage: 1368kB > -> Subquery Scan on s (actual time=0.008..5.188 rows=5000 > loops=10000) > Filter: (t1.two = s.two) > Rows Removed by Filter: 5000 > -> Seq Scan on tenk1 t2 (actual time=0.004..4.081 > rows=10000 loops=10000) > Planning Time: 0.607 ms > Execution Time: 124431.388 ms > (12 rows) > > The execution time (best of 3) is 124431.388 VS 60937.899 with and > without memoize. > > The Memoize runtime stats 'Hits: 0 Misses: 10000 Evictions: 9999' > seems suspicious to me, so I've looked into it a little bit, and found > that the MemoizeState's keyparamids and its outerPlan's chgParam are > always different, and that makes us have to purge the entire cache each > time we rescan the memoize node. > > But why are they always different? Well, for the query above, we have > two NestLoopParam nodes, one (with paramno 1) is created when we replace > outer-relation Vars in the scan qual 't1.two = s.two', the other one > (with paramno 0) is added from the subquery's subplan_params, which was > created when we replaced uplevel vars with Param nodes for the subquery. > That is to say, the chgParam would be {0, 1}. > > When it comes to replace outer-relation Vars in the memoize keys, the > two 't1.two' Vars are both replaced with the NestLoopParam with paramno > 1, because it is the first NLP we see in root->curOuterParams that is > equal to the Vars in memoize keys. That is to say, the memoize node's > keyparamids is {1}. > ... > Any thoughts? Do you've thought about the case, fixed with the commit 1db5667? As I see, that bugfix still isn't covered by regression tests. Could your approach of a PARAM_EXEC slot reusing break that case? -- regards, Andrei Lepikhov Postgres Professional
pgsql-hackers by date: