Thread: Volatile functions under Memoize node
Hi, Excuse me if I made noise in vain. After discovering the limits of the Memoize node, I realized that volatile functions are allowed under the Memoize. Example: DROP TABLE IF EXISTS base, other CASCADE; CREATE TABLE base (x numeric, y text, x1 integer); INSERT INTO base (x,y,x1) SELECT 1, 'abccccccccccccc'||1,1 FROM generate_series(1,1E6) AS x; CREATE TABLE other (x numeric, y text, x1 integer); INSERT INTO other (x,y,x1) SELECT x, 'abccccccccccccc'||x,1 FROM generate_series(1,1E4) AS x; VACUUM ANALYZE base, other; EXPLAIN (COSTS OFF) SELECT * FROM base WHERE base.x IN ( SELECT o2.x FROM other o1 LEFT JOIN other o2 ON (o1.x=o2.x) LEFT JOIN other o3 ON (o2.x=o3.x+random()) WHERE base.x = o1.x GROUP BY o2.x ORDER BY o2.x ); /* Nested Loop -> Seq Scan on base -> Memoize Cache Key: base.x Cache Mode: binary -> Subquery Scan on "ANY_subquery" Filter: (base.x = "ANY_subquery".x) -> Group Group Key: o2.x -> Sort Sort Key: o2.x -> Nested Loop Left Join Join Filter: ((o2.x)::double precision = ((o3.x)::double precision + random())) -> Nested Loop Left Join -> Seq Scan on other o1 Filter: (base.x = x) -> Seq Scan on other o2 Filter: (x = base.x) -> Seq Scan on other o3 */ In my mind, any volatile function in any expression should reject the usage of Memoize, am I wrong? I haven't discovered this case deeply yet, but maybe someone has a quick and short answer to this question. -- regards, Andrei Lepikhov
On Fri, 20 Sept 2024 at 04:47, Andrei Lepikhov <lepihov@gmail.com> wrote: > Excuse me if I made noise in vain. After discovering the limits of the > Memoize node, I realized that volatile functions are allowed under the > Memoize. Example: I don't think we're particularly consistent about the number of evaluations of volatile functions in general. Here's something crafted up without Memoize: create function volatilefunc(p int) returns int as $$ begin raise notice '%', p; return p; end; $$ language plpgsql volatile; explain analyze select * from (values(1),(2)) t1(v) where t1.v in(select t2.v from (values(1),(2)) t2(v) inner join (values(1),(2)) t3(v) on t2.v = volatilefunc(t3.v)); Normally: NOTICE: 1 NOTICE: 2 NOTICE: 1 NOTICE: 2 But: set enable_hashagg = 0; set enable_hashjoin = 0; set enable_material = 0; set enable_sort = 0; gives: NOTICE: 1 NOTICE: 1 NOTICE: 2 NOTICE: 1 NOTICE: 2 I'm not sure if it's a good idea to penalise your case when we're not all that consistent to start with. Is this causing some sort of breakage? David
On 20/9/2024 04:36, David Rowley wrote: > On Fri, 20 Sept 2024 at 04:47, Andrei Lepikhov <lepihov@gmail.com> wrote: >> Excuse me if I made noise in vain. After discovering the limits of the >> Memoize node, I realized that volatile functions are allowed under the >> Memoize. > I'm not sure if it's a good idea to penalise your case when we're not > all that consistent to start with. Is this causing some sort of > breakage? As I've said before, I just discovered the feature's limits and realised that I don't actually understand the idea of volatility checking: even in the get_memoize_path, we spend cycles checking the target list and base restrictions for such functions. Is it necessary there? -- regards, Andrei Lepikhov
On 20/9/2024 04:36, David Rowley wrote: > On Fri, 20 Sept 2024 at 04:47, Andrei Lepikhov <lepihov@gmail.com> wrote: > I'm not sure if it's a good idea to penalise your case when we're not > all that consistent to start with. Is this causing some sort of > breakage? I skimmed the code entries with such checks and found out that the initial reason was to avoid index scans, with a reason that such a routine should be applied to each tuple of the table. The second reason - to postpone expression evaluation (9118d03) - is also reasonable for me. It was introduced to be consistent with the clause's syntactical level in the SQL. It seems to follow the same idea as disabling subquery pull-ups: to avoid multiple evaluations and change the syntactical level. At the same time, Material doesn't care about volatility. So, what was the idea behind the commit 990c365 you added? -- regards, Andrei Lepikhov
On Fri, 20 Sept 2024 at 20:46, Andrei Lepikhov <lepihov@gmail.com> wrote: > I skimmed the code entries with such checks and found out that the > initial reason was to avoid index scans, with a reason that such a > routine should be applied to each tuple of the table. > The second reason - to postpone expression evaluation (9118d03) - is > also reasonable for me. It was introduced to be consistent with the > clause's syntactical level in the SQL. > It seems to follow the same idea as disabling subquery pull-ups: to > avoid multiple evaluations and change the syntactical level. > At the same time, Material doesn't care about volatility. So, what was > the idea behind the commit 990c365 you added? I don't recall. Likely to try and keep Memoize more in keeping with what would happen prior to Memoize existing. It seems customary around here to disable various optimisations in the planner when there are volatile functions to try to avoid changing the number of evaluations of the volatile function. However, I've yet to see any sort of standard we're meant to be abiding by for it. You only need to look at things like; postgres=# select random(), random(); random | random --------------------+-------------------- 0.6097568694225706 | 0.5371689823343302 (1 row) postgres=# select random(), random() order by random(); random | random ---------------------+--------------------- 0.16673781514021058 | 0.16673781514021058 (1 row) and you might be left scratching your head at why all the random() calls randomly returned the same value. Of course, the above behaviour is on purpose, but there are certainly reports of people questioning it [1]. David [1] https://www.postgresql.org/message-id/CALA8mJrDQhL-kntd%3DypBgwvogL8%3Dkspn5za1Mxv%2BmS%3DdinL5Sg%40mail.gmail.com