Thread: Volatile functions under Memoize node

Volatile functions under Memoize node

From

Andrei Lepikhov

Date:

19 September 2024, 19:47:30

Hi,

Excuse me if I made noise in vain. After discovering the limits of the 
Memoize node, I realized that volatile functions are allowed under the 
Memoize. Example:

DROP TABLE IF EXISTS base, other CASCADE;
CREATE TABLE base (x numeric, y text, x1 integer);
INSERT INTO base (x,y,x1) SELECT 1, 'abccccccccccccc'||1,1 FROM 
generate_series(1,1E6) AS x;
CREATE TABLE other (x numeric, y text, x1 integer);
INSERT INTO other (x,y,x1) SELECT x, 'abccccccccccccc'||x,1 FROM 
generate_series(1,1E4) AS x;
VACUUM ANALYZE base, other;


EXPLAIN (COSTS OFF)
SELECT * FROM base WHERE base.x IN (
   SELECT o2.x FROM other o1 LEFT JOIN other o2 ON (o1.x=o2.x) LEFT JOIN 
other o3 ON (o2.x=o3.x+random())
   WHERE base.x = o1.x GROUP BY o2.x ORDER BY o2.x
);

/*
  Nested Loop
    ->  Seq Scan on base
    ->  Memoize
          Cache Key: base.x
          Cache Mode: binary
          ->  Subquery Scan on "ANY_subquery"
                Filter: (base.x = "ANY_subquery".x)
                ->  Group
                      Group Key: o2.x
                      ->  Sort
                            Sort Key: o2.x
                            ->  Nested Loop Left Join
                                  Join Filter: ((o2.x)::double precision 
= ((o3.x)::double precision + random()))
                                  ->  Nested Loop Left Join
                                        ->  Seq Scan on other o1
                                              Filter: (base.x = x)
                                        ->  Seq Scan on other o2
                                              Filter: (x = base.x)
                                  ->  Seq Scan on other o3
*/

In my mind, any volatile function in any expression should reject the 
usage of Memoize, am I wrong? I haven't discovered this case deeply yet, 
but maybe someone has a quick and short answer to this question.

-- 
regards, Andrei Lepikhov

Re: Volatile functions under Memoize node

From

David Rowley

Date:

20 September 2024, 05:36:47

On Fri, 20 Sept 2024 at 04:47, Andrei Lepikhov <lepihov@gmail.com> wrote:
> Excuse me if I made noise in vain. After discovering the limits of the
> Memoize node, I realized that volatile functions are allowed under the
> Memoize. Example:

I don't think we're particularly consistent about the number of
evaluations of volatile functions in general.

Here's something crafted up without Memoize:

create function volatilefunc(p int) returns int as $$
begin
raise notice '%', p;
return p;
end;
$$ language plpgsql volatile;

explain analyze select * from (values(1),(2)) t1(v) where t1.v
in(select t2.v from (values(1),(2)) t2(v) inner join (values(1),(2))
t3(v) on t2.v = volatilefunc(t3.v));

Normally:
NOTICE:  1
NOTICE:  2
NOTICE:  1
NOTICE:  2

But:
set enable_hashagg = 0;
set enable_hashjoin = 0;
set enable_material = 0;
set enable_sort = 0;

gives:
NOTICE:  1
NOTICE:  1
NOTICE:  2
NOTICE:  1
NOTICE:  2

I'm not sure if it's a good idea to penalise your case when we're not
all that consistent to start with.  Is this causing some sort of
breakage?

David

Re: Volatile functions under Memoize node

From

Andrei Lepikhov

Date:

20 September 2024, 06:25:49

On 20/9/2024 04:36, David Rowley wrote:
> On Fri, 20 Sept 2024 at 04:47, Andrei Lepikhov <lepihov@gmail.com> wrote:
>> Excuse me if I made noise in vain. After discovering the limits of the
>> Memoize node, I realized that volatile functions are allowed under the
>> Memoize.
> I'm not sure if it's a good idea to penalise your case when we're not
> all that consistent to start with.  Is this causing some sort of
> breakage?
As I've said before, I just discovered the feature's limits and realised 
that I don't actually understand the idea of volatility checking: even 
in the get_memoize_path, we spend cycles checking the target list and 
base restrictions for such functions. Is it necessary there?

-- 
regards, Andrei Lepikhov

Re: Volatile functions under Memoize node

From

Andrei Lepikhov

Date:

20 September 2024, 11:46:03

On 20/9/2024 04:36, David Rowley wrote:
> On Fri, 20 Sept 2024 at 04:47, Andrei Lepikhov <lepihov@gmail.com> wrote:
> I'm not sure if it's a good idea to penalise your case when we're not
> all that consistent to start with.  Is this causing some sort of
> breakage?
I skimmed the code entries with such checks and found out that the 
initial reason was to avoid index scans, with a reason that such a 
routine should be applied to each tuple of the table.
The second reason - to postpone expression evaluation (9118d03) - is 
also reasonable for me. It was introduced to be consistent with the 
clause's syntactical level in the SQL.
It seems to follow the same idea as disabling subquery pull-ups: to 
avoid multiple evaluations and change the syntactical level.
At the same time, Material doesn't care about volatility. So, what was 
the idea behind the commit 990c365 you added?

-- 
regards, Andrei Lepikhov

Re: Volatile functions under Memoize node

From

David Rowley

Date:

20 September 2024, 13:42:27

On Fri, 20 Sept 2024 at 20:46, Andrei Lepikhov <lepihov@gmail.com> wrote:
> I skimmed the code entries with such checks and found out that the
> initial reason was to avoid index scans, with a reason that such a
> routine should be applied to each tuple of the table.
> The second reason - to postpone expression evaluation (9118d03) - is
> also reasonable for me. It was introduced to be consistent with the
> clause's syntactical level in the SQL.
> It seems to follow the same idea as disabling subquery pull-ups: to
> avoid multiple evaluations and change the syntactical level.
> At the same time, Material doesn't care about volatility. So, what was
> the idea behind the commit 990c365 you added?

I don't recall. Likely to try and keep Memoize more in keeping with
what would happen prior to Memoize existing.

It seems customary around here to disable various optimisations in the
planner when there are volatile functions to try to avoid changing the
number of evaluations of the volatile function. However, I've yet to
see any sort of standard we're meant to be abiding by for it.

You only need to look at things like;

postgres=# select random(), random();
       random       |       random
--------------------+--------------------
 0.6097568694225706 | 0.5371689823343302
(1 row)

postgres=# select random(), random() order by random();
       random        |       random
---------------------+---------------------
 0.16673781514021058 | 0.16673781514021058
(1 row)

and you might be left scratching your head at why all the random()
calls randomly returned the same value. Of course, the above behaviour
is on purpose, but there are certainly reports of people questioning
it [1].

David

[1] https://www.postgresql.org/message-id/CALA8mJrDQhL-kntd%3DypBgwvogL8%3Dkspn5za1Mxv%2BmS%3DdinL5Sg%40mail.gmail.com