Thread: plan cache overhead on plpgsql expression

plan cache overhead on plpgsql expression

From

Pavel Stehule

Date:

16 February 2020, 14:12:25

when I do some profiling of plpgsql, usually I surprised how significant overhead has expression execution. Any calculations are very slow.

This is not typical example of plpgsql, but it shows cleanly where is a overhead

CREATE OR REPLACE FUNCTION public.foo()
RETURNS void
LANGUAGE plpgsql
IMMUTABLE
AS $function$
declare i bigint = 0;
begin
while i < 100000000
loop
i := i + 1;
end loop;
end;
$function$

Profile of development version

10,04% plpgsql.so [.] exec_eval_simple_expr
9,17% postgres [.] AcquireExecutorLocks
7,01% postgres [.] ExecInterpExpr
5,86% postgres [.] OverrideSearchPathMatchesCurrent
4,71% postgres [.] GetCachedPlan
4,14% postgres [.] AcquirePlannerLocks
3,72% postgres [.] RevalidateCachedQuery
3,56% postgres [.] MemoryContextReset
3,43% plpgsql.so [.] plpgsql_param_eval_var
3,33% postgres [.] SPI_plan_get_cached_plan
3,28% plpgsql.so [.] exec_stmt
3,18% postgres [.] ReleaseCachedPlan
2,92% postgres [.] ResourceArrayRemove
2,81% plpgsql.so [.] exec_assign_value
2,74% plpgsql.so [.] exec_cast_value
2,70% plpgsql.so [.] exec_eval_expr
1,96% postgres [.] recomputeNamespacePath
1,90% plpgsql.so [.] exec_eval_boolean
1,82% plpgsql.so [.] exec_eval_cleanup
1,72% postgres [.] ScanQueryForLocks
1,68% postgres [.] CheckCachedPlan
1,49% postgres [.] ResourceArrayAdd
1,48% plpgsql.so [.] exec_assign_expr
1,42% postgres [.] ResourceOwnerForgetPlanCacheRef
1,24% plpgsql.so [.] exec_stmts
1,23% plpgsql.so [.] exec_stmt_while
1,03% plpgsql.so [.] assign_simple_var
0,73% postgres [.] int84lt
0,62% postgres [.] ResourceOwnerEnlargePlanCacheRefs
0,54% postgres [.] int84pl
0,49% plpgsql.so [.] setup_param_list
0,45% postgres [.] ResourceArrayEnlarge
0,44% postgres [.] choose_custom_plan
0,39% postgres [.] ResourceOwnerRememberPlanCacheRef
0,30% plpgsql.so [.] exec_stmt_assign
0,26% postgres [.] GetUserId
0,22% plpgsql.so [.] SPI_plan_get_cached_plan@plt

and profile of PostgreSQL 8.2

13,63% plpgsql.so [.] exec_eval_simple_expr
9,72% postgres [.] AllocSetAlloc
7,84% postgres [.] ExecMakeFunctionResultNoSets
6,20% plpgsql.so [.] exec_assign_value
5,46% postgres [.] AllocSetReset
4,79% postgres [.] ExecEvalParam
4,53% plpgsql.so [.] exec_eval_datum
4,40% postgres [.] MemoryContextAlloc
3,51% plpgsql.so [.] exec_stmt
3,01% plpgsql.so [.] exec_eval_expr
2,76% postgres [.] int84pl
2,11% plpgsql.so [.] exec_eval_cleanup
1,77% postgres [.] datumCopy
1,76% postgres [.] MemoryContextReset
1,75% libc-2.30.so [.] __sigsetjmp
1,64% postgres [.] int84lt
1,47% postgres [.] pfree
1,43% plpgsql.so [.] exec_simple_cast_value
1,36% plpgsql.so [.] MemoryContextReset@plt
1,28% plpgsql.so [.] exec_stmt_while
1,25% plpgsql.so [.] exec_assign_expr
1,22% postgres [.] check_stack_depth
1,09% plpgsql.so [.] exec_eval_boolean
1,06% postgres [.] AllocSetFree
0,99% plpgsql.so [.] free_var
0,93% plpgsql.so [.] exec_cast_value
0,93% plpgsql.so [.] exec_stmts
0,78% libc-2.30.so [.] __memmove_sse2_unaligned_erms
0,72% postgres [.] datumGetSize
0,62% postgres [.] Int64GetDatum
0,51% libc-2.30.so [.] __sigjmp_save
0,49% postgres [.] ExecEvalConst
0,41% plpgsql.so [.] exec_stmt_assign
0,28% postgres [.] SPI_pop
0,26% plpgsql.so [.] MemoryContextAlloc@plt
0,25% postgres [.] SPI_push
0,25% plpgsql.so [.] SPI_push@plt
0,24% plpgsql.so [.] __sigsetjmp@plt
0,23% plpgsql.so [.] SPI_pop@plt
0,19% libc-2.30.so [.] __memset_sse2_unaligned_erms
0,14% libc-2.30.so [.] memcpy@GLIBC_2.2.5
0,13% postgres [.] memcpy@plt

Is interesting so overhead of plan cache about 15%

The execution needs 32 sec on Postgres13 and 27sec on Postgres8.2

Regards

Pavel

Re: plan cache overhead on plpgsql expression

From

Pavel Stehule

Date:

16 February 2020, 16:00:35

ne 16. 2. 2020 v 15:12 odesílatel Pavel Stehule <pavel.stehule@gmail.com> napsal:

Hi

when I do some profiling of plpgsql, usually I surprised how significant overhead has expression execution. Any calculations are very slow.

This is not typical example of plpgsql, but it shows cleanly where is a overhead

CREATE OR REPLACE FUNCTION public.foo()
RETURNS void
LANGUAGE plpgsql
IMMUTABLE
AS $function$
declare i bigint = 0;
begin
while i < 100000000
loop
i := i + 1;
end loop;
end;
$function$

Is interesting so overhead of plan cache about 15%

The execution needs 32 sec on Postgres13 and 27sec on Postgres8.2

On same computer same example in Perl needs only 7 sec.

Regards

Pavel

Regards

Pavel

Re: plan cache overhead on plpgsql expression

From

Amit Langote

Date:

18 February 2020, 05:03:13

Hi,

On Sun, Feb 16, 2020 at 11:13 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:
> when I do some profiling of plpgsql, usually I surprised how significant overhead has expression execution. Any
calculationsare very slow.
 
>
> This is not typical example of plpgsql, but it shows cleanly where is a overhead
>
> CREATE OR REPLACE FUNCTION public.foo()
>  RETURNS void
>  LANGUAGE plpgsql
>  IMMUTABLE
> AS $function$
> declare i bigint = 0;
> begin
>   while i < 100000000
>   loop
>     i := i + 1;
>   end loop;
> end;
> $function$
>
> Profile of development  version
>
>   10,04%  plpgsql.so                          [.] exec_eval_simple_expr
>    9,17%  postgres                            [.] AcquireExecutorLocks
>    7,01%  postgres                            [.] ExecInterpExpr
>    5,86%  postgres                            [.] OverrideSearchPathMatchesCurrent
>    4,71%  postgres                            [.] GetCachedPlan
>    4,14%  postgres                            [.] AcquirePlannerLocks
>    3,72%  postgres                            [.] RevalidateCachedQuery
>    3,56%  postgres                            [.] MemoryContextReset
>    3,43%  plpgsql.so                          [.] plpgsql_param_eval_var

I was thinking about this overhead many months back and had even
written a patch to avoid going to the planner for "simple"
expressions, which can be handled by the executor.  Here is what the
performance looks like:

HEAD:

latency: 31979.393 ms

    18.32%  postgres  postgres           [.] ExecInterpExpr
    11.37%  postgres  plpgsql.so         [.] exec_eval_expr
     8.58%  postgres  plpgsql.so         [.] plpgsql_param_eval_var
     8.31%  postgres  plpgsql.so         [.] exec_stmt
     6.44%  postgres  postgres           [.] GetCachedPlan
     5.47%  postgres  postgres           [.] AcquireExecutorLocks
     5.30%  postgres  postgres           [.] RevalidateCachedQuery
     4.79%  postgres  plpgsql.so         [.] exec_assign_value
     4.41%  postgres  postgres           [.] SPI_plan_get_cached_plan
     4.36%  postgres  postgres           [.] MemoryContextReset
     4.22%  postgres  postgres           [.] ReleaseCachedPlan
     4.03%  postgres  postgres           [.] OverrideSearchPathMatchesCurrent
     2.63%  postgres  plpgsql.so         [.] exec_assign_expr
     2.11%  postgres  postgres           [.] int84lt
     1.95%  postgres  postgres           [.] ResourceOwnerForgetPlanCacheRef
     1.71%  postgres  postgres           [.] int84pl
     1.57%  postgres  postgres           [.] ResourceOwnerRememberPlanCacheRef
     1.38%  postgres  postgres           [.] recomputeNamespacePath
     1.35%  postgres  postgres           [.] ScanQueryForLocks
     1.24%  postgres  plpgsql.so         [.] exec_cast_value
     0.38%  postgres  postgres           [.] ResourceOwnerEnlargePlanCacheRefs
     0.05%  postgres  [kernel.kallsyms]  [k] __do_softirq
     0.03%  postgres  postgres           [.] GetUserId

Patched:

latency: 21011.871 ms

    28.26%  postgres  postgres           [.] ExecInterpExpr
    12.26%  postgres  plpgsql.so         [.] plpgsql_param_eval_var
    12.02%  postgres  plpgsql.so         [.] exec_stmt
    11.10%  postgres  plpgsql.so         [.] exec_eval_expr
    10.05%  postgres  postgres           [.] SPI_plan_is_valid
     7.09%  postgres  postgres           [.] MemoryContextReset
     6.65%  postgres  plpgsql.so         [.] exec_assign_value
     3.53%  postgres  plpgsql.so         [.] exec_assign_expr
     2.91%  postgres  postgres           [.] int84lt
     2.61%  postgres  postgres           [.] int84pl
     2.42%  postgres  plpgsql.so         [.] exec_cast_value
     0.86%  postgres  postgres           [.] CachedPlanIsValid
     0.16%  postgres  plpgsql.so         [.] SPI_plan_is_valid@plt
     0.05%  postgres  [kernel.kallsyms]  [k] __do_softirq
     0.03%  postgres  [kernel.kallsyms]  [k] finish_task_switch

I didn't send the patch, because it didn't handle the cases where a
simple expression consists of an inline-able function(s) in it, which
are better handled by a full-fledged planner call backed up by the
plan cache.  If we don't do that then every evaluation of such
"simple" expression needs to invoke the planner.  For example:

Consider this inline-able SQL function:

create or replace function sql_incr(a bigint)
returns int
immutable language sql as $$
select a+1;
$$;

Then this revised body of your function foo():

CREATE OR REPLACE FUNCTION public.foo()
 RETURNS int
 LANGUAGE plpgsql
 IMMUTABLE
AS $function$
declare i bigint = 0;
begin
  while i < 1000000
  loop
    i := sql_incr(i);
  end loop; return i;
end;
$function$
;

With HEAD `select foo()` finishes in 786 ms, whereas with the patch,
it takes 5102 ms.

I think the patch might be good idea to reduce the time to compute
simple expressions in plpgsql, if we can address the above issue.

Thanks,
Amit

Attachment

plpgsql-simple-exprs.patch

Re: plan cache overhead on plpgsql expression

From

Pavel Stehule

Date:

18 February 2020, 05:55:31

út 18. 2. 2020 v 6:03 odesílatel Amit Langote <amitlangote09@gmail.com> napsal:

Hi,

On Sun, Feb 16, 2020 at 11:13 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:
> when I do some profiling of plpgsql, usually I surprised how significant overhead has expression execution. Any calculations are very slow.
>
> This is not typical example of plpgsql, but it shows cleanly where is a overhead
>
> CREATE OR REPLACE FUNCTION public.foo()
> RETURNS void
> LANGUAGE plpgsql
> IMMUTABLE
> AS $function$
> declare i bigint = 0;
> begin
> while i < 100000000
> loop
> i := i + 1;
> end loop;
> end;
> $function$
>
> Profile of development version
>
> 10,04% plpgsql.so [.] exec_eval_simple_expr
> 9,17% postgres [.] AcquireExecutorLocks
> 7,01% postgres [.] ExecInterpExpr
> 5,86% postgres [.] OverrideSearchPathMatchesCurrent
> 4,71% postgres [.] GetCachedPlan
> 4,14% postgres [.] AcquirePlannerLocks
> 3,72% postgres [.] RevalidateCachedQuery
> 3,56% postgres [.] MemoryContextReset
> 3,43% plpgsql.so [.] plpgsql_param_eval_var

I was thinking about this overhead many months back and had even
written a patch to avoid going to the planner for "simple"
expressions, which can be handled by the executor. Here is what the
performance looks like:

HEAD:

latency: 31979.393 ms

18.32% postgres postgres [.] ExecInterpExpr
11.37% postgres plpgsql.so [.] exec_eval_expr
8.58% postgres plpgsql.so [.] plpgsql_param_eval_var
8.31% postgres plpgsql.so [.] exec_stmt
6.44% postgres postgres [.] GetCachedPlan
5.47% postgres postgres [.] AcquireExecutorLocks
5.30% postgres postgres [.] RevalidateCachedQuery
4.79% postgres plpgsql.so [.] exec_assign_value
4.41% postgres postgres [.] SPI_plan_get_cached_plan
4.36% postgres postgres [.] MemoryContextReset
4.22% postgres postgres [.] ReleaseCachedPlan
4.03% postgres postgres [.] OverrideSearchPathMatchesCurrent
2.63% postgres plpgsql.so [.] exec_assign_expr
2.11% postgres postgres [.] int84lt
1.95% postgres postgres [.] ResourceOwnerForgetPlanCacheRef
1.71% postgres postgres [.] int84pl
1.57% postgres postgres [.] ResourceOwnerRememberPlanCacheRef
1.38% postgres postgres [.] recomputeNamespacePath
1.35% postgres postgres [.] ScanQueryForLocks
1.24% postgres plpgsql.so [.] exec_cast_value
0.38% postgres postgres [.] ResourceOwnerEnlargePlanCacheRefs
0.05% postgres [kernel.kallsyms] [k] __do_softirq
0.03% postgres postgres [.] GetUserId

Patched:

latency: 21011.871 ms

28.26% postgres postgres [.] ExecInterpExpr
12.26% postgres plpgsql.so [.] plpgsql_param_eval_var
12.02% postgres plpgsql.so [.] exec_stmt
11.10% postgres plpgsql.so [.] exec_eval_expr
10.05% postgres postgres [.] SPI_plan_is_valid
7.09% postgres postgres [.] MemoryContextReset
6.65% postgres plpgsql.so [.] exec_assign_value
3.53% postgres plpgsql.so [.] exec_assign_expr
2.91% postgres postgres [.] int84lt
2.61% postgres postgres [.] int84pl
2.42% postgres plpgsql.so [.] exec_cast_value
0.86% postgres postgres [.] CachedPlanIsValid
0.16% postgres plpgsql.so [.] SPI_plan_is_valid@plt
0.05% postgres [kernel.kallsyms] [k] __do_softirq
0.03% postgres [kernel.kallsyms] [k] finish_task_switch

I didn't send the patch, because it didn't handle the cases where a
simple expression consists of an inline-able function(s) in it, which
are better handled by a full-fledged planner call backed up by the
plan cache. If we don't do that then every evaluation of such
"simple" expression needs to invoke the planner. For example:

Consider this inline-able SQL function:

create or replace function sql_incr(a bigint)
returns int
immutable language sql as $$
select a+1;
$$;

Then this revised body of your function foo():

CREATE OR REPLACE FUNCTION public.foo()
RETURNS int
LANGUAGE plpgsql
IMMUTABLE
AS $function$
declare i bigint = 0;
begin
while i < 1000000
loop
i := sql_incr(i);
end loop; return i;
end;
$function$
;

With HEAD `select foo()` finishes in 786 ms, whereas with the patch,
it takes 5102 ms.

I think the patch might be good idea to reduce the time to compute
simple expressions in plpgsql, if we can address the above issue.

Your patch is very interesting - minimally it returns performance before 8.2. The mentioned issue can be fixed if we disallow SQL functions in this fast execution.

I am worried about too low percent if this fundament methods.

2.91% postgres postgres [.] int84lt
2.61% postgres postgres [.] int84pl

Perl

18,20% libperl.so.5.30.1 [.] Perl_pp_add
17,61% libperl.so.5.30.1 [.] Perl_pp_lt

So can be nice if we increase percent overhead over 10%, maybe more.

Maybe we can check if expression has only builtin immutable functions, and if it, then we can reuse expression state

More, if I understand well, the function is running under snapshot, so there is not possibility to plan invalidation inside function. So some checks should not be repeated.

Pavel

Thanks,
Amit

Re: plan cache overhead on plpgsql expression

From

Amit Langote

Date:

18 February 2020, 09:56:23

On Tue, Feb 18, 2020 at 2:56 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:
> út 18. 2. 2020 v 6:03 odesílatel Amit Langote <amitlangote09@gmail.com> napsal:
>> I didn't send the patch, because it didn't handle the cases where a
>> simple expression consists of an inline-able function(s) in it, which
>> are better handled by a full-fledged planner call backed up by the
>> plan cache.  If we don't do that then every evaluation of such
>> "simple" expression needs to invoke the planner.  For example:
>>
>> Consider this inline-able SQL function:
>>
>> create or replace function sql_incr(a bigint)
>> returns int
>> immutable language sql as $$
>> select a+1;
>> $$;
>>
>> Then this revised body of your function foo():
>>
>> CREATE OR REPLACE FUNCTION public.foo()
>>  RETURNS int
>>  LANGUAGE plpgsql
>>  IMMUTABLE
>> AS $function$
>> declare i bigint = 0;
>> begin
>>   while i < 1000000
>>   loop
>>     i := sql_incr(i);
>>   end loop; return i;
>> end;
>> $function$
>> ;
>>
>> With HEAD `select foo()` finishes in 786 ms, whereas with the patch,
>> it takes 5102 ms.
>>
>> I think the patch might be good idea to reduce the time to compute
>> simple expressions in plpgsql, if we can address the above issue.
>
>
> Your patch is very interesting - minimally it returns performance before 8.2. The mentioned issue can be fixed if we
disallowSQL functions in this fast execution. 

I updated the patch to do that.

With the new patch, `select foo()`, with inline-able sql_incr() in it,
runs in 679 ms.

Without any inline-able function, it runs in 330 ms, whereas with
HEAD, it takes 590 ms.

Thanks,
Amit

Attachment

plpgsql-simple-exprs_v2.patch

Re: plan cache overhead on plpgsql expression

From

Amit Langote

Date:

18 February 2020, 16:08:45

On Tue, Feb 18, 2020 at 6:56 PM Amit Langote <amitlangote09@gmail.com> wrote:
> On Tue, Feb 18, 2020 at 2:56 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:
> > út 18. 2. 2020 v 6:03 odesílatel Amit Langote <amitlangote09@gmail.com> napsal:
> >> I didn't send the patch, because it didn't handle the cases where a
> >> simple expression consists of an inline-able function(s) in it, which
> >> are better handled by a full-fledged planner call backed up by the
> >> plan cache.  If we don't do that then every evaluation of such
> >> "simple" expression needs to invoke the planner.  For example:
> >>
> >> Consider this inline-able SQL function:
> >>
> >> create or replace function sql_incr(a bigint)
> >> returns int
> >> immutable language sql as $$
> >> select a+1;
> >> $$;
> >>
> >> Then this revised body of your function foo():
> >>
> >> CREATE OR REPLACE FUNCTION public.foo()
> >>  RETURNS int
> >>  LANGUAGE plpgsql
> >>  IMMUTABLE
> >> AS $function$
> >> declare i bigint = 0;
> >> begin
> >>   while i < 1000000
> >>   loop
> >>     i := sql_incr(i);
> >>   end loop; return i;
> >> end;
> >> $function$
> >> ;
> >>
> >> With HEAD `select foo()` finishes in 786 ms, whereas with the patch,
> >> it takes 5102 ms.
> >>
> >> I think the patch might be good idea to reduce the time to compute
> >> simple expressions in plpgsql, if we can address the above issue.
> >
> >
> > Your patch is very interesting - minimally it returns performance before 8.2. The mentioned issue can be fixed if
wedisallow SQL functions in this fast execution. 
>
> I updated the patch to do that.
>
> With the new patch, `select foo()`, with inline-able sql_incr() in it,
> runs in 679 ms.
>
> Without any inline-able function, it runs in 330 ms, whereas with
> HEAD, it takes 590 ms.

I polished it a bit.

Thanks,
Amit

Attachment

plpgsql-simple-exprs_v3.patch

Re: plan cache overhead on plpgsql expression

From

Pavel Stehule

Date:

19 February 2020, 06:30:13

út 18. 2. 2020 v 17:08 odesílatel Amit Langote <amitlangote09@gmail.com> napsal:

On Tue, Feb 18, 2020 at 6:56 PM Amit Langote <amitlangote09@gmail.com> wrote:
> On Tue, Feb 18, 2020 at 2:56 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:
> > út 18. 2. 2020 v 6:03 odesílatel Amit Langote <amitlangote09@gmail.com> napsal:
> >> I didn't send the patch, because it didn't handle the cases where a
> >> simple expression consists of an inline-able function(s) in it, which
> >> are better handled by a full-fledged planner call backed up by the
> >> plan cache. If we don't do that then every evaluation of such
> >> "simple" expression needs to invoke the planner. For example:
> >>
> >> Consider this inline-able SQL function:
> >>
> >> create or replace function sql_incr(a bigint)
> >> returns int
> >> immutable language sql as $$
> >> select a+1;
> >> $$;
> >>
> >> Then this revised body of your function foo():
> >>
> >> CREATE OR REPLACE FUNCTION public.foo()
> >> RETURNS int
> >> LANGUAGE plpgsql
> >> IMMUTABLE
> >> AS $function$
> >> declare i bigint = 0;
> >> begin
> >> while i < 1000000
> >> loop
> >> i := sql_incr(i);
> >> end loop; return i;
> >> end;
> >> $function$
> >> ;
> >>
> >> With HEAD `select foo()` finishes in 786 ms, whereas with the patch,
> >> it takes 5102 ms.
> >>
> >> I think the patch might be good idea to reduce the time to compute
> >> simple expressions in plpgsql, if we can address the above issue.
> >
> >
> > Your patch is very interesting - minimally it returns performance before 8.2. The mentioned issue can be fixed if we disallow SQL functions in this fast execution.
>
> I updated the patch to do that.
>
> With the new patch, `select foo()`, with inline-able sql_incr() in it,
> runs in 679 ms.
>
> Without any inline-able function, it runs in 330 ms, whereas with
> HEAD, it takes 590 ms.

I polished it a bit.

the performance looks very interesting - on my comp the execution time of 100000000 iterations was decreased from 34 sec to 15 sec,

So it is interesting speedup

Pavel

Thanks,
Amit

Re: plan cache overhead on plpgsql expression

From

Pavel Stehule

Date:

19 February 2020, 06:37:26

st 19. 2. 2020 v 7:30 odesílatel Pavel Stehule <pavel.stehule@gmail.com> napsal:

út 18. 2. 2020 v 17:08 odesílatel Amit Langote <amitlangote09@gmail.com> napsal:
On Tue, Feb 18, 2020 at 6:56 PM Amit Langote <amitlangote09@gmail.com> wrote:
> On Tue, Feb 18, 2020 at 2:56 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:
> > út 18. 2. 2020 v 6:03 odesílatel Amit Langote <amitlangote09@gmail.com> napsal:
> >> I didn't send the patch, because it didn't handle the cases where a
> >> simple expression consists of an inline-able function(s) in it, which
> >> are better handled by a full-fledged planner call backed up by the
> >> plan cache. If we don't do that then every evaluation of such
> >> "simple" expression needs to invoke the planner. For example:
> >>
> >> Consider this inline-able SQL function:
> >>
> >> create or replace function sql_incr(a bigint)
> >> returns int
> >> immutable language sql as $$
> >> select a+1;
> >> $$;
> >>
> >> Then this revised body of your function foo():
> >>
> >> CREATE OR REPLACE FUNCTION public.foo()
> >> RETURNS int
> >> LANGUAGE plpgsql
> >> IMMUTABLE
> >> AS $function$
> >> declare i bigint = 0;
> >> begin
> >> while i < 1000000
> >> loop
> >> i := sql_incr(i);
> >> end loop; return i;
> >> end;
> >> $function$
> >> ;
> >>
> >> With HEAD `select foo()` finishes in 786 ms, whereas with the patch,
> >> it takes 5102 ms.
> >>
> >> I think the patch might be good idea to reduce the time to compute
> >> simple expressions in plpgsql, if we can address the above issue.
> >
> >
> > Your patch is very interesting - minimally it returns performance before 8.2. The mentioned issue can be fixed if we disallow SQL functions in this fast execution.
>
> I updated the patch to do that.
>
> With the new patch, `select foo()`, with inline-able sql_incr() in it,
> runs in 679 ms.
>
> Without any inline-able function, it runs in 330 ms, whereas with
> HEAD, it takes 590 ms.

I polished it a bit.

the performance looks very interesting - on my comp the execution time of 100000000 iterations was decreased from 34 sec to 15 sec,

So it is interesting speedup

but regress tests fails

Pavel

Thanks,
Amit

Attachment

Re: plan cache overhead on plpgsql expression

From

Amit Langote

Date:

19 February 2020, 06:56:45

On Wed, Feb 19, 2020 at 3:38 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:
> st 19. 2. 2020 v 7:30 odesílatel Pavel Stehule <pavel.stehule@gmail.com> napsal:
>> út 18. 2. 2020 v 17:08 odesílatel Amit Langote <amitlangote09@gmail.com> napsal:
>>> > I updated the patch to do that.
>>> >
>>> > With the new patch, `select foo()`, with inline-able sql_incr() in it,
>>> > runs in 679 ms.
>>> >
>>> > Without any inline-able function, it runs in 330 ms, whereas with
>>> > HEAD, it takes 590 ms.
>>>
>>> I polished it a bit.
>>
>>
>> the performance looks very interesting - on my comp the execution time of  100000000 iterations was decreased from
34sec to 15 sec, 
>>
>> So it is interesting speedup
>
> but regress tests fails

Oops, I failed to check src/pl/plpgsql tests.

Fixed in the attached.

Thanks,
Amit

Attachment

plpgsql-simple-exprs_v4.patch

Re: plan cache overhead on plpgsql expression

From

Amit Langote

Date:

19 February 2020, 07:08:59

On Wed, Feb 19, 2020 at 3:56 PM Amit Langote <amitlangote09@gmail.com> wrote:
> On Wed, Feb 19, 2020 at 3:38 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:
> > st 19. 2. 2020 v 7:30 odesílatel Pavel Stehule <pavel.stehule@gmail.com> napsal:
> >> út 18. 2. 2020 v 17:08 odesílatel Amit Langote <amitlangote09@gmail.com> napsal:
> >>> > I updated the patch to do that.
> >>> >
> >>> > With the new patch, `select foo()`, with inline-able sql_incr() in it,
> >>> > runs in 679 ms.
> >>> >
> >>> > Without any inline-able function, it runs in 330 ms, whereas with
> >>> > HEAD, it takes 590 ms.
> >>>
> >>> I polished it a bit.
> >>
> >>
> >> the performance looks very interesting - on my comp the execution time of  100000000 iterations was decreased from
34sec to 15 sec, 
> >>
> >> So it is interesting speedup
> >
> > but regress tests fails
>
> Oops, I failed to check src/pl/plpgsql tests.
>
> Fixed in the attached.

Added a regression test based on examples discussed here too.

Thanks,
Amit

Attachment

plpgsql-simple-exprs_v5.patch

Re: plan cache overhead on plpgsql expression

From

Pavel Stehule

Date:

20 February 2020, 19:15:48

st 19. 2. 2020 v 8:09 odesílatel Amit Langote <amitlangote09@gmail.com> napsal:

On Wed, Feb 19, 2020 at 3:56 PM Amit Langote <amitlangote09@gmail.com> wrote:
> On Wed, Feb 19, 2020 at 3:38 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:
> > st 19. 2. 2020 v 7:30 odesílatel Pavel Stehule <pavel.stehule@gmail.com> napsal:
> >> út 18. 2. 2020 v 17:08 odesílatel Amit Langote <amitlangote09@gmail.com> napsal:
> >>> > I updated the patch to do that.
> >>> >
> >>> > With the new patch, `select foo()`, with inline-able sql_incr() in it,
> >>> > runs in 679 ms.
> >>> >
> >>> > Without any inline-able function, it runs in 330 ms, whereas with
> >>> > HEAD, it takes 590 ms.
> >>>
> >>> I polished it a bit.
> >>
> >>
> >> the performance looks very interesting - on my comp the execution time of 100000000 iterations was decreased from 34 sec to 15 sec,
> >>
> >> So it is interesting speedup
> >
> > but regress tests fails
>
> Oops, I failed to check src/pl/plpgsql tests.
>
> Fixed in the attached.

Added a regression test based on examples discussed here too.

It is working without problems

I think this patch is very interesting for Postgres 13

Regards

Pavel

Thanks,
Amit

Re: plan cache overhead on plpgsql expression

From

Pavel Stehule

Date:

24 February 2020, 17:47:17

čt 20. 2. 2020 v 20:15 odesílatel Pavel Stehule <pavel.stehule@gmail.com> napsal:

st 19. 2. 2020 v 8:09 odesílatel Amit Langote <amitlangote09@gmail.com> napsal:
On Wed, Feb 19, 2020 at 3:56 PM Amit Langote <amitlangote09@gmail.com> wrote:
> On Wed, Feb 19, 2020 at 3:38 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:
> > st 19. 2. 2020 v 7:30 odesílatel Pavel Stehule <pavel.stehule@gmail.com> napsal:
> >> út 18. 2. 2020 v 17:08 odesílatel Amit Langote <amitlangote09@gmail.com> napsal:
> >>> > I updated the patch to do that.
> >>> >
> >>> > With the new patch, `select foo()`, with inline-able sql_incr() in it,
> >>> > runs in 679 ms.
> >>> >
> >>> > Without any inline-able function, it runs in 330 ms, whereas with
> >>> > HEAD, it takes 590 ms.
> >>>
> >>> I polished it a bit.
> >>
> >>
> >> the performance looks very interesting - on my comp the execution time of 100000000 iterations was decreased from 34 sec to 15 sec,
> >>
> >> So it is interesting speedup
> >
> > but regress tests fails
>
> Oops, I failed to check src/pl/plpgsql tests.
>
> Fixed in the attached.

Added a regression test based on examples discussed here too.

It is working without problems

I think this patch is very interesting for Postgres 13

I checked a performance of this patch again and I think so there is not too much space for another optimization - maybe JIT can help.

There is relative high overhead of call of strict functions - the params are repeatedly tested against NULL.

Regards

Pavel

Regards

Pavel

Thanks,
Amit

Re: plan cache overhead on plpgsql expression

From

Pavel Stehule

Date:

24 February 2020, 17:56:55

po 24. 2. 2020 v 18:47 odesílatel Pavel Stehule <pavel.stehule@gmail.com> napsal:

čt 20. 2. 2020 v 20:15 odesílatel Pavel Stehule <pavel.stehule@gmail.com> napsal:

st 19. 2. 2020 v 8:09 odesílatel Amit Langote <amitlangote09@gmail.com> napsal:
On Wed, Feb 19, 2020 at 3:56 PM Amit Langote <amitlangote09@gmail.com> wrote:
> On Wed, Feb 19, 2020 at 3:38 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:
> > st 19. 2. 2020 v 7:30 odesílatel Pavel Stehule <pavel.stehule@gmail.com> napsal:
> >> út 18. 2. 2020 v 17:08 odesílatel Amit Langote <amitlangote09@gmail.com> napsal:
> >>> > I updated the patch to do that.
> >>> >
> >>> > With the new patch, `select foo()`, with inline-able sql_incr() in it,
> >>> > runs in 679 ms.
> >>> >
> >>> > Without any inline-able function, it runs in 330 ms, whereas with
> >>> > HEAD, it takes 590 ms.
> >>>
> >>> I polished it a bit.
> >>
> >>
> >> the performance looks very interesting - on my comp the execution time of 100000000 iterations was decreased from 34 sec to 15 sec,
> >>
> >> So it is interesting speedup
> >
> > but regress tests fails
>
> Oops, I failed to check src/pl/plpgsql tests.
>
> Fixed in the attached.

Added a regression test based on examples discussed here too.

It is working without problems

I think this patch is very interesting for Postgres 13

I checked a performance of this patch again and I think so there is not too much space for another optimization - maybe JIT can help.

There is relative high overhead of call of strict functions - the params are repeatedly tested against NULL.

But I found one issue - I don't know if this issue is related to your patch or plpgsql_check.

plpgsql_check try to clean after it was executed - it cleans all plans. But some pointers on simple expressions are broken after catched exceptions.

expr->plan = 0x80. Is interesting, so other fields of this expressions are correct.

Regards

Pavel

Regards

Pavel

Thanks,
Amit

Re: plan cache overhead on plpgsql expression

From

Pavel Stehule

Date:

24 February 2020, 19:27:51

po 24. 2. 2020 v 18:56 odesílatel Pavel Stehule <pavel.stehule@gmail.com> napsal:

po 24. 2. 2020 v 18:47 odesílatel Pavel Stehule <pavel.stehule@gmail.com> napsal:

čt 20. 2. 2020 v 20:15 odesílatel Pavel Stehule <pavel.stehule@gmail.com> napsal:

st 19. 2. 2020 v 8:09 odesílatel Amit Langote <amitlangote09@gmail.com> napsal:
On Wed, Feb 19, 2020 at 3:56 PM Amit Langote <amitlangote09@gmail.com> wrote:
> On Wed, Feb 19, 2020 at 3:38 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:
> > st 19. 2. 2020 v 7:30 odesílatel Pavel Stehule <pavel.stehule@gmail.com> napsal:
> >> út 18. 2. 2020 v 17:08 odesílatel Amit Langote <amitlangote09@gmail.com> napsal:
> >>> > I updated the patch to do that.
> >>> >
> >>> > With the new patch, `select foo()`, with inline-able sql_incr() in it,
> >>> > runs in 679 ms.
> >>> >
> >>> > Without any inline-able function, it runs in 330 ms, whereas with
> >>> > HEAD, it takes 590 ms.
> >>>
> >>> I polished it a bit.
> >>
> >>
> >> the performance looks very interesting - on my comp the execution time of 100000000 iterations was decreased from 34 sec to 15 sec,
> >>
> >> So it is interesting speedup
> >
> > but regress tests fails
>
> Oops, I failed to check src/pl/plpgsql tests.
>
> Fixed in the attached.

Added a regression test based on examples discussed here too.

It is working without problems

I think this patch is very interesting for Postgres 13

I checked a performance of this patch again and I think so there is not too much space for another optimization - maybe JIT can help.

There is relative high overhead of call of strict functions - the params are repeatedly tested against NULL.

But I found one issue - I don't know if this issue is related to your patch or plpgsql_check.

plpgsql_check try to clean after it was executed - it cleans all plans. But some pointers on simple expressions are broken after catched exceptions.

expr->plan = 0x80. Is interesting, so other fields of this expressions are correct.

I am not sure, but after patching the SPI_prepare_params the current memory context is some short memory context.

Can SPI_prepare_params change current memory context? It did before. But after patching different memory context is active.

Regards

Pavel

Regards

Pavel

Regards

Pavel

Thanks,
Amit

Re: plan cache overhead on plpgsql expression

From

Pavel Stehule

Date:

25 February 2020, 07:16:09

I added this patch to a commitfest

https://commitfest.postgresql.org/27/2467/

It is very interesting speedup and it is in good direction to JIT expressions

Pavel

Re: plan cache overhead on plpgsql expression

From

Amit Langote

Date:

25 February 2020, 08:42:22

Hi Pavel,

On Tue, Feb 25, 2020 at 4:16 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
> Hi
>
> I added this patch to a commitfest
>
> https://commitfest.postgresql.org/27/2467/
>
> It is very interesting speedup and it is in good direction to JIT expressions

Thank you.  I was planning to do that myself.

I will take a look at your other comments in a day or two.

Thanks,
Amit

Re: plan cache overhead on plpgsql expression

From

David Steele

Date:

17 March 2020, 11:53:08

Hi Amit,

On 2/25/20 3:42 AM, Amit Langote wrote:
> On Tue, Feb 25, 2020 at 4:16 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:
>> I added this patch to a commitfest
>>
>> https://commitfest.postgresql.org/27/2467/
>>
>> It is very interesting speedup and it is in good direction to JIT expressions
> 
> Thank you.  I was planning to do that myself.
> 
> I will take a look at your other comments in a day or two.

Do you know when you'll have chance to look at Pavel's comments?

Regards,
-- 
-David
david@pgmasters.net

Re: plan cache overhead on plpgsql expression

From

Amit Langote

Date:

17 March 2020, 13:32:11

Hi David,

On Tue, Mar 17, 2020 at 8:53 PM David Steele <david@pgmasters.net> wrote:
>
> Hi Amit,
>
> On 2/25/20 3:42 AM, Amit Langote wrote:
> > On Tue, Feb 25, 2020 at 4:16 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:
> >> I added this patch to a commitfest
> >>
> >> https://commitfest.postgresql.org/27/2467/
> >>
> >> It is very interesting speedup and it is in good direction to JIT expressions
> >
> > Thank you.  I was planning to do that myself.
> >
> > I will take a look at your other comments in a day or two.
>
> Do you know when you'll have chance to look at Pavel's comments?

Sorry, I had forgotten about this. I will try to post an update by Thursday.

-- 
Thank you,
Amit

Re: plan cache overhead on plpgsql expression

From

Amit Langote

Date:

19 March 2020, 09:47:17

Hi Pavel,

Sorry it took me a while to look at this.

On Tue, Feb 25, 2020 at 4:28 AM Pavel Stehule <pavel.stehule@gmail.com> wrote:
> po 24. 2. 2020 v 18:56 odesílatel Pavel Stehule <pavel.stehule@gmail.com> napsal:
>> But I found one issue - I don't know if this issue is related to your patch or plpgsql_check.
>>
>> plpgsql_check try to clean after it was executed - it cleans all plans. But some pointers on simple expressions are
brokenafter catched exceptions. 
>>
>> expr->plan = 0x80. Is interesting, so other fields of this expressions are correct.
>
> I am not sure, but after patching the SPI_prepare_params the current memory context is some short memory context.
>
> Can SPI_prepare_params change current memory context? It did before. But after patching different memory context is
active.

I haven't been able to see the behavior you reported.  Could you let
me know what unexpected memory context you see in the problematic
case?

--
Thank you,
Amit

Re: plan cache overhead on plpgsql expression

From

Pavel Stehule

Date:

19 March 2020, 11:19:10

čt 19. 3. 2020 v 10:47 odesílatel Amit Langote <amitlangote09@gmail.com> napsal:

Hi Pavel,

Sorry it took me a while to look at this.

On Tue, Feb 25, 2020 at 4:28 AM Pavel Stehule <pavel.stehule@gmail.com> wrote:
> po 24. 2. 2020 v 18:56 odesílatel Pavel Stehule <pavel.stehule@gmail.com> napsal:
>> But I found one issue - I don't know if this issue is related to your patch or plpgsql_check.
>>
>> plpgsql_check try to clean after it was executed - it cleans all plans. But some pointers on simple expressions are broken after catched exceptions.
>>
>> expr->plan = 0x80. Is interesting, so other fields of this expressions are correct.
>
> I am not sure, but after patching the SPI_prepare_params the current memory context is some short memory context.
>
> Can SPI_prepare_params change current memory context? It did before. But after patching different memory context is active.

I haven't been able to see the behavior you reported. Could you let
me know what unexpected memory context you see in the problematic

case?

How I can detect it? Are there some steps for debugging memory context?

Pavel

--
Thank you,
Amit

Re: plan cache overhead on plpgsql expression

From

Pavel Stehule

Date:

20 March 2020, 09:46:50

čt 19. 3. 2020 v 10:47 odesílatel Amit Langote <amitlangote09@gmail.com> napsal:

Hi Pavel,

Sorry it took me a while to look at this.

On Tue, Feb 25, 2020 at 4:28 AM Pavel Stehule <pavel.stehule@gmail.com> wrote:
> po 24. 2. 2020 v 18:56 odesílatel Pavel Stehule <pavel.stehule@gmail.com> napsal:
>> But I found one issue - I don't know if this issue is related to your patch or plpgsql_check.
>>
>> plpgsql_check try to clean after it was executed - it cleans all plans. But some pointers on simple expressions are broken after catched exceptions.
>>
>> expr->plan = 0x80. Is interesting, so other fields of this expressions are correct.
>
> I am not sure, but after patching the SPI_prepare_params the current memory context is some short memory context.
>
> Can SPI_prepare_params change current memory context? It did before. But after patching different memory context is active.

I haven't been able to see the behavior you reported. Could you let
me know what unexpected memory context you see in the problematic
case?

There was a problem with plpgsql_check after I applied this patch. It crashed differently on own regress tests.

But I cannot to reproduce this issue now. Probably there was more issues than one on my build environment.

So my questions and notes about a change of MemoryContext after patching are messy. Sorry for noise.

Regards

Pavel

--
Thank you,
Amit

Re: plan cache overhead on plpgsql expression

From

Pavel Stehule

Date:

21 March 2020, 05:08:49

I did another test

I use a pi estimation algorithm and it is little bit more realistic than just almost empty cycle body - still probably nobody will calculate pi in plpgsql.

CREATE OR REPLACE FUNCTION pi_est(n int)
RETURNS numeric AS $$
DECLARE
accum double precision DEFAULT 1.0;
c1 double precision DEFAULT 2.0;
c2 double precision DEFAULT 1.0;
v constant double precision DEFAULT 2.0;
BEGIN
FOR i IN 1..n
LOOP
accum := accum * ((c1 * c1) / (c2 * (c2 + v)));
c1 := c1 + v;
c2 := c2 + v;
END LOOP;
RETURN accum * v;
END;
$$ LANGUAGE plpgsql;

For this code the patch increased speed for 10000000 iterations from 6.3 sec to 4.7 .. it is speedup about 25%

The best performance (28%) is with code

CREATE OR REPLACE FUNCTION pi_est_2(n int)
RETURNS numeric AS $$
DECLARE
accum double precision DEFAULT 1.0;
c1 double precision DEFAULT 2.0;
c2 double precision DEFAULT 1.0;
BEGIN
FOR i IN 1..n
LOOP
accum := accum * ((c1 * c1) / (c2 * (c2 + double precision '2.0')));
c1 := c1 + double precision '2.0';
c2 := c2 + double precision '2.0';
END LOOP;
RETURN accum * double precision '2.0';
END;
$$ LANGUAGE plpgsql;

Unfortunately for unoptimized code the performance is worse (it is about 55% slower)

CREATE OR REPLACE FUNCTION pi_est_1(n int)
RETURNS numeric AS $$
DECLARE
accum double precision DEFAULT 1.0;
c1 double precision DEFAULT 2.0;
c2 double precision DEFAULT 1.0;
BEGIN
FOR i IN 1..n
LOOP
accum := accum * ((c1 * c1) / (c2 * (c2 + 2.0)));
c1 := c1 + 2.0;
c2 := c2 + 2.0;
END LOOP;
RETURN accum * 2.0;
END;
$$ LANGUAGE plpgsql;

same performance (bad) is for explicit casting

CREATE OR REPLACE FUNCTION pi_est_3(n int)
RETURNS numeric AS $$
DECLARE
accum double precision DEFAULT 1.0;
c1 double precision DEFAULT 2.0;
c2 double precision DEFAULT 1.0;
BEGIN
FOR i IN 1..n
LOOP
accum := accum * ((c1 * c1) / (c2 * (c2 + 2.0::double precision)));
c1 := c1 + 2.0::double precision;
c2 := c2 + 2.0::double precision;
END LOOP;
RETURN accum * double precision '2.0';
END;
$$ LANGUAGE plpgsql;

There is relative high overhead of cast from numeric init_var_from_num.

On master (without patching) the speed all double precision variants is almost same.

This example can be reduced

CREATE OR REPLACE FUNCTION public.fx(integer)
RETURNS double precision
LANGUAGE plpgsql
AS $function$
DECLARE
result double precision DEFAULT 1.0;
BEGIN
FOR i IN 1..$1
LOOP
result := result * 1.000001::double precision;
END LOOP;
RETURN result;
END;
$function$

CREATE OR REPLACE FUNCTION public.fx_1(integer)
RETURNS double precision
LANGUAGE plpgsql
AS $function$
DECLARE
result double precision DEFAULT 1.0;
BEGIN
FOR i IN 1..$1
LOOP
result := result * 1.000001;
END LOOP;
RETURN result;
END;
$function$

CREATE OR REPLACE FUNCTION public.fx_2(integer)
RETURNS double precision
LANGUAGE plpgsql
AS $function$
DECLARE
result double precision DEFAULT 1.0;
BEGIN
FOR i IN 1..$1
LOOP
result := result * double precision '1.000001';
END LOOP;
RETURN result;
END;
$function$

Patched select fx(1000000) .. 400ms, fx_1 .. 400ms, fx_2 .. 126ms

Master fx(1000000) .. 180ms, fx_1 180 ms, fx_2 .. 180ms

So the patch has a problem with constant casting - unfortunately the mix of double precision variables and numeric constants is pretty often in Postgres.

Regards

Pavel

Attachment

test.sql

Re: plan cache overhead on plpgsql expression

From

Tom Lane

Date:

21 March 2020, 18:24:05

Pavel Stehule <pavel.stehule@gmail.com> writes:
> So the patch has a problem with constant casting - unfortunately the mix of
> double precision variables and numeric constants is pretty often in
> Postgres.

Yeah.  I believe the cause of that is that the patch thinks it can skip
passing an inline-function-free simple expression through the planner.
That's flat out wrong.  Quite aside from failing to perform
constant-folding (which is presumably the cause of the slowdown you
spotted), that means that we miss performing such non-optional
transformations as rearranging named-function-argument notation into
positional order.  I didn't bother to test that but I'm sure it can be
shown to lead to crashes.

Now that I've looked at the patch I don't like it one bit from a
structural standpoint either.  It's basically trying to make an end
run around the plancache, which is not going to be maintainable even
if it correctly accounted for everything the plancache does today.
Which it doesn't.  Two big problems are:

* It doesn't account for the possibility of search_path changes
affecting the interpretation of an expression.

* It assumes that the *only* things that a simple plan could get
invalidated for are functions that were inlined.  This isn't the
case --- a counterexample is that removal of no-op CoerceToDomain
nodes requires the plan to be invalidated if the domain's constraints
change.  And there are likely to be more such issues in future.


So while there's clearly something worth pursuing here, I do not like
anything about the way it was done.  I think that the right way to
think about this problem is "how can the plan cache provide a fast
path for checking validity of simple-expression plans?".  And when you
think about it that way, there's a pretty obvious answer: if the plan
involves no table references, there's not going to be any locks that
have to be taken before we can check the is_valid flag.  So we can
have a fast path that skips AcquirePlannerLocks and
AcquireExecutorLocks, which are a big part of the problem, and we can
also bypass some of the other random checks that GetCachedPlan has to
make, like whether RLS affects the plan.

Another chunk of the issue is the constant acquisition and release of
reference counts on the plan.  We can't really skip that (I suspect
there are additional bugs in Amit's patch arising from trying to do so).
However, plpgsql already has mechanisms for paying simple-expression
setup costs once per transaction rather than once per expression use.
So we can set up a simple-expression ResourceOwner managed much like
the simple-expression EState, and have it hold a refcount on the
CachedPlan for each simple expression, and pay that overhead just once
per transaction.

So I worked on those ideas for awhile, and came up with the attached
patchset:

0001 adds some regression tests in this area (Amit's patch fails the
tests concerning search_path changes).

0002 does what's suggested above.  I also did a little bit of marginal
micro-tuning in exec_eval_simple_expr() itself.

0003 improves the biggest remaining cost of validity rechecking,
which is verifying that the search_path is the same as it was when
the plan was cached.

I haven't done any serious performance testing on this, but it gives
circa 2X speedup on Pavel's original example, which is at least
fairly close to the results that Amit's patch got there.  And it
makes this last batch of test cases faster not slower, too.

With this patch, perf shows the hotspots on Pavel's original example
as being

+   19.24%    19.17%         46470  postmaster       plpgsql.so                   [.] exec_eval_expr
+   15.19%    15.15%         36720  postmaster       plpgsql.so                   [.] plpgsql_param_eval_var
+   14.98%    14.94%         36213  postmaster       postgres                     [.] ExecInterpExpr
+    6.32%     6.30%         15262  postmaster       plpgsql.so                   [.] exec_stmt
+    6.08%     6.06%         14681  postmaster       plpgsql.so                   [.] exec_assign_value

Maybe there's more that could be done to knock fat out of
exec_eval_expr and/or plpgsql_param_eval_var, but at least
the plan cache isn't the bottleneck anymore.

            regards, tom lane

diff --git a/src/pl/plpgsql/src/Makefile b/src/pl/plpgsql/src/Makefile
index 70a9c34..193df8a 100644
--- a/src/pl/plpgsql/src/Makefile
+++ b/src/pl/plpgsql/src/Makefile
@@ -33,8 +33,8 @@ DATA = plpgsql.control plpgsql--1.0.sql
 REGRESS_OPTS = --dbname=$(PL_TESTDB)

 REGRESS = plpgsql_call plpgsql_control plpgsql_copy plpgsql_domain \
-    plpgsql_record plpgsql_cache plpgsql_transaction plpgsql_trap \
-    plpgsql_trigger plpgsql_varprops
+    plpgsql_record plpgsql_cache plpgsql_simple plpgsql_transaction \
+    plpgsql_trap plpgsql_trigger plpgsql_varprops

 # where to find gen_keywordlist.pl and subsidiary files
 TOOLSDIR = $(top_srcdir)/src/tools
diff --git a/src/pl/plpgsql/src/expected/plpgsql_simple.out b/src/pl/plpgsql/src/expected/plpgsql_simple.out
new file mode 100644
index 0000000..5a2fefa
--- /dev/null
+++ b/src/pl/plpgsql/src/expected/plpgsql_simple.out
@@ -0,0 +1,68 @@
+--
+-- Tests for plpgsql's handling of "simple" expressions
+--
+-- Check that changes to an inline-able function are handled correctly
+create function simplesql(int) returns int language sql
+as 'select $1';
+create function simplecaller() returns int language plpgsql
+as $$
+declare
+  sum int := 0;
+begin
+  for n in 1..10 loop
+    sum := sum + simplesql(n);
+    if n = 5 then
+      create or replace function simplesql(int) returns int language sql
+      as 'select $1 + 100';
+    end if;
+  end loop;
+  return sum;
+end$$;
+select simplecaller();
+ simplecaller
+--------------
+          555
+(1 row)
+
+-- Check that changes in search path are dealt with correctly
+create schema simple1;
+create function simple1.simpletarget(int) returns int language plpgsql
+as $$begin return $1; end$$;
+create function simpletarget(int) returns int language plpgsql
+as $$begin return $1 + 100; end$$;
+create or replace function simplecaller() returns int language plpgsql
+as $$
+declare
+  sum int := 0;
+begin
+  for n in 1..10 loop
+    sum := sum + simpletarget(n);
+    if n = 5 then
+      set local search_path = 'simple1';
+    end if;
+  end loop;
+  return sum;
+end$$;
+select simplecaller();
+ simplecaller
+--------------
+          555
+(1 row)
+
+-- try it with non-volatile functions, too
+alter function simple1.simpletarget(int) immutable;
+alter function simpletarget(int) immutable;
+select simplecaller();
+ simplecaller
+--------------
+          555
+(1 row)
+
+-- make sure flushing local caches changes nothing
+\c -
+select simplecaller();
+ simplecaller
+--------------
+          555
+(1 row)
+
diff --git a/src/pl/plpgsql/src/sql/plpgsql_simple.sql b/src/pl/plpgsql/src/sql/plpgsql_simple.sql
new file mode 100644
index 0000000..8a95768
--- /dev/null
+++ b/src/pl/plpgsql/src/sql/plpgsql_simple.sql
@@ -0,0 +1,61 @@
+--
+-- Tests for plpgsql's handling of "simple" expressions
+--
+
+-- Check that changes to an inline-able function are handled correctly
+create function simplesql(int) returns int language sql
+as 'select $1';
+
+create function simplecaller() returns int language plpgsql
+as $$
+declare
+  sum int := 0;
+begin
+  for n in 1..10 loop
+    sum := sum + simplesql(n);
+    if n = 5 then
+      create or replace function simplesql(int) returns int language sql
+      as 'select $1 + 100';
+    end if;
+  end loop;
+  return sum;
+end$$;
+
+select simplecaller();
+
+
+-- Check that changes in search path are dealt with correctly
+create schema simple1;
+
+create function simple1.simpletarget(int) returns int language plpgsql
+as $$begin return $1; end$$;
+
+create function simpletarget(int) returns int language plpgsql
+as $$begin return $1 + 100; end$$;
+
+create or replace function simplecaller() returns int language plpgsql
+as $$
+declare
+  sum int := 0;
+begin
+  for n in 1..10 loop
+    sum := sum + simpletarget(n);
+    if n = 5 then
+      set local search_path = 'simple1';
+    end if;
+  end loop;
+  return sum;
+end$$;
+
+select simplecaller();
+
+-- try it with non-volatile functions, too
+alter function simple1.simpletarget(int) immutable;
+alter function simpletarget(int) immutable;
+
+select simplecaller();
+
+-- make sure flushing local caches changes nothing
+\c -
+
+select simplecaller();
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index dbae18d..8e27b03 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -1278,6 +1278,160 @@ ReleaseCachedPlan(CachedPlan *plan, bool useResOwner)
 }

 /*
+ * CachedPlanAllowsSimpleValidityCheck: can we use CachedPlanIsSimplyValid?
+ *
+ * This function, together with CachedPlanIsSimplyValid, provides a fast path
+ * for revalidating "simple" generic plans.  The core requirement to be simple
+ * is that the plan must not require taking any locks, which translates to
+ * not touching any tables; this happens to match up well with an important
+ * use-case in PL/pgSQL.  This function tests whether that's true, along
+ * with checking some other corner cases that we'd rather not bother with
+ * handling in the fast path.  (Note that it's still possible for such a plan
+ * to be invalidated, for example due to a change in a function that was
+ * inlined into the plan.)
+ *
+ * This must only be called on known-valid generic plans (eg, ones just
+ * returned by GetCachedPlan).  If it returns true, the caller may re-use
+ * the cached plan as long as CachedPlanIsSimplyValid returns true; that
+ * check is much cheaper than the full revalidation done by GetCachedPlan.
+ * Nonetheless, no required checks are omitted.
+ */
+bool
+CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
+                                    CachedPlan *plan)
+{
+    ListCell   *lc;
+
+    /* Sanity-check that the caller gave us a validated generic plan. */
+    Assert(plansource->magic == CACHEDPLANSOURCE_MAGIC);
+    Assert(plan->magic == CACHEDPLAN_MAGIC);
+    Assert(plansource->is_valid);
+    Assert(plan->is_valid);
+    Assert(plan == plansource->gplan);
+
+    /* We don't support oneshot plans here. */
+    if (plansource->is_oneshot)
+        return false;
+    Assert(!plan->is_oneshot);
+
+    /*
+     * If the plan is dependent on RLS considerations, or it's transient,
+     * reject.  These things probably can't ever happen for table-free
+     * queries, but for safety's sake let's check.
+     */
+    if (plansource->dependsOnRLS)
+        return false;
+    if (plan->dependsOnRole)
+        return false;
+    if (TransactionIdIsValid(plan->saved_xmin))
+        return false;
+
+    /*
+     * Reject if AcquirePlannerLocks would have anything to do.  This is
+     * simplistic, but there's no need to inquire any more carefully; indeed,
+     * for current callers it shouldn't even be possible to hit any of these
+     * checks.
+     */
+    foreach(lc, plansource->query_list)
+    {
+        Query       *query = lfirst_node(Query, lc);
+
+        if (query->commandType == CMD_UTILITY)
+            return false;
+        if (query->rtable || query->cteList || query->hasSubLinks)
+            return false;
+    }
+
+    /*
+     * Reject if AcquireExecutorLocks would have anything to do.  This is
+     * probably unnecessary given the previous check, but let's be safe.
+     */
+    foreach(lc, plan->stmt_list)
+    {
+        PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc);
+        ListCell   *lc2;
+
+        if (plannedstmt->commandType == CMD_UTILITY)
+            return false;
+
+        /*
+         * We have to grovel through the rtable because it's likely to contain
+         * an RTE_RESULT relation, rather than being totally empty.
+         */
+        foreach(lc2, plannedstmt->rtable)
+        {
+            RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
+
+            if (rte->rtekind == RTE_RELATION)
+                return false;
+        }
+    }
+
+    /*
+     * Okay, it's simple.  Note that what we've primarily established here is
+     * that no locks need be taken before checking the plan's is_valid flag.
+     */
+    return true;
+}
+
+/*
+ * CachedPlanIsSimplyValid: quick check for plan still being valid
+ *
+ * This function must not be used unless CachedPlanAllowsSimpleValidityCheck
+ * previously said it was OK.
+ *
+ * If the plan is valid, and "owner" is not NULL, record a refcount on
+ * the plan in that resowner before returning.  It is caller's responsibility
+ * to be sure that a refcount is held on any plan that's being actively used.
+ *
+ * The code here is unconditionally safe as long as the only use of this
+ * CachedPlanSource is in connection with the particular CachedPlan pointer
+ * that's passed in.  If the plansource were being used for other purposes,
+ * it's possible that its generic plan could be invalidated and regenerated
+ * while the current caller wasn't looking, and then there could be a chance
+ * collision of address between this caller's now-stale plan pointer and the
+ * actual address of the new generic plan.  For current uses, that scenario
+ * can't happen; but with a plansource shared across multiple uses, it'd be
+ * advisable to also save plan->generation and verify that that still matches.
+ */
+bool
+CachedPlanIsSimplyValid(CachedPlanSource *plansource, CachedPlan *plan,
+                        ResourceOwner owner)
+{
+    /*
+     * Careful here: since the caller doesn't necessarily hold a refcount on
+     * the plan to start with, it's possible that "plan" is a dangling
+     * pointer.  Don't dereference it until we've verified that it still
+     * matches the plansource's gplan (which is either valid or NULL).
+     */
+    Assert(plansource->magic == CACHEDPLANSOURCE_MAGIC);
+
+    /*
+     * Has cache invalidation fired on this plan?  We can check this right
+     * away since there are no locks that we'd need to acquire first.
+     */
+    if (!plansource->is_valid || plan != plansource->gplan || !plan->is_valid)
+        return false;
+
+    Assert(plan->magic == CACHEDPLAN_MAGIC);
+
+    /* Is the search_path still the same as when we made it? */
+    Assert(plansource->search_path != NULL);
+    if (!OverrideSearchPathMatchesCurrent(plansource->search_path))
+        return false;
+
+    /* It's still good.  Bump refcount if requested. */
+    if (owner)
+    {
+        ResourceOwnerEnlargePlanCacheRefs(owner);
+        plan->refcount++;
+        ResourceOwnerRememberPlanCacheRef(owner, plan);
+    }
+
+    return true;
+}
+
+/*
  * CachedPlanSetParentContext: move a CachedPlanSource to a new memory context
  *
  * This can only be applied to unsaved plans; once saved, a plan always
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 6a5953c..037edd0 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -20,6 +20,8 @@
 #include "nodes/params.h"
 #include "tcop/cmdtag.h"
 #include "utils/queryenvironment.h"
+#include "utils/resowner.h"
+

 /* Forward declaration, to avoid including parsenodes.h here */
 struct RawStmt;
@@ -220,6 +222,12 @@ extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
                                  QueryEnvironment *queryEnv);
 extern void ReleaseCachedPlan(CachedPlan *plan, bool useResOwner);

+extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
+                                                CachedPlan *plan);
+extern bool CachedPlanIsSimplyValid(CachedPlanSource *plansource,
+                                    CachedPlan *plan,
+                                    ResourceOwner owner);
+
 extern CachedExpression *GetCachedExpression(Node *expr);
 extern void FreeCachedExpression(CachedExpression *cexpr);

diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index d3ad4fa..ef33394 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -96,6 +96,15 @@ static EState *shared_simple_eval_estate = NULL;
 static SimpleEcontextStackEntry *simple_econtext_stack = NULL;

 /*
+ * In addition to the shared simple-eval EState, we have a shared resource
+ * owner that holds refcounts on the CachedPlans for any "simple" expressions
+ * we have evaluated in the current transaction.  This allows us to avoid
+ * continually grabbing and releasing a plan refcount when a simple expression
+ * is used over and over.
+ */
+static ResourceOwner shared_simple_eval_resowner = NULL;
+
+/*
  * Memory management within a plpgsql function generally works with three
  * contexts:
  *
@@ -314,7 +323,8 @@ static int    exec_stmt_set(PLpgSQL_execstate *estate,
 static void plpgsql_estate_setup(PLpgSQL_execstate *estate,
                                  PLpgSQL_function *func,
                                  ReturnSetInfo *rsi,
-                                 EState *simple_eval_estate);
+                                 EState *simple_eval_estate,
+                                 ResourceOwner simple_eval_resowner);
 static void exec_eval_cleanup(PLpgSQL_execstate *estate);

 static void exec_prepare_plan(PLpgSQL_execstate *estate,
@@ -440,16 +450,19 @@ static char *format_preparedparamsdata(PLpgSQL_execstate *estate,
  *
  * This is also used to execute inline code blocks (DO blocks).  The only
  * difference that this code is aware of is that for a DO block, we want
- * to use a private simple_eval_estate, which is created and passed in by
- * the caller.  For regular functions, pass NULL, which implies using
- * shared_simple_eval_estate.  (When using a private simple_eval_estate,
+ * to use a private simple_eval_estate and a private simple_eval_resowner,
+ * which are created and passed in by the caller.  For regular functions,
+ * pass NULL, which implies using shared_simple_eval_estate and
+ * shared_simple_eval_resowner.  (When using a private simple_eval_estate,
  * we must also use a private cast hashtable, but that's taken care of
  * within plpgsql_estate_setup.)
  * ----------
  */
 Datum
 plpgsql_exec_function(PLpgSQL_function *func, FunctionCallInfo fcinfo,
-                      EState *simple_eval_estate, bool atomic)
+                      EState *simple_eval_estate,
+                      ResourceOwner simple_eval_resowner,
+                      bool atomic)
 {
     PLpgSQL_execstate estate;
     ErrorContextCallback plerrcontext;
@@ -460,7 +473,7 @@ plpgsql_exec_function(PLpgSQL_function *func, FunctionCallInfo fcinfo,
      * Setup the execution state
      */
     plpgsql_estate_setup(&estate, func, (ReturnSetInfo *) fcinfo->resultinfo,
-                         simple_eval_estate);
+                         simple_eval_estate, simple_eval_resowner);
     estate.atomic = atomic;

     /*
@@ -897,7 +910,7 @@ plpgsql_exec_trigger(PLpgSQL_function *func,
     /*
      * Setup the execution state
      */
-    plpgsql_estate_setup(&estate, func, NULL, NULL);
+    plpgsql_estate_setup(&estate, func, NULL, NULL, NULL);
     estate.trigdata = trigdata;

     /*
@@ -1135,7 +1148,7 @@ plpgsql_exec_event_trigger(PLpgSQL_function *func, EventTriggerData *trigdata)
     /*
      * Setup the execution state
      */
-    plpgsql_estate_setup(&estate, func, NULL, NULL);
+    plpgsql_estate_setup(&estate, func, NULL, NULL, NULL);
     estate.evtrigdata = trigdata;

     /*
@@ -2319,6 +2332,7 @@ exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
          * some internal state.
          */
         estate->simple_eval_estate = NULL;
+        estate->simple_eval_resowner = NULL;
         plpgsql_create_econtext(estate);
     }

@@ -3874,7 +3888,8 @@ static void
 plpgsql_estate_setup(PLpgSQL_execstate *estate,
                      PLpgSQL_function *func,
                      ReturnSetInfo *rsi,
-                     EState *simple_eval_estate)
+                     EState *simple_eval_estate,
+                     ResourceOwner simple_eval_resowner)
 {
     HASHCTL        ctl;

@@ -3965,6 +3980,11 @@ plpgsql_estate_setup(PLpgSQL_execstate *estate,
         estate->cast_hash = shared_cast_hash;
         estate->cast_hash_context = shared_cast_context;
     }
+    /* likewise for the simple-expression resource owner */
+    if (simple_eval_resowner)
+        estate->simple_eval_resowner = simple_eval_resowner;
+    else
+        estate->simple_eval_resowner = shared_simple_eval_resowner;

     /*
      * We start with no stmt_mcontext; one will be created only if needed.
@@ -4825,6 +4845,7 @@ exec_stmt_commit(PLpgSQL_execstate *estate, PLpgSQL_stmt_commit *stmt)
     }

     estate->simple_eval_estate = NULL;
+    estate->simple_eval_resowner = NULL;
     plpgsql_create_econtext(estate);

     return PLPGSQL_RC_OK;
@@ -4847,6 +4868,7 @@ exec_stmt_rollback(PLpgSQL_execstate *estate, PLpgSQL_stmt_rollback *stmt)
     }

     estate->simple_eval_estate = NULL;
+    estate->simple_eval_resowner = NULL;
     plpgsql_create_econtext(estate);

     return PLPGSQL_RC_OK;
@@ -6059,8 +6081,6 @@ loop_exit:
  * someone might redefine a SQL function that had been inlined into the simple
  * expression.  That cannot cause a simple expression to become non-simple (or
  * vice versa), but we do have to handle replacing the expression tree.
- * Fortunately it's normally inexpensive to call SPI_plan_get_cached_plan for
- * a simple expression.
  *
  * Note: if pass-by-reference, the result is in the eval_mcontext.
  * It will be freed when exec_eval_cleanup is done.
@@ -6076,7 +6096,7 @@ exec_eval_simple_expr(PLpgSQL_execstate *estate,
 {
     ExprContext *econtext = estate->eval_econtext;
     LocalTransactionId curlxid = MyProc->lxid;
-    CachedPlan *cplan;
+    ParamListInfo paramLI;
     void       *save_setup_arg;
     bool        need_snapshot;
     MemoryContext oldcontext;
@@ -6090,29 +6110,92 @@ exec_eval_simple_expr(PLpgSQL_execstate *estate,
     /*
      * If expression is in use in current xact, don't touch it.
      */
-    if (expr->expr_simple_in_use && expr->expr_simple_lxid == curlxid)
+    if (unlikely(expr->expr_simple_in_use) &&
+        expr->expr_simple_lxid == curlxid)
         return false;

     /*
-     * Revalidate cached plan, so that we will notice if it became stale. (We
-     * need to hold a refcount while using the plan, anyway.)  If replanning
-     * is needed, do that work in the eval_mcontext.
+     * Check to see if the cached plan has been invalidated.  If not, and this
+     * is the first use in the current transaction, save a plan refcount in
+     * the simple-expression resowner.
      */
-    oldcontext = MemoryContextSwitchTo(get_eval_mcontext(estate));
-    cplan = SPI_plan_get_cached_plan(expr->plan);
-    MemoryContextSwitchTo(oldcontext);
+    if (likely(CachedPlanIsSimplyValid(expr->expr_simple_plansource,
+                                       expr->expr_simple_plan,
+                                       (expr->expr_simple_plan_lxid != curlxid ?
+                                        estate->simple_eval_resowner : NULL))))
+    {
+        /*
+         * It's still good, so just remember that we have a refcount on the
+         * plan in the current transaction.  (If we already had one, this
+         * assignment is a no-op.)
+         */
+        expr->expr_simple_plan_lxid = curlxid;
+    }
+    else
+    {
+        /* Need to replan */
+        CachedPlan *cplan;

-    /*
-     * We can't get a failure here, because the number of CachedPlanSources in
-     * the SPI plan can't change from what exec_simple_check_plan saw; it's a
-     * property of the raw parsetree generated from the query text.
-     */
-    Assert(cplan != NULL);
+        /*
+         * If we have a valid refcount on some previous version of the plan,
+         * release it, so we don't leak plans intra-transaction.
+         */
+        if (expr->expr_simple_plan_lxid == curlxid)
+        {
+            ResourceOwner saveResourceOwner = CurrentResourceOwner;

-    /* If it got replanned, update our copy of the simple expression */
-    if (cplan->generation != expr->expr_simple_generation)
-    {
+            CurrentResourceOwner = estate->simple_eval_resowner;
+            ReleaseCachedPlan(expr->expr_simple_plan, true);
+            CurrentResourceOwner = saveResourceOwner;
+            expr->expr_simple_plan = NULL;
+            expr->expr_simple_plan_lxid = InvalidLocalTransactionId;
+        }
+
+        /* Do the replanning work in the eval_mcontext */
+        oldcontext = MemoryContextSwitchTo(get_eval_mcontext(estate));
+        cplan = SPI_plan_get_cached_plan(expr->plan);
+        MemoryContextSwitchTo(oldcontext);
+
+        /*
+         * We can't get a failure here, because the number of
+         * CachedPlanSources in the SPI plan can't change from what
+         * exec_simple_check_plan saw; it's a property of the raw parsetree
+         * generated from the query text.
+         */
+        Assert(cplan != NULL);
+
+        /*
+         * These tests probably can't fail either, but if they do, cope by
+         * declaring the plan to be non-simple.  On success, we'll acquire a
+         * refcount on the new plan, stored in simple_eval_resowner.
+         */
+        if (CachedPlanAllowsSimpleValidityCheck(expr->expr_simple_plansource,
+                                                cplan) &&
+            CachedPlanIsSimplyValid(expr->expr_simple_plansource, cplan,
+                                    estate->simple_eval_resowner))
+        {
+            /* Remember that we have the refcount */
+            expr->expr_simple_plan = cplan;
+            expr->expr_simple_plan_lxid = curlxid;
+        }
+        else
+        {
+            /* Release SPI_plan_get_cached_plan's refcount */
+            ReleaseCachedPlan(cplan, true);
+            /* Mark expression as non-simple, and fail */
+            expr->expr_simple_expr = NULL;
+            return false;
+        }
+
+        /*
+         * SPI_plan_get_cached_plan acquired a plan refcount stored in the
+         * active resowner.  We don't need that anymore, so release it.
+         */
+        ReleaseCachedPlan(cplan, true);
+
+        /* Extract desired scalar expression from cached plan */
         exec_save_simple_expr(expr, cplan);
+
         /* better recheck r/w safety, as it could change due to inlining */
         if (expr->rwparam >= 0)
             exec_check_rw_parameter(expr, expr->rwparam);
@@ -6128,16 +6211,24 @@ exec_eval_simple_expr(PLpgSQL_execstate *estate,
      * Set up ParamListInfo to pass to executor.  For safety, save and restore
      * estate->paramLI->parserSetupArg around our use of the param list.
      */
-    save_setup_arg = estate->paramLI->parserSetupArg;
+    paramLI = estate->paramLI;
+    save_setup_arg = paramLI->parserSetupArg;

-    econtext->ecxt_param_list_info = setup_param_list(estate, expr);
+    /*
+     * We can skip using setup_param_list() in favor of just doing this
+     * unconditionally, because there's no need for the optimization of
+     * possibly setting ecxt_param_list_info to NULL; we've already forced use
+     * of a generic plan.
+     */
+    paramLI->parserSetupArg = (void *) expr;
+    econtext->ecxt_param_list_info = paramLI;

     /*
      * Prepare the expression for execution, if it's not been done already in
      * the current transaction.  (This will be forced to happen if we called
      * exec_save_simple_expr above.)
      */
-    if (expr->expr_simple_lxid != curlxid)
+    if (unlikely(expr->expr_simple_lxid != curlxid))
     {
         oldcontext = MemoryContextSwitchTo(estate->simple_eval_estate->es_query_cxt);
         expr->expr_simple_state =
@@ -6185,7 +6276,7 @@ exec_eval_simple_expr(PLpgSQL_execstate *estate,

     econtext->ecxt_param_list_info = NULL;

-    estate->paramLI->parserSetupArg = save_setup_arg;
+    paramLI->parserSetupArg = save_setup_arg;

     if (need_snapshot)
         PopActiveSnapshot();
@@ -6193,11 +6284,6 @@ exec_eval_simple_expr(PLpgSQL_execstate *estate,
     MemoryContextSwitchTo(oldcontext);

     /*
-     * Now we can release our refcount on the cached plan.
-     */
-    ReleaseCachedPlan(cplan, true);
-
-    /*
      * That's it.
      */
     return true;
@@ -7984,10 +8070,35 @@ exec_simple_check_plan(PLpgSQL_execstate *estate, PLpgSQL_expr *expr)
     /* Can't fail, because we checked for a single CachedPlanSource above */
     Assert(cplan != NULL);

-    /* Share the remaining work with replan code path */
-    exec_save_simple_expr(expr, cplan);
+    /*
+     * Verify that plancache.c thinks the plan is simple enough to use
+     * CachedPlanIsSimplyValid.  Given the restrictions above, it's unlikely
+     * that this could fail, but if it does, just treat plan as not simple.
+     */
+    if (CachedPlanAllowsSimpleValidityCheck(plansource, cplan))
+    {
+        /*
+         * OK, use CachedPlanIsSimplyValid to save a refcount on the plan in
+         * the simple-expression resowner.  This shouldn't fail either, but if
+         * somehow it does, again we can cope by treating plan as not simple.
+         */
+        if (CachedPlanIsSimplyValid(plansource, cplan,
+                                    estate->simple_eval_resowner))
+        {
+            /* Remember that we have the refcount */
+            expr->expr_simple_plansource = plansource;
+            expr->expr_simple_plan = cplan;
+            expr->expr_simple_plan_lxid = MyProc->lxid;
+
+            /* Share the remaining work with the replan code path */
+            exec_save_simple_expr(expr, cplan);
+        }
+    }

-    /* Release our plan refcount */
+    /*
+     * Release the plan refcount obtained by SPI_plan_get_cached_plan.  (This
+     * refcount is held by the wrong resowner, so we can't just repurpose it.)
+     */
     ReleaseCachedPlan(cplan, true);
 }

@@ -8060,7 +8171,6 @@ exec_save_simple_expr(PLpgSQL_expr *expr, CachedPlan *cplan)
      * current transaction".
      */
     expr->expr_simple_expr = tle_expr;
-    expr->expr_simple_generation = cplan->generation;
     expr->expr_simple_state = NULL;
     expr->expr_simple_in_use = false;
     expr->expr_simple_lxid = InvalidLocalTransactionId;
@@ -8196,7 +8306,7 @@ exec_set_found(PLpgSQL_execstate *estate, bool state)
  *
  * We may need to create a new shared_simple_eval_estate too, if there's not
  * one already for the current transaction.  The EState will be cleaned up at
- * transaction end.
+ * transaction end.  Ditto for shared_simple_eval_resowner.
  */
 static void
 plpgsql_create_econtext(PLpgSQL_execstate *estate)
@@ -8225,6 +8335,20 @@ plpgsql_create_econtext(PLpgSQL_execstate *estate)
     }

     /*
+     * Likewise for the simple-expression resource owner.  (Note: it'd be
+     * safer to create this as a child of TopTransactionResourceOwner; but
+     * right now that causes issues in transaction-spanning procedures, so
+     * make it standalone.)
+     */
+    if (estate->simple_eval_resowner == NULL)
+    {
+        if (shared_simple_eval_resowner == NULL)
+            shared_simple_eval_resowner =
+                ResourceOwnerCreate(NULL, "PL/pgSQL simple expressions");
+        estate->simple_eval_resowner = shared_simple_eval_resowner;
+    }
+
+    /*
      * Create a child econtext for the current function.
      */
     estate->eval_econtext = CreateExprContext(estate->simple_eval_estate);
@@ -8270,16 +8394,17 @@ plpgsql_destroy_econtext(PLpgSQL_execstate *estate)
  * plpgsql_xact_cb --- post-transaction-commit-or-abort cleanup
  *
  * If a simple-expression EState was created in the current transaction,
- * it has to be cleaned up.
+ * it has to be cleaned up.  The same for the simple-expression resowner.
  */
 void
 plpgsql_xact_cb(XactEvent event, void *arg)
 {
     /*
      * If we are doing a clean transaction shutdown, free the EState (so that
-     * any remaining resources will be released correctly). In an abort, we
+     * any remaining resources will be released correctly).  In an abort, we
      * expect the regular abort recovery procedures to release everything of
-     * interest.
+     * interest.  The resowner has to be explicitly released in both cases,
+     * though, since it's not a child of TopTransactionResourceOwner.
      */
     if (event == XACT_EVENT_COMMIT || event == XACT_EVENT_PREPARE)
     {
@@ -8288,11 +8413,17 @@ plpgsql_xact_cb(XactEvent event, void *arg)
         if (shared_simple_eval_estate)
             FreeExecutorState(shared_simple_eval_estate);
         shared_simple_eval_estate = NULL;
+        if (shared_simple_eval_resowner)
+            plpgsql_free_simple_resowner(shared_simple_eval_resowner);
+        shared_simple_eval_resowner = NULL;
     }
     else if (event == XACT_EVENT_ABORT)
     {
         simple_econtext_stack = NULL;
         shared_simple_eval_estate = NULL;
+        if (shared_simple_eval_resowner)
+            plpgsql_free_simple_resowner(shared_simple_eval_resowner);
+        shared_simple_eval_resowner = NULL;
     }
 }

@@ -8324,6 +8455,33 @@ plpgsql_subxact_cb(SubXactEvent event, SubTransactionId mySubid,
 }

 /*
+ * plpgsql_free_simple_resowner --- free a simple-expression ResourceOwner
+ */
+void
+plpgsql_free_simple_resowner(ResourceOwner simple_eval_resowner)
+{
+    /*
+     * At this writing, the only thing that could actually get released is
+     * plancache refcounts; but we may as well do the full release protocol.
+     *
+     * We pass isCommit = false even when committing, to suppress
+     * resource-leakage gripes, since we aren't bothering to release the
+     * refcounts one-by-one.
+     */
+    ResourceOwnerRelease(simple_eval_resowner,
+                         RESOURCE_RELEASE_BEFORE_LOCKS,
+                         false, true);
+    ResourceOwnerRelease(simple_eval_resowner,
+                         RESOURCE_RELEASE_LOCKS,
+                         false, true);
+    ResourceOwnerRelease(simple_eval_resowner,
+                         RESOURCE_RELEASE_AFTER_LOCKS,
+                         false, true);
+
+    ResourceOwnerDelete(simple_eval_resowner);
+}
+
+/*
  * assign_simple_var --- assign a new value to any VAR datum.
  *
  * This should be the only mechanism for assignment to simple variables,
diff --git a/src/pl/plpgsql/src/pl_handler.c b/src/pl/plpgsql/src/pl_handler.c
index b434818..4cc6079 100644
--- a/src/pl/plpgsql/src/pl_handler.c
+++ b/src/pl/plpgsql/src/pl_handler.c
@@ -262,7 +262,9 @@ plpgsql_call_handler(PG_FUNCTION_ARGS)
             retval = (Datum) 0;
         }
         else
-            retval = plpgsql_exec_function(func, fcinfo, NULL, !nonatomic);
+            retval = plpgsql_exec_function(func, fcinfo,
+                                           NULL, NULL,
+                                           !nonatomic);
     }
     PG_FINALLY();
     {
@@ -297,6 +299,7 @@ plpgsql_inline_handler(PG_FUNCTION_ARGS)
     PLpgSQL_function *func;
     FmgrInfo    flinfo;
     EState       *simple_eval_estate;
+    ResourceOwner simple_eval_resowner;
     Datum        retval;
     int            rc;

@@ -323,22 +326,27 @@ plpgsql_inline_handler(PG_FUNCTION_ARGS)
     flinfo.fn_oid = InvalidOid;
     flinfo.fn_mcxt = CurrentMemoryContext;

-    /* Create a private EState for simple-expression execution */
+    /* Create a private EState and resowner for simple-expression execution */
     simple_eval_estate = CreateExecutorState();
+    simple_eval_resowner =
+        ResourceOwnerCreate(NULL, "PL/pgSQL DO block simple expressions");

     /* And run the function */
     PG_TRY();
     {
-        retval = plpgsql_exec_function(func, fake_fcinfo, simple_eval_estate, codeblock->atomic);
+        retval = plpgsql_exec_function(func, fake_fcinfo,
+                                       simple_eval_estate,
+                                       simple_eval_resowner,
+                                       codeblock->atomic);
     }
     PG_CATCH();
     {
         /*
          * We need to clean up what would otherwise be long-lived resources
          * accumulated by the failed DO block, principally cached plans for
-         * statements (which can be flushed with plpgsql_free_function_memory)
-         * and execution trees for simple expressions, which are in the
-         * private EState.
+         * statements (which can be flushed by plpgsql_free_function_memory),
+         * execution trees for simple expressions, which are in the private
+         * EState, and cached-plan refcounts held by the private resowner.
          *
          * Before releasing the private EState, we must clean up any
          * simple_econtext_stack entries pointing into it, which we can do by
@@ -351,8 +359,9 @@ plpgsql_inline_handler(PG_FUNCTION_ARGS)
                            GetCurrentSubTransactionId(),
                            0, NULL);

-        /* Clean up the private EState */
+        /* Clean up the private EState and resowner */
         FreeExecutorState(simple_eval_estate);
+        plpgsql_free_simple_resowner(simple_eval_resowner);

         /* Function should now have no remaining use-counts ... */
         func->use_count--;
@@ -366,8 +375,9 @@ plpgsql_inline_handler(PG_FUNCTION_ARGS)
     }
     PG_END_TRY();

-    /* Clean up the private EState */
+    /* Clean up the private EState and resowner */
     FreeExecutorState(simple_eval_estate);
+    plpgsql_free_simple_resowner(simple_eval_resowner);

     /* Function should now have no remaining use-counts ... */
     func->use_count--;
diff --git a/src/pl/plpgsql/src/plpgsql.h b/src/pl/plpgsql/src/plpgsql.h
index 69df330..410dcde 100644
--- a/src/pl/plpgsql/src/plpgsql.h
+++ b/src/pl/plpgsql/src/plpgsql.h
@@ -231,12 +231,21 @@ typedef struct PLpgSQL_expr

     /* fields for "simple expression" fast-path execution: */
     Expr       *expr_simple_expr;    /* NULL means not a simple expr */
-    int            expr_simple_generation; /* plancache generation we checked */
     Oid            expr_simple_type;    /* result type Oid, if simple */
     int32        expr_simple_typmod; /* result typmod, if simple */
     bool        expr_simple_mutable;    /* true if simple expr is mutable */

     /*
+     * If the expression was ever determined to be simple, we remember its
+     * CachedPlanSource and CachedPlan here.  If expr_simple_plan_lxid matches
+     * current LXID, then we hold a refcount on expr_simple_plan in the
+     * current transaction.  Otherwise we need to get one before re-using it.
+     */
+    CachedPlanSource *expr_simple_plansource;    /* extracted from "plan" */
+    CachedPlan *expr_simple_plan;    /* extracted from "plan" */
+    LocalTransactionId expr_simple_plan_lxid;
+
+    /*
      * if expr is simple AND prepared in current transaction,
      * expr_simple_state and expr_simple_in_use are valid. Test validity by
      * seeing if expr_simple_lxid matches current LXID.  (If not,
@@ -1082,8 +1091,9 @@ typedef struct PLpgSQL_execstate
      */
     ParamListInfo paramLI;

-    /* EState to use for "simple" expression evaluation */
+    /* EState and resowner to use for "simple" expression evaluation */
     EState       *simple_eval_estate;
+    ResourceOwner simple_eval_resowner;

     /* lookup table to use for executing type casts */
     HTAB       *cast_hash;
@@ -1268,6 +1278,7 @@ extern void _PG_init(void);
 extern Datum plpgsql_exec_function(PLpgSQL_function *func,
                                    FunctionCallInfo fcinfo,
                                    EState *simple_eval_estate,
+                                   ResourceOwner simple_eval_resowner,
                                    bool atomic);
 extern HeapTuple plpgsql_exec_trigger(PLpgSQL_function *func,
                                       TriggerData *trigdata);
@@ -1276,6 +1287,7 @@ extern void plpgsql_exec_event_trigger(PLpgSQL_function *func,
 extern void plpgsql_xact_cb(XactEvent event, void *arg);
 extern void plpgsql_subxact_cb(SubXactEvent event, SubTransactionId mySubid,
                                SubTransactionId parentSubid, void *arg);
+extern void plpgsql_free_simple_resowner(ResourceOwner simple_eval_resowner);
 extern Oid    plpgsql_exec_get_datum_type(PLpgSQL_execstate *estate,
                                         PLpgSQL_datum *datum);
 extern void plpgsql_exec_get_datum_type_info(PLpgSQL_execstate *estate,
diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index 5ff7824..2ec2301 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -126,6 +126,11 @@
  * namespaceUser is the userid the path has been computed for.
  *
  * Note: all data pointed to by these List variables is in TopMemoryContext.
+ *
+ * activePathGeneration is incremented whenever the effective values of
+ * activeSearchPath/activeCreationNamespace/activeTempCreationPending change.
+ * This can be used to quickly detect whether any change has happened since
+ * a previous examination of the search path state.
  */

 /* These variables define the actually active state: */
@@ -138,6 +143,9 @@ static Oid    activeCreationNamespace = InvalidOid;
 /* if true, activeCreationNamespace is wrong, it should be temp namespace */
 static bool activeTempCreationPending = false;

+/* current generation counter; make sure this is never zero */
+static uint64 activePathGeneration = 1;
+
 /* These variables are the values last derived from namespace_search_path: */

 static List *baseSearchPath = NIL;
@@ -3373,6 +3381,7 @@ GetOverrideSearchPath(MemoryContext context)
         schemas = list_delete_first(schemas);
     }
     result->schemas = schemas;
+    result->generation = activePathGeneration;

     MemoryContextSwitchTo(oldcxt);

@@ -3393,12 +3402,18 @@ CopyOverrideSearchPath(OverrideSearchPath *path)
     result->schemas = list_copy(path->schemas);
     result->addCatalog = path->addCatalog;
     result->addTemp = path->addTemp;
+    result->generation = path->generation;

     return result;
 }

 /*
  * OverrideSearchPathMatchesCurrent - does path match current setting?
+ *
+ * This is tested over and over in some common code paths, and in the typical
+ * scenario where the active search path seldom changes, it'll always succeed.
+ * We make that case fast by keeping a generation counter that is advanced
+ * whenever the active search path changes.
  */
 bool
 OverrideSearchPathMatchesCurrent(OverrideSearchPath *path)
@@ -3408,6 +3423,10 @@ OverrideSearchPathMatchesCurrent(OverrideSearchPath *path)

     recomputeNamespacePath();

+    /* Quick out if already known equal to active path. */
+    if (path->generation == activePathGeneration)
+        return true;
+
     /* We scan down the activeSearchPath to see if it matches the input. */
     lc = list_head(activeSearchPath);

@@ -3440,6 +3459,13 @@ OverrideSearchPathMatchesCurrent(OverrideSearchPath *path)
     }
     if (lc)
         return false;
+
+    /*
+     * Update path->generation so that future tests will return quickly, so
+     * long as the active search path doesn't change.
+     */
+    path->generation = activePathGeneration;
+
     return true;
 }

@@ -3510,6 +3536,14 @@ PushOverrideSearchPath(OverrideSearchPath *newpath)
     activeCreationNamespace = entry->creationNamespace;
     activeTempCreationPending = false;    /* XXX is this OK? */

+    /*
+     * We always increment activePathGeneration when pushing/popping an
+     * override path.  In current usage, these actions always change the
+     * effective path state, so there's no value in checking to see if it
+     * didn't change.
+     */
+    activePathGeneration++;
+
     MemoryContextSwitchTo(oldcxt);
 }

@@ -3551,6 +3585,9 @@ PopOverrideSearchPath(void)
         activeCreationNamespace = baseCreationNamespace;
         activeTempCreationPending = baseTempCreationPending;
     }
+
+    /* As above, the generation always increments. */
+    activePathGeneration++;
 }


@@ -3707,6 +3744,7 @@ recomputeNamespacePath(void)
     ListCell   *l;
     bool        temp_missing;
     Oid            firstNS;
+    bool        pathChanged;
     MemoryContext oldcxt;

     /* Do nothing if an override search spec is active. */
@@ -3814,18 +3852,31 @@ recomputeNamespacePath(void)
         oidlist = lcons_oid(myTempNamespace, oidlist);

     /*
-     * Now that we've successfully built the new list of namespace OIDs, save
-     * it in permanent storage.
+     * We want to detect the case where the effective value of the base search
+     * path variables didn't change.  As long as we're doing so, we can avoid
+     * copying the OID list unncessarily.
      */
-    oldcxt = MemoryContextSwitchTo(TopMemoryContext);
-    newpath = list_copy(oidlist);
-    MemoryContextSwitchTo(oldcxt);
+    if (baseCreationNamespace == firstNS &&
+        baseTempCreationPending == temp_missing &&
+        equal(oidlist, baseSearchPath))
+    {
+        pathChanged = false;
+    }
+    else
+    {
+        pathChanged = true;
+
+        /* Must save OID list in permanent storage. */
+        oldcxt = MemoryContextSwitchTo(TopMemoryContext);
+        newpath = list_copy(oidlist);
+        MemoryContextSwitchTo(oldcxt);

-    /* Now safe to assign to state variables. */
-    list_free(baseSearchPath);
-    baseSearchPath = newpath;
-    baseCreationNamespace = firstNS;
-    baseTempCreationPending = temp_missing;
+        /* Now safe to assign to state variables. */
+        list_free(baseSearchPath);
+        baseSearchPath = newpath;
+        baseCreationNamespace = firstNS;
+        baseTempCreationPending = temp_missing;
+    }

     /* Mark the path valid. */
     baseSearchPathValid = true;
@@ -3836,6 +3887,16 @@ recomputeNamespacePath(void)
     activeCreationNamespace = baseCreationNamespace;
     activeTempCreationPending = baseTempCreationPending;

+    /*
+     * Bump the generation only if something actually changed.  (Notice that
+     * what we compared to was the old state of the base path variables; so
+     * this does not deal with the situation where we have just popped an
+     * override path and restored the prior state of the base path.  Instead
+     * we rely on the override-popping logic to have bumped the generation.)
+     */
+    if (pathChanged)
+        activePathGeneration++;
+
     /* Clean up. */
     pfree(rawname);
     list_free(namelist);
@@ -4054,6 +4115,8 @@ AtEOXact_Namespace(bool isCommit, bool parallel)
         activeSearchPath = baseSearchPath;
         activeCreationNamespace = baseCreationNamespace;
         activeTempCreationPending = baseTempCreationPending;
+        /* Always bump generation --- see note in recomputeNamespacePath */
+        activePathGeneration++;
     }
 }

@@ -4109,6 +4172,8 @@ AtEOSubXact_Namespace(bool isCommit, SubTransactionId mySubid,
         overrideStack = list_delete_first(overrideStack);
         list_free(entry->searchPath);
         pfree(entry);
+        /* Always bump generation --- see note in recomputeNamespacePath */
+        activePathGeneration++;
     }

     /* Activate the next level down. */
@@ -4118,6 +4183,12 @@ AtEOSubXact_Namespace(bool isCommit, SubTransactionId mySubid,
         activeSearchPath = entry->searchPath;
         activeCreationNamespace = entry->creationNamespace;
         activeTempCreationPending = false;    /* XXX is this OK? */
+
+        /*
+         * It's probably unnecessary to bump generation here, but this should
+         * not be a performance-critical case, so better to be over-cautious.
+         */
+        activePathGeneration++;
     }
     else
     {
@@ -4125,6 +4196,12 @@ AtEOSubXact_Namespace(bool isCommit, SubTransactionId mySubid,
         activeSearchPath = baseSearchPath;
         activeCreationNamespace = baseCreationNamespace;
         activeTempCreationPending = baseTempCreationPending;
+
+        /*
+         * If we popped an override stack entry, then we already bumped the
+         * generation above.  If we did not, then the above assignments did
+         * nothing and we need not bump the generation.
+         */
     }
 }

@@ -4264,6 +4341,7 @@ InitializeSearchPath(void)
         activeSearchPath = baseSearchPath;
         activeCreationNamespace = baseCreationNamespace;
         activeTempCreationPending = baseTempCreationPending;
+        activePathGeneration++; /* pro forma */
     }
     else
     {
diff --git a/src/include/catalog/namespace.h b/src/include/catalog/namespace.h
index 3e3a192..2456c08 100644
--- a/src/include/catalog/namespace.h
+++ b/src/include/catalog/namespace.h
@@ -49,12 +49,17 @@ typedef enum TempNamespaceStatus

 /*
  *    Structure for xxxOverrideSearchPath functions
+ *
+ * The generation counter is private to namespace.c and shouldn't be touched
+ * by other code.  It can be initialized to zero if necessary (that means
+ * "not known equal to the current active path").
  */
 typedef struct OverrideSearchPath
 {
     List       *schemas;        /* OIDs of explicitly named schemas */
     bool        addCatalog;        /* implicitly prepend pg_catalog? */
     bool        addTemp;        /* implicitly prepend temp schema? */
+    uint64        generation;        /* for quick detection of equality to active */
 } OverrideSearchPath;

 /*

Re: plan cache overhead on plpgsql expression

From

Pavel Stehule

Date:

21 March 2020, 20:29:21

so 21. 3. 2020 v 19:24 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:

Pavel Stehule <pavel.stehule@gmail.com> writes:
> So the patch has a problem with constant casting - unfortunately the mix of
> double precision variables and numeric constants is pretty often in
> Postgres.

Yeah. I believe the cause of that is that the patch thinks it can skip
passing an inline-function-free simple expression through the planner.
That's flat out wrong. Quite aside from failing to perform
constant-folding (which is presumably the cause of the slowdown you
spotted), that means that we miss performing such non-optional
transformations as rearranging named-function-argument notation into
positional order. I didn't bother to test that but I'm sure it can be
shown to lead to crashes.

Now that I've looked at the patch I don't like it one bit from a
structural standpoint either. It's basically trying to make an end
run around the plancache, which is not going to be maintainable even
if it correctly accounted for everything the plancache does today.
Which it doesn't. Two big problems are:

* It doesn't account for the possibility of search_path changes
affecting the interpretation of an expression.

* It assumes that the *only* things that a simple plan could get
invalidated for are functions that were inlined. This isn't the
case --- a counterexample is that removal of no-op CoerceToDomain
nodes requires the plan to be invalidated if the domain's constraints
change. And there are likely to be more such issues in future.

So while there's clearly something worth pursuing here, I do not like
anything about the way it was done. I think that the right way to
think about this problem is "how can the plan cache provide a fast
path for checking validity of simple-expression plans?". And when you
think about it that way, there's a pretty obvious answer: if the plan
involves no table references, there's not going to be any locks that
have to be taken before we can check the is_valid flag. So we can
have a fast path that skips AcquirePlannerLocks and
AcquireExecutorLocks, which are a big part of the problem, and we can
also bypass some of the other random checks that GetCachedPlan has to
make, like whether RLS affects the plan.

Another chunk of the issue is the constant acquisition and release of
reference counts on the plan. We can't really skip that (I suspect
there are additional bugs in Amit's patch arising from trying to do so).
However, plpgsql already has mechanisms for paying simple-expression
setup costs once per transaction rather than once per expression use.
So we can set up a simple-expression ResourceOwner managed much like
the simple-expression EState, and have it hold a refcount on the
CachedPlan for each simple expression, and pay that overhead just once
per transaction.

So I worked on those ideas for awhile, and came up with the attached
patchset:

0001 adds some regression tests in this area (Amit's patch fails the
tests concerning search_path changes).

0002 does what's suggested above. I also did a little bit of marginal
micro-tuning in exec_eval_simple_expr() itself.

0003 improves the biggest remaining cost of validity rechecking,
which is verifying that the search_path is the same as it was when
the plan was cached.

I haven't done any serious performance testing on this, but it gives
circa 2X speedup on Pavel's original example, which is at least
fairly close to the results that Amit's patch got there. And it
makes this last batch of test cases faster not slower, too.

With this patch, perf shows the hotspots on Pavel's original example
as being

+ 19.24% 19.17% 46470 postmaster plpgsql.so [.] exec_eval_expr
+ 15.19% 15.15% 36720 postmaster plpgsql.so [.] plpgsql_param_eval_var
+ 14.98% 14.94% 36213 postmaster postgres [.] ExecInterpExpr
+ 6.32% 6.30% 15262 postmaster plpgsql.so [.] exec_stmt
+ 6.08% 6.06% 14681 postmaster plpgsql.so [.] exec_assign_value

Maybe there's more that could be done to knock fat out of
exec_eval_expr and/or plpgsql_param_eval_var, but at least
the plan cache isn't the bottleneck anymore.

I tested Tom's patches, and I can confirm these results.

It doesn't break tests (and all tests plpgsql_check tests passed without problems).

The high overhead of ExecInterpExpr is related to prepare fcinfo, and checking nulls arguments because all functions are strict

plpgsql_param_eval_var, looks like expensive is var = (PLpgSQL_var *) estate->datums[dno] and *op->resvalue = var->value;

It looks great.

Pavel

regards, tom lane

Re: plan cache overhead on plpgsql expression

From

Pavel Stehule

Date:

25 March 2020, 19:14:14

so 21. 3. 2020 v 21:29 odesílatel Pavel Stehule <pavel.stehule@gmail.com> napsal:

so 21. 3. 2020 v 19:24 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> So the patch has a problem with constant casting - unfortunately the mix of
> double precision variables and numeric constants is pretty often in
> Postgres.

Yeah. I believe the cause of that is that the patch thinks it can skip
passing an inline-function-free simple expression through the planner.
That's flat out wrong. Quite aside from failing to perform
constant-folding (which is presumably the cause of the slowdown you
spotted), that means that we miss performing such non-optional
transformations as rearranging named-function-argument notation into
positional order. I didn't bother to test that but I'm sure it can be
shown to lead to crashes.

Now that I've looked at the patch I don't like it one bit from a
structural standpoint either. It's basically trying to make an end
run around the plancache, which is not going to be maintainable even
if it correctly accounted for everything the plancache does today.
Which it doesn't. Two big problems are:

* It doesn't account for the possibility of search_path changes
affecting the interpretation of an expression.

* It assumes that the *only* things that a simple plan could get
invalidated for are functions that were inlined. This isn't the
case --- a counterexample is that removal of no-op CoerceToDomain
nodes requires the plan to be invalidated if the domain's constraints
change. And there are likely to be more such issues in future.

So while there's clearly something worth pursuing here, I do not like
anything about the way it was done. I think that the right way to
think about this problem is "how can the plan cache provide a fast
path for checking validity of simple-expression plans?". And when you
think about it that way, there's a pretty obvious answer: if the plan
involves no table references, there's not going to be any locks that
have to be taken before we can check the is_valid flag. So we can
have a fast path that skips AcquirePlannerLocks and
AcquireExecutorLocks, which are a big part of the problem, and we can
also bypass some of the other random checks that GetCachedPlan has to
make, like whether RLS affects the plan.

Another chunk of the issue is the constant acquisition and release of
reference counts on the plan. We can't really skip that (I suspect
there are additional bugs in Amit's patch arising from trying to do so).
However, plpgsql already has mechanisms for paying simple-expression
setup costs once per transaction rather than once per expression use.
So we can set up a simple-expression ResourceOwner managed much like
the simple-expression EState, and have it hold a refcount on the
CachedPlan for each simple expression, and pay that overhead just once
per transaction.

So I worked on those ideas for awhile, and came up with the attached
patchset:

0001 adds some regression tests in this area (Amit's patch fails the
tests concerning search_path changes).

0002 does what's suggested above. I also did a little bit of marginal
micro-tuning in exec_eval_simple_expr() itself.

0003 improves the biggest remaining cost of validity rechecking,
which is verifying that the search_path is the same as it was when
the plan was cached.

I haven't done any serious performance testing on this, but it gives
circa 2X speedup on Pavel's original example, which is at least
fairly close to the results that Amit's patch got there. And it
makes this last batch of test cases faster not slower, too.

With this patch, perf shows the hotspots on Pavel's original example
as being

+ 19.24% 19.17% 46470 postmaster plpgsql.so [.] exec_eval_expr
+ 15.19% 15.15% 36720 postmaster plpgsql.so [.] plpgsql_param_eval_var
+ 14.98% 14.94% 36213 postmaster postgres [.] ExecInterpExpr
+ 6.32% 6.30% 15262 postmaster plpgsql.so [.] exec_stmt
+ 6.08% 6.06% 14681 postmaster plpgsql.so [.] exec_assign_value

Maybe there's more that could be done to knock fat out of
exec_eval_expr and/or plpgsql_param_eval_var, but at least
the plan cache isn't the bottleneck anymore.

I tested Tom's patches, and I can confirm these results.

It doesn't break tests (and all tests plpgsql_check tests passed without problems).

The high overhead of ExecInterpExpr is related to prepare fcinfo, and checking nulls arguments because all functions are strict
plpgsql_param_eval_var, looks like expensive is var = (PLpgSQL_var *) estate->datums[dno] and *op->resvalue = var->value;

I rechecked Tom's patch, and all tests passed, and there is stable positive performance impact about 30% in tested pi estimation example.

After this patch, the code is only 3x times slower than in Lua (originally it was 5x) and 1/3 slower than in Python (but Python calculates with higher precision).

I think so this speed is maximum what is possible (now) - after patching the slower execution is assigned to nullable types and related operations. Probably it can be reduced too. The variables can be marked as NOT NULL, and if all variables are NOT NULL, then we don't need to repeat check of null arguments of strict functions.

I'll mark this patch as ready for commiters.

Thank you

Pavel

It looks great.

Pavel

regards, tom lane

Re: plan cache overhead on plpgsql expression

From

Tom Lane

Date:

25 March 2020, 19:44:28

Pavel Stehule <pavel.stehule@gmail.com> writes:
> I'll mark this patch as ready for commiters.

Thanks for reviewing!  Amit, do you have any thoughts on this?

            regards, tom lane

Re: plan cache overhead on plpgsql expression

From

Robert Haas

Date:

25 March 2020, 20:32:15

On Sat, Mar 21, 2020 at 2:24 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> With this patch, perf shows the hotspots on Pavel's original example
> as being
>
> +   19.24%    19.17%         46470  postmaster       plpgsql.so                   [.] exec_eval_expr
> +   15.19%    15.15%         36720  postmaster       plpgsql.so                   [.] plpgsql_param_eval_var
> +   14.98%    14.94%         36213  postmaster       postgres                     [.] ExecInterpExpr
> +    6.32%     6.30%         15262  postmaster       plpgsql.so                   [.] exec_stmt
> +    6.08%     6.06%         14681  postmaster       plpgsql.so                   [.] exec_assign_value

That's pretty sweet. As you say, there's probably some way to
eliminate some of the non-plancache overhead, but it's still a big
improvement.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: plan cache overhead on plpgsql expression

From

Andres Freund

Date:

25 March 2020, 21:18:25

Hi,

On 2020-03-21 14:24:05 -0400, Tom Lane wrote:
> So while there's clearly something worth pursuing here, I do not like
> anything about the way it was done.  I think that the right way to
> think about this problem is "how can the plan cache provide a fast
> path for checking validity of simple-expression plans?".  And when you
> think about it that way, there's a pretty obvious answer: if the plan
> involves no table references, there's not going to be any locks that
> have to be taken before we can check the is_valid flag.  So we can
> have a fast path that skips AcquirePlannerLocks and
> AcquireExecutorLocks, which are a big part of the problem, and we can
> also bypass some of the other random checks that GetCachedPlan has to
> make, like whether RLS affects the plan.

That makes sense to me.

I wonder if it'd make sense to store the locks needed for
AcquirePlannerLocks/AcquireExecutorLocks in a better form. Not really
instead of your optimization, but to also address simple statements that
do reference a relation. If we stored all the locks for a plansource in
an array, it should get cheaper - and automatically implement the fast
path of skipping AcquirePlannerLocks/AcquireExecutorLocks when there's
no relations.


> Another chunk of the issue is the constant acquisition and release of
> reference counts on the plan.  We can't really skip that (I suspect
> there are additional bugs in Amit's patch arising from trying to do so).
> However, plpgsql already has mechanisms for paying simple-expression
> setup costs once per transaction rather than once per expression use.
> So we can set up a simple-expression ResourceOwner managed much like
> the simple-expression EState, and have it hold a refcount on the
> CachedPlan for each simple expression, and pay that overhead just once
> per transaction.


> I haven't done any serious performance testing on this, but it gives
> circa 2X speedup on Pavel's original example, which is at least
> fairly close to the results that Amit's patch got there.  And it
> makes this last batch of test cases faster not slower, too.
> 
> With this patch, perf shows the hotspots on Pavel's original example
> as being
> 
> +   19.24%    19.17%         46470  postmaster       plpgsql.so                   [.] exec_eval_expr
> +   15.19%    15.15%         36720  postmaster       plpgsql.so                   [.] plpgsql_param_eval_var
> +   14.98%    14.94%         36213  postmaster       postgres                     [.] ExecInterpExpr
> +    6.32%     6.30%         15262  postmaster       plpgsql.so                   [.] exec_stmt
> +    6.08%     6.06%         14681  postmaster       plpgsql.so                   [.] exec_assign_value
> 
> Maybe there's more that could be done to knock fat out of
> exec_eval_expr and/or plpgsql_param_eval_var, but at least
> the plan cache isn't the bottleneck anymore.

Nice!


> diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
> index dbae18d..8e27b03 100644
> --- a/src/backend/utils/cache/plancache.c
> +++ b/src/backend/utils/cache/plancache.c
> @@ -1278,6 +1278,160 @@ ReleaseCachedPlan(CachedPlan *plan, bool useResOwner)
>  }
>  
>  /*
> + * CachedPlanAllowsSimpleValidityCheck: can we use CachedPlanIsSimplyValid?
> + *
> + * This function, together with CachedPlanIsSimplyValid, provides a fast path
> + * for revalidating "simple" generic plans.  The core requirement to be simple
> + * is that the plan must not require taking any locks, which translates to
> + * not touching any tables; this happens to match up well with an important
> + * use-case in PL/pgSQL.  This function tests whether that's true, along
> + * with checking some other corner cases that we'd rather not bother with
> + * handling in the fast path.  (Note that it's still possible for such a plan
> + * to be invalidated, for example due to a change in a function that was
> + * inlined into the plan.)
> + *
> + * This must only be called on known-valid generic plans (eg, ones just
> + * returned by GetCachedPlan).  If it returns true, the caller may re-use
> + * the cached plan as long as CachedPlanIsSimplyValid returns true; that
> + * check is much cheaper than the full revalidation done by GetCachedPlan.
> + * Nonetheless, no required checks are omitted.
> + */
> +bool
> +CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
> +                                    CachedPlan *plan)
> +{
> +    ListCell   *lc;

Would it make sense to instead compute this as we go when building a
valid CachedPlanSource? If we make it a property of a is_valid
CachedPlanSource, we can assert that the plan is safe for use in
CachedPlanIsSimplyValid().

And perhaps also optimize the normal checks in RevalidateCachedQuery()
for cases not going through the "simple" path. We could not use the
optimizations around refcounts for those, but it still seems like it
could be useful? And less separate infrastructure is good too.



> +/*
> + * CachedPlanIsSimplyValid: quick check for plan still being valid
> + *
> + * This function must not be used unless CachedPlanAllowsSimpleValidityCheck
> + * previously said it was OK.
> + *
> + * If the plan is valid, and "owner" is not NULL, record a refcount on
> + * the plan in that resowner before returning.  It is caller's responsibility
> + * to be sure that a refcount is held on any plan that's being actively used.
> + *
> + * The code here is unconditionally safe as long as the only use of this
> + * CachedPlanSource is in connection with the particular CachedPlan pointer
> + * that's passed in.  If the plansource were being used for other purposes,
> + * it's possible that its generic plan could be invalidated and regenerated
> + * while the current caller wasn't looking, and then there could be a chance
> + * collision of address between this caller's now-stale plan pointer and the
> + * actual address of the new generic plan.  For current uses, that scenario
> + * can't happen; but with a plansource shared across multiple uses, it'd be
> + * advisable to also save plan->generation and verify that that still matches.

That's mighty subtle :/


>      /*
> +     * Likewise for the simple-expression resource owner.  (Note: it'd be
> +     * safer to create this as a child of TopTransactionResourceOwner; but
> +     * right now that causes issues in transaction-spanning procedures, so
> +     * make it standalone.)
> +     */

Hm. I'm quite unfamiliar with this area of the code - so I'm likely just
missing something: Given that you're using a post xact cleanup hook to
release the resowner, I'm not quite sure I understand this comment. The
XACT_EVENT_ABORT/COMMIT callbacks are called before
TopTransactionResourceOwner is released, no?

>  void
>  plpgsql_xact_cb(XactEvent event, void *arg)
>  {
>      /*
>       * If we are doing a clean transaction shutdown, free the EState (so that
> -     * any remaining resources will be released correctly). In an abort, we
> +     * any remaining resources will be released correctly).  In an abort, we
>       * expect the regular abort recovery procedures to release everything of
> -     * interest.
> +     * interest.  The resowner has to be explicitly released in both cases,
> +     * though, since it's not a child of TopTransactionResourceOwner.
>       */
>      if (event == XACT_EVENT_COMMIT || event == XACT_EVENT_PREPARE)
>      {
> @@ -8288,11 +8413,17 @@ plpgsql_xact_cb(XactEvent event, void *arg)
>          if (shared_simple_eval_estate)
>              FreeExecutorState(shared_simple_eval_estate);
>          shared_simple_eval_estate = NULL;
> +        if (shared_simple_eval_resowner)
> +            plpgsql_free_simple_resowner(shared_simple_eval_resowner);
> +        shared_simple_eval_resowner = NULL;
>      }
>      else if (event == XACT_EVENT_ABORT)
>      {
>          simple_econtext_stack = NULL;
>          shared_simple_eval_estate = NULL;
> +        if (shared_simple_eval_resowner)
> +            plpgsql_free_simple_resowner(shared_simple_eval_resowner);
> +        shared_simple_eval_resowner = NULL;
>      }
>  }



> +void
> +plpgsql_free_simple_resowner(ResourceOwner simple_eval_resowner)
> +{
> +    /*
> +     * At this writing, the only thing that could actually get released is
> +     * plancache refcounts; but we may as well do the full release protocol.

Hm, any chance that the multiple resowner calls here could show up in a
profile? Probably not?


> +     * We pass isCommit = false even when committing, to suppress
> +     * resource-leakage gripes, since we aren't bothering to release the
> +     * refcounts one-by-one.
> +     */

That's a bit icky...



>   * OverrideSearchPathMatchesCurrent - does path match current setting?
> + *
> + * This is tested over and over in some common code paths, and in the typical
> + * scenario where the active search path seldom changes, it'll always succeed.
> + * We make that case fast by keeping a generation counter that is advanced
> + * whenever the active search path changes.
>   */

Could it be worth optimizing the path generation logic so that a
push/pop of an override path restores the old generation? That way we
could likely avoid the overhead even for cases where some functions
specify their own search path?


Greetings,

Andres Freund

Re: plan cache overhead on plpgsql expression

From

Tom Lane

Date:

25 March 2020, 21:51:50

Andres Freund <andres@anarazel.de> writes:
> I wonder if it'd make sense to store the locks needed for
> AcquirePlannerLocks/AcquireExecutorLocks in a better form.

Perhaps, but I'm not sure that either of those functions represent
material overhead in cases besides this one.

> Would it make sense to instead compute this as we go when building a
> valid CachedPlanSource? If we make it a property of a is_valid
> CachedPlanSource, we can assert that the plan is safe for use in
> CachedPlanIsSimplyValid().

I'm inclined to think not, because it'd just be overhead for other
users of cached plans.

> That's mighty subtle :/

Yeah :-(.  I don't like it that much, but I don't see an easy way to
do better, given the way that plpgsql manages its simple expressions.

>> /*
>> +     * Likewise for the simple-expression resource owner.  (Note: it'd be
>> +     * safer to create this as a child of TopTransactionResourceOwner; but
>> +     * right now that causes issues in transaction-spanning procedures, so
>> +     * make it standalone.)
>> +     */

> Hm. I'm quite unfamiliar with this area of the code - so I'm likely just
> missing something: Given that you're using a post xact cleanup hook to
> release the resowner, I'm not quite sure I understand this comment. The
> XACT_EVENT_ABORT/COMMIT callbacks are called before
> TopTransactionResourceOwner is released, no?

The comment is there because the regression tests fall over if you try
to do it the other way :-(.  The failure I saw was specific to a
transaction being done in a DO block, and maybe we could handle that
differently from the case for a normal procedure; but it seemed better
to me to make them the same.

There's a separate question lurking under there, which is whether the
existing management of the simple-expression EState is right at all
for transaction-spanning DO blocks; frankly it smells a bit fishy to
me.  But looking into that did not seem in-scope for this patch.

>> +void
>> +plpgsql_free_simple_resowner(ResourceOwner simple_eval_resowner)
>> +{
>> +    /*
>> +     * At this writing, the only thing that could actually get released is
>> +     * plancache refcounts; but we may as well do the full release protocol.

> Hm, any chance that the multiple resowner calls here could show up in a
> profile? Probably not?

Doubt it.  On the other hand, as the code stands it's certain that the
resowner contains nothing but plancache pins (while I was writing the
patch it wasn't entirely clear that that would hold).  So we could
drop the two unnecessary calls.  There are assertions in
ResourceOwnerDelete that would fire if we somehow missed releasing
anything, so it doesn't seem like much of a maintenance hazard.

>> +     * We pass isCommit = false even when committing, to suppress
>> +     * resource-leakage gripes, since we aren't bothering to release the
>> +     * refcounts one-by-one.
>> +     */

> That's a bit icky...

Agreed, and it's not like our practice elsewhere.  I thought about adding
a data structure that would track the set of held plancache pins outside
the resowner, but concluded that that'd just be pointless duplicative
overhead.

> Could it be worth optimizing the path generation logic so that a
> push/pop of an override path restores the old generation?

(1) Not given the existing set of uses of the push/pop capability, which
so far as I can see is *only* CREATE SCHEMA.  It's not involved in any
other manipulations of the search path.  And (2) as this is written, it's
totally unsafe for the generation counter ever to back up; that risks
false match detections later.

I appreciate the review!

            regards, tom lane

Re: plan cache overhead on plpgsql expression

From

Andres Freund

Date:

25 March 2020, 22:49:03

Hi,

On 2020-03-25 17:51:50 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > I wonder if it'd make sense to store the locks needed for
> > AcquirePlannerLocks/AcquireExecutorLocks in a better form.
> 
> Perhaps, but I'm not sure that either of those functions represent
> material overhead in cases besides this one.

For pgbench -M prepared -S GetCachedPlan() and its children are 2.36% of
the time. 1.75% of the total is RevalidateCachedQuery(). 1.13% of that
in turn is LockAcquireExtended.

That's not huge, but also not nothing. And this includes client
roundtrips. So I assume it'd show up larger when executing actual
queries in a function, or when pipelining (which e.g. pgjdbc has on by
default).

If I to simple lookups from pgbench_accounts in a loop in plpgsql
GetCachedPlan() is 4.43% and the LockAcquireExtended()'s called from
within are 1.46%.

So it's plausible that making this a more generic optimization would be
worthwhile.

> > Would it make sense to instead compute this as we go when building a
> > valid CachedPlanSource? If we make it a property of a is_valid
> > CachedPlanSource, we can assert that the plan is safe for use in
> > CachedPlanIsSimplyValid().
> 
> I'm inclined to think not, because it'd just be overhead for other
> users of cached plans.

Even if we make RevalidateCachedQuery take advantage of the simpler
tests when possible?  While there's plenty of cases where it'd not be
applicable, it seems likely that those wouldn't notice the small
slowdown either.

> >> /*
> >> +     * Likewise for the simple-expression resource owner.  (Note: it'd be
> >> +     * safer to create this as a child of TopTransactionResourceOwner; but
> >> +     * right now that causes issues in transaction-spanning procedures, so
> >> +     * make it standalone.)
> >> +     */
> 
> > Hm. I'm quite unfamiliar with this area of the code - so I'm likely just
> > missing something: Given that you're using a post xact cleanup hook to
> > release the resowner, I'm not quite sure I understand this comment. The
> > XACT_EVENT_ABORT/COMMIT callbacks are called before
> > TopTransactionResourceOwner is released, no?
> 
> The comment is there because the regression tests fall over if you try
> to do it the other way :-(.  The failure I saw was specific to a
> transaction being done in a DO block, and maybe we could handle that
> differently from the case for a normal procedure; but it seemed better
> to me to make them the same.

I'm still confused as to why it actually fixes the issue. Feel we should
at least understand what's going on before commtting.

> >> +void
> >> +plpgsql_free_simple_resowner(ResourceOwner simple_eval_resowner)
> >> +{
> >> +    /*
> >> +     * At this writing, the only thing that could actually get released is
> >> +     * plancache refcounts; but we may as well do the full release protocol.
> 
> > Hm, any chance that the multiple resowner calls here could show up in a
> > profile? Probably not?
> 
> Doubt it.  On the other hand, as the code stands it's certain that the
> resowner contains nothing but plancache pins (while I was writing the
> patch it wasn't entirely clear that that would hold).  So we could
> drop the two unnecessary calls.  There are assertions in
> ResourceOwnerDelete that would fire if we somehow missed releasing
> anything, so it doesn't seem like much of a maintenance hazard.

One could even argue that that's a nice crosscheck: Due to the later
release it'd not actually be correct to just add "arbitrary" things to
that resowner.

> > Could it be worth optimizing the path generation logic so that a
> > push/pop of an override path restores the old generation?
> 
> (1) Not given the existing set of uses of the push/pop capability, which
> so far as I can see is *only* CREATE SCHEMA.

Oh. Well, then that'd be something for later.

I do recall that there were issues with SET search_path in functions
causing noticable slowdowns...

> It's not involved in any other manipulations of the search path.  And
> (2) as this is written, it's totally unsafe for the generation counter
> ever to back up; that risks false match detections later.

I was just thinking of backing up the 'active generation' state. New
generations would have to come from a separate 'next generation'
counter.

Greetings,

Andres Freund

Re: plan cache overhead on plpgsql expression

From

Tom Lane

Date:

25 March 2020, 23:15:28

Andres Freund <andres@anarazel.de> writes:
> On 2020-03-25 17:51:50 -0400, Tom Lane wrote:
>> Perhaps, but I'm not sure that either of those functions represent
>> material overhead in cases besides this one.

> That's not huge, but also not nothing.

I see.  So maybe worth the trouble --- but still, seems like material for
a separate patch.

>>> Would it make sense to instead compute this as we go when building a
>>> valid CachedPlanSource?

>> I'm inclined to think not, because it'd just be overhead for other
>> users of cached plans.

> Even if we make RevalidateCachedQuery take advantage of the simpler
> tests when possible?

I'm not convinced that any real optimization is practical once you
allow tables in the query.  You then have to check the RLS-active
flags in some form, and the existing tests are not *that* expensive
as long as the answer is "no".  At best I think you might be reducing
two or three simple tests to one.

Also, the reason why this is interesting at all for plpgsql simple
expressions is that the cost of these checks, simple as they are,
is a noticeable fraction of the total time to do a simple expression.
That's not going to be the case for queries involving table access.

>> The comment is there because the regression tests fall over if you try
>> to do it the other way :-(.  The failure I saw was specific to a
>> transaction being done in a DO block, and maybe we could handle that
>> differently from the case for a normal procedure; but it seemed better
>> to me to make them the same.

> I'm still confused as to why it actually fixes the issue. Feel we should
> at least understand what's going on before commtting.

I do understand the issue.  If you make the simple-resowner a child
of TopTransactionResourceOwner, it vanishes at commit --- but
plpgsql_inline_handler has still got a pointer to it, which it'll try
to free afterwards, if the commit was inside the DO block.

What's not entirely clear to me is why this in exec_stmt_commit

@@ -4825,6 +4845,7 @@ exec_stmt_commit(PLpgSQL_execstate *estate, PLpgSQL_stmt_commit *stmt)
     }

     estate->simple_eval_estate = NULL;
+    estate->simple_eval_resowner = NULL;
     plpgsql_create_econtext(estate);

     return PLPGSQL_RC_OK;

is okay --- it avoids having a dangling pointer, sure, but if we're inside
a DO block won't plpgsql_create_econtext create a simple_eval_estate (and,
now, simple_eval_resowner) with the wrong properties?  But that's a
pre-existing question, and maybe Peter got it right and there's no
problem.

>> Doubt it.  On the other hand, as the code stands it's certain that the
>> resowner contains nothing but plancache pins (while I was writing the
>> patch it wasn't entirely clear that that would hold).  So we could
>> drop the two unnecessary calls.  There are assertions in
>> ResourceOwnerDelete that would fire if we somehow missed releasing
>> anything, so it doesn't seem like much of a maintenance hazard.

> One could even argue that that's a nice crosscheck: Due to the later
> release it'd not actually be correct to just add "arbitrary" things to
> that resowner.

OK, I'll change that.

>> (1) Not given the existing set of uses of the push/pop capability, which
>> so far as I can see is *only* CREATE SCHEMA.

> I do recall that there were issues with SET search_path in functions
> causing noticable slowdowns...

Yeah, possibly that could be improved, but that seems outside the scope of
this patch.

>> (2) as this is written, it's totally unsafe for the generation counter
>> ever to back up; that risks false match detections later.

> I was just thinking of backing up the 'active generation' state. New
> generations would have to come from a separate 'next generation'
> counter.

Oh, I see.  Yeah, that could work, but there's no point until we have
push/pop calls that are actually interesting for performance.

            regards, tom lane

Re: plan cache overhead on plpgsql expression

From

Andres Freund

Date:

25 March 2020, 23:41:43

Hi,

On 2020-03-25 19:15:28 -0400, Tom Lane wrote:
> >> The comment is there because the regression tests fall over if you try
> >> to do it the other way :-(.  The failure I saw was specific to a
> >> transaction being done in a DO block, and maybe we could handle that
> >> differently from the case for a normal procedure; but it seemed better
> >> to me to make them the same.
> 
> > I'm still confused as to why it actually fixes the issue. Feel we should
> > at least understand what's going on before commtting.
> 
> I do understand the issue.  If you make the simple-resowner a child
> of TopTransactionResourceOwner, it vanishes at commit --- but
> plpgsql_inline_handler has still got a pointer to it, which it'll try
> to free afterwards, if the commit was inside the DO block.

I was confused why it fixes that, because:

>  void
>  plpgsql_xact_cb(XactEvent event, void *arg)
>  {
>      /*
>       * If we are doing a clean transaction shutdown, free the EState (so that
> -     * any remaining resources will be released correctly). In an abort, we
> +     * any remaining resources will be released correctly).  In an abort, we
>       * expect the regular abort recovery procedures to release everything of
> -     * interest.
> +     * interest.  The resowner has to be explicitly released in both cases,
> +     * though, since it's not a child of TopTransactionResourceOwner.
>       */
>      if (event == XACT_EVENT_COMMIT || event == XACT_EVENT_PREPARE)
>      {
> @@ -8288,11 +8413,17 @@ plpgsql_xact_cb(XactEvent event, void *arg)
>          if (shared_simple_eval_estate)
>              FreeExecutorState(shared_simple_eval_estate);
>          shared_simple_eval_estate = NULL;
> +        if (shared_simple_eval_resowner)
> +            plpgsql_free_simple_resowner(shared_simple_eval_resowner);
> +        shared_simple_eval_resowner = NULL;
>      }
>      else if (event == XACT_EVENT_ABORT)
>      {
>          simple_econtext_stack = NULL;
>          shared_simple_eval_estate = NULL;
> +        if (shared_simple_eval_resowner)
> +            plpgsql_free_simple_resowner(shared_simple_eval_resowner);
> +        shared_simple_eval_resowner = NULL;
>      }
>  }

will lead to shared_simple_eval_resowner being deleted before
TopTransactionResourceOwner is deleted:

static void
CommitTransaction(void)
...
    CallXactCallbacks(is_parallel_worker ? XACT_EVENT_PARALLEL_COMMIT
                      : XACT_EVENT_COMMIT);

    ResourceOwnerRelease(TopTransactionResourceOwner,
                         RESOURCE_RELEASE_BEFORE_LOCKS,
                         true, true);

What I missed is that the inline handler will not use
shared_simple_eval_resowner, but instead use the function local
simple_eval_resowner. Which I had not realized before.


I'm still confused by the comment I was reacting to - the code
explicitly is about creating the *shared* resowner:

> +     * Likewise for the simple-expression resource owner.  (Note: it'd be
> +     * safer to create this as a child of TopTransactionResourceOwner; but
> +     * right now that causes issues in transaction-spanning procedures, so
> +     * make it standalone.)
> +     */
> +    if (estate->simple_eval_resowner == NULL)
> +    {
> +        if (shared_simple_eval_resowner == NULL)
> +            shared_simple_eval_resowner =
> +                ResourceOwnerCreate(NULL, "PL/pgSQL simple expressions");
> +        estate->simple_eval_resowner = shared_simple_eval_resowner;
> +    }

which, afaict, will always deleted before TopTransactionResourceOwner
goes away?


Greetings,

Andres Freund

Re: plan cache overhead on plpgsql expression

From

Tom Lane

Date:

25 March 2020, 23:50:08

Andres Freund <andres@anarazel.de> writes:
> I'm still confused by the comment I was reacting to - the code
> explicitly is about creating the *shared* resowner:

Right, this is because of the choice I mentioned earlier about creating
this resowner the same way as the one for the inline case.  I guess the
comments could go into more detail.  Or we could make the parentage
different for the two cases, but I don't like that much.

            regards, tom lane

Re: plan cache overhead on plpgsql expression

From

Amit Langote

Date:

26 March 2020, 10:56:32

On Thu, Mar 26, 2020 at 4:44 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Pavel Stehule <pavel.stehule@gmail.com> writes:
> > I'll mark this patch as ready for commiters.
>
> Thanks for reviewing!  Amit, do you have any thoughts on this?

Thanks for picking this up.  Test cases added by your patch really
shows why the plancache and the planner must not be skipped, something
I totally failed to grasp.

I can't really see any problem with your patch, but mainly due to my
unfamiliarity with some of the more complicated things it touches,
like resowner stuff.

One thing -- I don't get the division between
CachedPlanAllowsSimpleValidityCheck() and CachedPlanIsSimplyValid().
Maybe I am missing something, but could there not be just one
function, possibly using whether expr_simple_expr is set or not to
skip or do, resp., the checks that the former does?

--
Thank you,

Amit Langote
EnterpriseDB: http://www.enterprisedb.com

Re: plan cache overhead on plpgsql expression

From

Tom Lane

Date:

26 March 2020, 14:02:20

Andres Freund <andres@anarazel.de> writes:
> On 2020-03-25 17:51:50 -0400, Tom Lane wrote:
>> Andres Freund <andres@anarazel.de> writes:
>>> Hm, any chance that the multiple resowner calls here could show up in a
>>> profile? Probably not?

>> Doubt it.  On the other hand, as the code stands it's certain that the
>> resowner contains nothing but plancache pins (while I was writing the
>> patch it wasn't entirely clear that that would hold).  So we could
>> drop the two unnecessary calls.  There are assertions in
>> ResourceOwnerDelete that would fire if we somehow missed releasing
>> anything, so it doesn't seem like much of a maintenance hazard.

> One could even argue that that's a nice crosscheck: Due to the later
> release it'd not actually be correct to just add "arbitrary" things to
> that resowner.

I had a thought about a possibly-cleaner way to do this.  We could invent
a resowner function, say ResourceOwnerReleaseAllPlanCacheRefs, that
explicitly releases all plancache pins it knows about.  So plpgsql
would not call the regular ResourceOwnerRelease entry point at all,
but just call that and then ResourceOwnerDelete, again relying on the
assertions therein to catch anything not released.

This would be slightly more code but it'd perhaps make it clearer
what's going on, without the cost of a duplicative data structure.
Perhaps in future there'd be use for similar calls for other resource
types.

            regards, tom lane

Re: plan cache overhead on plpgsql expression

From

Tom Lane

Date:

26 March 2020, 17:39:52

Amit Langote <amitlangote09@gmail.com> writes:
> One thing -- I don't get the division between
> CachedPlanAllowsSimpleValidityCheck() and CachedPlanIsSimplyValid().
> Maybe I am missing something, but could there not be just one
> function, possibly using whether expr_simple_expr is set or not to
> skip or do, resp., the checks that the former does?

Well, we don't want to do the initial checks over again every time;
we want the is-valid test to be as simple and fast as we can make it.
I suppose we could have one function with a boolean flag saying "this is a
recheck", but I don't find that idea to be any better than the way it is.

Also, although the existing structure in plpgsql always calls
CachedPlanIsSimplyValid immediately after a successful call to
CachedPlanAllowsSimpleValidityCheck, I don't think that's necessarily
going to be true for other potential users of the functions.
So merging the functions would reduce flexibility.

            regards, tom lane

Re: plan cache overhead on plpgsql expression

From

Tom Lane

Date:

26 March 2020, 18:37:59

I wrote:
> I had a thought about a possibly-cleaner way to do this.  We could invent
> a resowner function, say ResourceOwnerReleaseAllPlanCacheRefs, that
> explicitly releases all plancache pins it knows about.  So plpgsql
> would not call the regular ResourceOwnerRelease entry point at all,
> but just call that and then ResourceOwnerDelete, again relying on the
> assertions therein to catch anything not released.

Here's a version that does it like that.  This does seem marginally
nicer than the other way.  I have a feeling that at some point we'll
want to expose resowners' contents more generally, but I'm not quite
sure what the use-cases will be, so I don't want to design that now.

Also, I studied the question of DO blocks' eval_estate + resowner
more carefully, and eventually concluded that the way it's being
done is okay --- it doesn't leak memory, as I'd first suspected.
But it's surely underdocumented, so I added some comments about it.
I also concluded as part of that study that it's probably best if
we *do* make the resowner parentage different in the two cases
after all.  So this has the "shared" resowner as a child of
TopTransactionResourceOwner after all (which means we don't need
to delete it explicitly), while a DO block's private resowner is
standalone (so it needs an explicit deletion).

Testing that reminded me of the other regression test failure I'd seen
when I first tried to do it: select_parallel.sql shows a WARNING about
a plancache leak in a parallel worker process.  When I looked into the
reason for that, it turned out that some cowboy has split
XACT_EVENT_COMMIT into XACT_EVENT_COMMIT and
XACT_EVENT_PARALLEL_COMMIT (where the latter is used in parallel
workers) without bothering to fix the collateral damage to plpgsql.
So plpgsql_xact_cb isn't doing any cleanup in parallel workers, and
hasn't been for a couple of releases.  The bad effects of that are
probably limited given that the worker process will exit after
committing, but I still think that that part of this patch is a bug
fix that needs to be back-patched.  (Just looking at what
FreeExecutorState does, I wonder whether jit_release_context has any
side-effects that are visible outside the process?  But I bet I can
make a test case that shows issues even without JIT, based on the
failure to call ExprContext shutdown callbacks.)

Anyway, I think this is committable now.

            regards, tom lane

diff --git a/src/pl/plpgsql/src/Makefile b/src/pl/plpgsql/src/Makefile
index 70a9c34..193df8a 100644
--- a/src/pl/plpgsql/src/Makefile
+++ b/src/pl/plpgsql/src/Makefile
@@ -33,8 +33,8 @@ DATA = plpgsql.control plpgsql--1.0.sql
 REGRESS_OPTS = --dbname=$(PL_TESTDB)

 REGRESS = plpgsql_call plpgsql_control plpgsql_copy plpgsql_domain \
-    plpgsql_record plpgsql_cache plpgsql_transaction plpgsql_trap \
-    plpgsql_trigger plpgsql_varprops
+    plpgsql_record plpgsql_cache plpgsql_simple plpgsql_transaction \
+    plpgsql_trap plpgsql_trigger plpgsql_varprops

 # where to find gen_keywordlist.pl and subsidiary files
 TOOLSDIR = $(top_srcdir)/src/tools
diff --git a/src/pl/plpgsql/src/expected/plpgsql_simple.out b/src/pl/plpgsql/src/expected/plpgsql_simple.out
new file mode 100644
index 0000000..5a2fefa
--- /dev/null
+++ b/src/pl/plpgsql/src/expected/plpgsql_simple.out
@@ -0,0 +1,68 @@
+--
+-- Tests for plpgsql's handling of "simple" expressions
+--
+-- Check that changes to an inline-able function are handled correctly
+create function simplesql(int) returns int language sql
+as 'select $1';
+create function simplecaller() returns int language plpgsql
+as $$
+declare
+  sum int := 0;
+begin
+  for n in 1..10 loop
+    sum := sum + simplesql(n);
+    if n = 5 then
+      create or replace function simplesql(int) returns int language sql
+      as 'select $1 + 100';
+    end if;
+  end loop;
+  return sum;
+end$$;
+select simplecaller();
+ simplecaller
+--------------
+          555
+(1 row)
+
+-- Check that changes in search path are dealt with correctly
+create schema simple1;
+create function simple1.simpletarget(int) returns int language plpgsql
+as $$begin return $1; end$$;
+create function simpletarget(int) returns int language plpgsql
+as $$begin return $1 + 100; end$$;
+create or replace function simplecaller() returns int language plpgsql
+as $$
+declare
+  sum int := 0;
+begin
+  for n in 1..10 loop
+    sum := sum + simpletarget(n);
+    if n = 5 then
+      set local search_path = 'simple1';
+    end if;
+  end loop;
+  return sum;
+end$$;
+select simplecaller();
+ simplecaller
+--------------
+          555
+(1 row)
+
+-- try it with non-volatile functions, too
+alter function simple1.simpletarget(int) immutable;
+alter function simpletarget(int) immutable;
+select simplecaller();
+ simplecaller
+--------------
+          555
+(1 row)
+
+-- make sure flushing local caches changes nothing
+\c -
+select simplecaller();
+ simplecaller
+--------------
+          555
+(1 row)
+
diff --git a/src/pl/plpgsql/src/sql/plpgsql_simple.sql b/src/pl/plpgsql/src/sql/plpgsql_simple.sql
new file mode 100644
index 0000000..8a95768
--- /dev/null
+++ b/src/pl/plpgsql/src/sql/plpgsql_simple.sql
@@ -0,0 +1,61 @@
+--
+-- Tests for plpgsql's handling of "simple" expressions
+--
+
+-- Check that changes to an inline-able function are handled correctly
+create function simplesql(int) returns int language sql
+as 'select $1';
+
+create function simplecaller() returns int language plpgsql
+as $$
+declare
+  sum int := 0;
+begin
+  for n in 1..10 loop
+    sum := sum + simplesql(n);
+    if n = 5 then
+      create or replace function simplesql(int) returns int language sql
+      as 'select $1 + 100';
+    end if;
+  end loop;
+  return sum;
+end$$;
+
+select simplecaller();
+
+
+-- Check that changes in search path are dealt with correctly
+create schema simple1;
+
+create function simple1.simpletarget(int) returns int language plpgsql
+as $$begin return $1; end$$;
+
+create function simpletarget(int) returns int language plpgsql
+as $$begin return $1 + 100; end$$;
+
+create or replace function simplecaller() returns int language plpgsql
+as $$
+declare
+  sum int := 0;
+begin
+  for n in 1..10 loop
+    sum := sum + simpletarget(n);
+    if n = 5 then
+      set local search_path = 'simple1';
+    end if;
+  end loop;
+  return sum;
+end$$;
+
+select simplecaller();
+
+-- try it with non-volatile functions, too
+alter function simple1.simpletarget(int) immutable;
+alter function simpletarget(int) immutable;
+
+select simplecaller();
+
+-- make sure flushing local caches changes nothing
+\c -
+
+select simplecaller();
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index dbae18d..8e27b03 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -1278,6 +1278,160 @@ ReleaseCachedPlan(CachedPlan *plan, bool useResOwner)
 }

 /*
+ * CachedPlanAllowsSimpleValidityCheck: can we use CachedPlanIsSimplyValid?
+ *
+ * This function, together with CachedPlanIsSimplyValid, provides a fast path
+ * for revalidating "simple" generic plans.  The core requirement to be simple
+ * is that the plan must not require taking any locks, which translates to
+ * not touching any tables; this happens to match up well with an important
+ * use-case in PL/pgSQL.  This function tests whether that's true, along
+ * with checking some other corner cases that we'd rather not bother with
+ * handling in the fast path.  (Note that it's still possible for such a plan
+ * to be invalidated, for example due to a change in a function that was
+ * inlined into the plan.)
+ *
+ * This must only be called on known-valid generic plans (eg, ones just
+ * returned by GetCachedPlan).  If it returns true, the caller may re-use
+ * the cached plan as long as CachedPlanIsSimplyValid returns true; that
+ * check is much cheaper than the full revalidation done by GetCachedPlan.
+ * Nonetheless, no required checks are omitted.
+ */
+bool
+CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
+                                    CachedPlan *plan)
+{
+    ListCell   *lc;
+
+    /* Sanity-check that the caller gave us a validated generic plan. */
+    Assert(plansource->magic == CACHEDPLANSOURCE_MAGIC);
+    Assert(plan->magic == CACHEDPLAN_MAGIC);
+    Assert(plansource->is_valid);
+    Assert(plan->is_valid);
+    Assert(plan == plansource->gplan);
+
+    /* We don't support oneshot plans here. */
+    if (plansource->is_oneshot)
+        return false;
+    Assert(!plan->is_oneshot);
+
+    /*
+     * If the plan is dependent on RLS considerations, or it's transient,
+     * reject.  These things probably can't ever happen for table-free
+     * queries, but for safety's sake let's check.
+     */
+    if (plansource->dependsOnRLS)
+        return false;
+    if (plan->dependsOnRole)
+        return false;
+    if (TransactionIdIsValid(plan->saved_xmin))
+        return false;
+
+    /*
+     * Reject if AcquirePlannerLocks would have anything to do.  This is
+     * simplistic, but there's no need to inquire any more carefully; indeed,
+     * for current callers it shouldn't even be possible to hit any of these
+     * checks.
+     */
+    foreach(lc, plansource->query_list)
+    {
+        Query       *query = lfirst_node(Query, lc);
+
+        if (query->commandType == CMD_UTILITY)
+            return false;
+        if (query->rtable || query->cteList || query->hasSubLinks)
+            return false;
+    }
+
+    /*
+     * Reject if AcquireExecutorLocks would have anything to do.  This is
+     * probably unnecessary given the previous check, but let's be safe.
+     */
+    foreach(lc, plan->stmt_list)
+    {
+        PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc);
+        ListCell   *lc2;
+
+        if (plannedstmt->commandType == CMD_UTILITY)
+            return false;
+
+        /*
+         * We have to grovel through the rtable because it's likely to contain
+         * an RTE_RESULT relation, rather than being totally empty.
+         */
+        foreach(lc2, plannedstmt->rtable)
+        {
+            RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
+
+            if (rte->rtekind == RTE_RELATION)
+                return false;
+        }
+    }
+
+    /*
+     * Okay, it's simple.  Note that what we've primarily established here is
+     * that no locks need be taken before checking the plan's is_valid flag.
+     */
+    return true;
+}
+
+/*
+ * CachedPlanIsSimplyValid: quick check for plan still being valid
+ *
+ * This function must not be used unless CachedPlanAllowsSimpleValidityCheck
+ * previously said it was OK.
+ *
+ * If the plan is valid, and "owner" is not NULL, record a refcount on
+ * the plan in that resowner before returning.  It is caller's responsibility
+ * to be sure that a refcount is held on any plan that's being actively used.
+ *
+ * The code here is unconditionally safe as long as the only use of this
+ * CachedPlanSource is in connection with the particular CachedPlan pointer
+ * that's passed in.  If the plansource were being used for other purposes,
+ * it's possible that its generic plan could be invalidated and regenerated
+ * while the current caller wasn't looking, and then there could be a chance
+ * collision of address between this caller's now-stale plan pointer and the
+ * actual address of the new generic plan.  For current uses, that scenario
+ * can't happen; but with a plansource shared across multiple uses, it'd be
+ * advisable to also save plan->generation and verify that that still matches.
+ */
+bool
+CachedPlanIsSimplyValid(CachedPlanSource *plansource, CachedPlan *plan,
+                        ResourceOwner owner)
+{
+    /*
+     * Careful here: since the caller doesn't necessarily hold a refcount on
+     * the plan to start with, it's possible that "plan" is a dangling
+     * pointer.  Don't dereference it until we've verified that it still
+     * matches the plansource's gplan (which is either valid or NULL).
+     */
+    Assert(plansource->magic == CACHEDPLANSOURCE_MAGIC);
+
+    /*
+     * Has cache invalidation fired on this plan?  We can check this right
+     * away since there are no locks that we'd need to acquire first.
+     */
+    if (!plansource->is_valid || plan != plansource->gplan || !plan->is_valid)
+        return false;
+
+    Assert(plan->magic == CACHEDPLAN_MAGIC);
+
+    /* Is the search_path still the same as when we made it? */
+    Assert(plansource->search_path != NULL);
+    if (!OverrideSearchPathMatchesCurrent(plansource->search_path))
+        return false;
+
+    /* It's still good.  Bump refcount if requested. */
+    if (owner)
+    {
+        ResourceOwnerEnlargePlanCacheRefs(owner);
+        plan->refcount++;
+        ResourceOwnerRememberPlanCacheRef(owner, plan);
+    }
+
+    return true;
+}
+
+/*
  * CachedPlanSetParentContext: move a CachedPlanSource to a new memory context
  *
  * This can only be applied to unsaved plans; once saved, a plan always
diff --git a/src/backend/utils/resowner/resowner.c b/src/backend/utils/resowner/resowner.c
index 3c39e48..8bc2c4e 100644
--- a/src/backend/utils/resowner/resowner.c
+++ b/src/backend/utils/resowner/resowner.c
@@ -679,6 +679,30 @@ ResourceOwnerReleaseInternal(ResourceOwner owner,
 }

 /*
+ * ResourceOwnerReleaseAllPlanCacheRefs
+ *        Release the plancache references (only) held by this owner.
+ *
+ * We might eventually add similar functions for other resource types,
+ * but for now, only this is needed.
+ */
+void
+ResourceOwnerReleaseAllPlanCacheRefs(ResourceOwner owner)
+{
+    ResourceOwner save;
+    Datum        foundres;
+
+    save = CurrentResourceOwner;
+    CurrentResourceOwner = owner;
+    while (ResourceArrayGetAny(&(owner->planrefarr), &foundres))
+    {
+        CachedPlan *res = (CachedPlan *) DatumGetPointer(foundres);
+
+        ReleaseCachedPlan(res, true);
+    }
+    CurrentResourceOwner = save;
+}
+
+/*
  * ResourceOwnerDelete
  *        Delete an owner object and its descendants.
  *
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 6a5953c..037edd0 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -20,6 +20,8 @@
 #include "nodes/params.h"
 #include "tcop/cmdtag.h"
 #include "utils/queryenvironment.h"
+#include "utils/resowner.h"
+

 /* Forward declaration, to avoid including parsenodes.h here */
 struct RawStmt;
@@ -220,6 +222,12 @@ extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
                                  QueryEnvironment *queryEnv);
 extern void ReleaseCachedPlan(CachedPlan *plan, bool useResOwner);

+extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
+                                                CachedPlan *plan);
+extern bool CachedPlanIsSimplyValid(CachedPlanSource *plansource,
+                                    CachedPlan *plan,
+                                    ResourceOwner owner);
+
 extern CachedExpression *GetCachedExpression(Node *expr);
 extern void FreeCachedExpression(CachedExpression *cexpr);

diff --git a/src/include/utils/resowner.h b/src/include/utils/resowner.h
index 850bf69..878f39c 100644
--- a/src/include/utils/resowner.h
+++ b/src/include/utils/resowner.h
@@ -71,6 +71,7 @@ extern void ResourceOwnerRelease(ResourceOwner owner,
                                  ResourceReleasePhase phase,
                                  bool isCommit,
                                  bool isTopLevel);
+extern void ResourceOwnerReleaseAllPlanCacheRefs(ResourceOwner owner);
 extern void ResourceOwnerDelete(ResourceOwner owner);
 extern ResourceOwner ResourceOwnerGetParent(ResourceOwner owner);
 extern void ResourceOwnerNewParent(ResourceOwner owner,
diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index d3ad4fa..6f8ae2c 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -84,6 +84,13 @@ typedef struct
  * has its own simple-expression EState, which is cleaned up at exit from
  * plpgsql_inline_handler().  DO blocks still use the simple_econtext_stack,
  * though, so that subxact abort cleanup does the right thing.
+ *
+ * (However, if a DO block executes COMMIT or ROLLBACK, then exec_stmt_commit
+ * or exec_stmt_rollback will unlink it from the DO's simple-expression EState
+ * and create a new shared EState that will be used thenceforth.  The original
+ * EState will be cleaned up when we get back to plpgsql_inline_handler.  This
+ * is a bit ugly, but it isn't worth doing better, since scenarios like this
+ * can't result in indefinite accumulation of state trees.)
  */
 typedef struct SimpleEcontextStackEntry
 {
@@ -96,6 +103,15 @@ static EState *shared_simple_eval_estate = NULL;
 static SimpleEcontextStackEntry *simple_econtext_stack = NULL;

 /*
+ * In addition to the shared simple-eval EState, we have a shared resource
+ * owner that holds refcounts on the CachedPlans for any "simple" expressions
+ * we have evaluated in the current transaction.  This allows us to avoid
+ * continually grabbing and releasing a plan refcount when a simple expression
+ * is used over and over.
+ */
+static ResourceOwner shared_simple_eval_resowner = NULL;
+
+/*
  * Memory management within a plpgsql function generally works with three
  * contexts:
  *
@@ -314,7 +330,8 @@ static int    exec_stmt_set(PLpgSQL_execstate *estate,
 static void plpgsql_estate_setup(PLpgSQL_execstate *estate,
                                  PLpgSQL_function *func,
                                  ReturnSetInfo *rsi,
-                                 EState *simple_eval_estate);
+                                 EState *simple_eval_estate,
+                                 ResourceOwner simple_eval_resowner);
 static void exec_eval_cleanup(PLpgSQL_execstate *estate);

 static void exec_prepare_plan(PLpgSQL_execstate *estate,
@@ -440,16 +457,19 @@ static char *format_preparedparamsdata(PLpgSQL_execstate *estate,
  *
  * This is also used to execute inline code blocks (DO blocks).  The only
  * difference that this code is aware of is that for a DO block, we want
- * to use a private simple_eval_estate, which is created and passed in by
- * the caller.  For regular functions, pass NULL, which implies using
- * shared_simple_eval_estate.  (When using a private simple_eval_estate,
+ * to use a private simple_eval_estate and a private simple_eval_resowner,
+ * which are created and passed in by the caller.  For regular functions,
+ * pass NULL, which implies using shared_simple_eval_estate and
+ * shared_simple_eval_resowner.  (When using a private simple_eval_estate,
  * we must also use a private cast hashtable, but that's taken care of
  * within plpgsql_estate_setup.)
  * ----------
  */
 Datum
 plpgsql_exec_function(PLpgSQL_function *func, FunctionCallInfo fcinfo,
-                      EState *simple_eval_estate, bool atomic)
+                      EState *simple_eval_estate,
+                      ResourceOwner simple_eval_resowner,
+                      bool atomic)
 {
     PLpgSQL_execstate estate;
     ErrorContextCallback plerrcontext;
@@ -460,7 +480,7 @@ plpgsql_exec_function(PLpgSQL_function *func, FunctionCallInfo fcinfo,
      * Setup the execution state
      */
     plpgsql_estate_setup(&estate, func, (ReturnSetInfo *) fcinfo->resultinfo,
-                         simple_eval_estate);
+                         simple_eval_estate, simple_eval_resowner);
     estate.atomic = atomic;

     /*
@@ -897,7 +917,7 @@ plpgsql_exec_trigger(PLpgSQL_function *func,
     /*
      * Setup the execution state
      */
-    plpgsql_estate_setup(&estate, func, NULL, NULL);
+    plpgsql_estate_setup(&estate, func, NULL, NULL, NULL);
     estate.trigdata = trigdata;

     /*
@@ -1135,7 +1155,7 @@ plpgsql_exec_event_trigger(PLpgSQL_function *func, EventTriggerData *trigdata)
     /*
      * Setup the execution state
      */
-    plpgsql_estate_setup(&estate, func, NULL, NULL);
+    plpgsql_estate_setup(&estate, func, NULL, NULL, NULL);
     estate.evtrigdata = trigdata;

     /*
@@ -2319,6 +2339,7 @@ exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
          * some internal state.
          */
         estate->simple_eval_estate = NULL;
+        estate->simple_eval_resowner = NULL;
         plpgsql_create_econtext(estate);
     }

@@ -3874,7 +3895,8 @@ static void
 plpgsql_estate_setup(PLpgSQL_execstate *estate,
                      PLpgSQL_function *func,
                      ReturnSetInfo *rsi,
-                     EState *simple_eval_estate)
+                     EState *simple_eval_estate,
+                     ResourceOwner simple_eval_resowner)
 {
     HASHCTL        ctl;

@@ -3965,6 +3987,11 @@ plpgsql_estate_setup(PLpgSQL_execstate *estate,
         estate->cast_hash = shared_cast_hash;
         estate->cast_hash_context = shared_cast_context;
     }
+    /* likewise for the simple-expression resource owner */
+    if (simple_eval_resowner)
+        estate->simple_eval_resowner = simple_eval_resowner;
+    else
+        estate->simple_eval_resowner = shared_simple_eval_resowner;

     /*
      * We start with no stmt_mcontext; one will be created only if needed.
@@ -4825,6 +4852,7 @@ exec_stmt_commit(PLpgSQL_execstate *estate, PLpgSQL_stmt_commit *stmt)
     }

     estate->simple_eval_estate = NULL;
+    estate->simple_eval_resowner = NULL;
     plpgsql_create_econtext(estate);

     return PLPGSQL_RC_OK;
@@ -4847,6 +4875,7 @@ exec_stmt_rollback(PLpgSQL_execstate *estate, PLpgSQL_stmt_rollback *stmt)
     }

     estate->simple_eval_estate = NULL;
+    estate->simple_eval_resowner = NULL;
     plpgsql_create_econtext(estate);

     return PLPGSQL_RC_OK;
@@ -6059,8 +6088,6 @@ loop_exit:
  * someone might redefine a SQL function that had been inlined into the simple
  * expression.  That cannot cause a simple expression to become non-simple (or
  * vice versa), but we do have to handle replacing the expression tree.
- * Fortunately it's normally inexpensive to call SPI_plan_get_cached_plan for
- * a simple expression.
  *
  * Note: if pass-by-reference, the result is in the eval_mcontext.
  * It will be freed when exec_eval_cleanup is done.
@@ -6076,7 +6103,7 @@ exec_eval_simple_expr(PLpgSQL_execstate *estate,
 {
     ExprContext *econtext = estate->eval_econtext;
     LocalTransactionId curlxid = MyProc->lxid;
-    CachedPlan *cplan;
+    ParamListInfo paramLI;
     void       *save_setup_arg;
     bool        need_snapshot;
     MemoryContext oldcontext;
@@ -6090,29 +6117,92 @@ exec_eval_simple_expr(PLpgSQL_execstate *estate,
     /*
      * If expression is in use in current xact, don't touch it.
      */
-    if (expr->expr_simple_in_use && expr->expr_simple_lxid == curlxid)
+    if (unlikely(expr->expr_simple_in_use) &&
+        expr->expr_simple_lxid == curlxid)
         return false;

     /*
-     * Revalidate cached plan, so that we will notice if it became stale. (We
-     * need to hold a refcount while using the plan, anyway.)  If replanning
-     * is needed, do that work in the eval_mcontext.
+     * Check to see if the cached plan has been invalidated.  If not, and this
+     * is the first use in the current transaction, save a plan refcount in
+     * the simple-expression resowner.
      */
-    oldcontext = MemoryContextSwitchTo(get_eval_mcontext(estate));
-    cplan = SPI_plan_get_cached_plan(expr->plan);
-    MemoryContextSwitchTo(oldcontext);
+    if (likely(CachedPlanIsSimplyValid(expr->expr_simple_plansource,
+                                       expr->expr_simple_plan,
+                                       (expr->expr_simple_plan_lxid != curlxid ?
+                                        estate->simple_eval_resowner : NULL))))
+    {
+        /*
+         * It's still good, so just remember that we have a refcount on the
+         * plan in the current transaction.  (If we already had one, this
+         * assignment is a no-op.)
+         */
+        expr->expr_simple_plan_lxid = curlxid;
+    }
+    else
+    {
+        /* Need to replan */
+        CachedPlan *cplan;

-    /*
-     * We can't get a failure here, because the number of CachedPlanSources in
-     * the SPI plan can't change from what exec_simple_check_plan saw; it's a
-     * property of the raw parsetree generated from the query text.
-     */
-    Assert(cplan != NULL);
+        /*
+         * If we have a valid refcount on some previous version of the plan,
+         * release it, so we don't leak plans intra-transaction.
+         */
+        if (expr->expr_simple_plan_lxid == curlxid)
+        {
+            ResourceOwner saveResourceOwner = CurrentResourceOwner;

-    /* If it got replanned, update our copy of the simple expression */
-    if (cplan->generation != expr->expr_simple_generation)
-    {
+            CurrentResourceOwner = estate->simple_eval_resowner;
+            ReleaseCachedPlan(expr->expr_simple_plan, true);
+            CurrentResourceOwner = saveResourceOwner;
+            expr->expr_simple_plan = NULL;
+            expr->expr_simple_plan_lxid = InvalidLocalTransactionId;
+        }
+
+        /* Do the replanning work in the eval_mcontext */
+        oldcontext = MemoryContextSwitchTo(get_eval_mcontext(estate));
+        cplan = SPI_plan_get_cached_plan(expr->plan);
+        MemoryContextSwitchTo(oldcontext);
+
+        /*
+         * We can't get a failure here, because the number of
+         * CachedPlanSources in the SPI plan can't change from what
+         * exec_simple_check_plan saw; it's a property of the raw parsetree
+         * generated from the query text.
+         */
+        Assert(cplan != NULL);
+
+        /*
+         * These tests probably can't fail either, but if they do, cope by
+         * declaring the plan to be non-simple.  On success, we'll acquire a
+         * refcount on the new plan, stored in simple_eval_resowner.
+         */
+        if (CachedPlanAllowsSimpleValidityCheck(expr->expr_simple_plansource,
+                                                cplan) &&
+            CachedPlanIsSimplyValid(expr->expr_simple_plansource, cplan,
+                                    estate->simple_eval_resowner))
+        {
+            /* Remember that we have the refcount */
+            expr->expr_simple_plan = cplan;
+            expr->expr_simple_plan_lxid = curlxid;
+        }
+        else
+        {
+            /* Release SPI_plan_get_cached_plan's refcount */
+            ReleaseCachedPlan(cplan, true);
+            /* Mark expression as non-simple, and fail */
+            expr->expr_simple_expr = NULL;
+            return false;
+        }
+
+        /*
+         * SPI_plan_get_cached_plan acquired a plan refcount stored in the
+         * active resowner.  We don't need that anymore, so release it.
+         */
+        ReleaseCachedPlan(cplan, true);
+
+        /* Extract desired scalar expression from cached plan */
         exec_save_simple_expr(expr, cplan);
+
         /* better recheck r/w safety, as it could change due to inlining */
         if (expr->rwparam >= 0)
             exec_check_rw_parameter(expr, expr->rwparam);
@@ -6128,16 +6218,24 @@ exec_eval_simple_expr(PLpgSQL_execstate *estate,
      * Set up ParamListInfo to pass to executor.  For safety, save and restore
      * estate->paramLI->parserSetupArg around our use of the param list.
      */
-    save_setup_arg = estate->paramLI->parserSetupArg;
+    paramLI = estate->paramLI;
+    save_setup_arg = paramLI->parserSetupArg;

-    econtext->ecxt_param_list_info = setup_param_list(estate, expr);
+    /*
+     * We can skip using setup_param_list() in favor of just doing this
+     * unconditionally, because there's no need for the optimization of
+     * possibly setting ecxt_param_list_info to NULL; we've already forced use
+     * of a generic plan.
+     */
+    paramLI->parserSetupArg = (void *) expr;
+    econtext->ecxt_param_list_info = paramLI;

     /*
      * Prepare the expression for execution, if it's not been done already in
      * the current transaction.  (This will be forced to happen if we called
      * exec_save_simple_expr above.)
      */
-    if (expr->expr_simple_lxid != curlxid)
+    if (unlikely(expr->expr_simple_lxid != curlxid))
     {
         oldcontext = MemoryContextSwitchTo(estate->simple_eval_estate->es_query_cxt);
         expr->expr_simple_state =
@@ -6185,7 +6283,7 @@ exec_eval_simple_expr(PLpgSQL_execstate *estate,

     econtext->ecxt_param_list_info = NULL;

-    estate->paramLI->parserSetupArg = save_setup_arg;
+    paramLI->parserSetupArg = save_setup_arg;

     if (need_snapshot)
         PopActiveSnapshot();
@@ -6193,11 +6291,6 @@ exec_eval_simple_expr(PLpgSQL_execstate *estate,
     MemoryContextSwitchTo(oldcontext);

     /*
-     * Now we can release our refcount on the cached plan.
-     */
-    ReleaseCachedPlan(cplan, true);
-
-    /*
      * That's it.
      */
     return true;
@@ -7984,10 +8077,35 @@ exec_simple_check_plan(PLpgSQL_execstate *estate, PLpgSQL_expr *expr)
     /* Can't fail, because we checked for a single CachedPlanSource above */
     Assert(cplan != NULL);

-    /* Share the remaining work with replan code path */
-    exec_save_simple_expr(expr, cplan);
+    /*
+     * Verify that plancache.c thinks the plan is simple enough to use
+     * CachedPlanIsSimplyValid.  Given the restrictions above, it's unlikely
+     * that this could fail, but if it does, just treat plan as not simple.
+     */
+    if (CachedPlanAllowsSimpleValidityCheck(plansource, cplan))
+    {
+        /*
+         * OK, use CachedPlanIsSimplyValid to save a refcount on the plan in
+         * the simple-expression resowner.  This shouldn't fail either, but if
+         * somehow it does, again we can cope by treating plan as not simple.
+         */
+        if (CachedPlanIsSimplyValid(plansource, cplan,
+                                    estate->simple_eval_resowner))
+        {
+            /* Remember that we have the refcount */
+            expr->expr_simple_plansource = plansource;
+            expr->expr_simple_plan = cplan;
+            expr->expr_simple_plan_lxid = MyProc->lxid;
+
+            /* Share the remaining work with the replan code path */
+            exec_save_simple_expr(expr, cplan);
+        }
+    }

-    /* Release our plan refcount */
+    /*
+     * Release the plan refcount obtained by SPI_plan_get_cached_plan.  (This
+     * refcount is held by the wrong resowner, so we can't just repurpose it.)
+     */
     ReleaseCachedPlan(cplan, true);
 }

@@ -8060,7 +8178,6 @@ exec_save_simple_expr(PLpgSQL_expr *expr, CachedPlan *cplan)
      * current transaction".
      */
     expr->expr_simple_expr = tle_expr;
-    expr->expr_simple_generation = cplan->generation;
     expr->expr_simple_state = NULL;
     expr->expr_simple_in_use = false;
     expr->expr_simple_lxid = InvalidLocalTransactionId;
@@ -8196,7 +8313,7 @@ exec_set_found(PLpgSQL_execstate *estate, bool state)
  *
  * We may need to create a new shared_simple_eval_estate too, if there's not
  * one already for the current transaction.  The EState will be cleaned up at
- * transaction end.
+ * transaction end.  Ditto for shared_simple_eval_resowner.
  */
 static void
 plpgsql_create_econtext(PLpgSQL_execstate *estate)
@@ -8208,8 +8325,11 @@ plpgsql_create_econtext(PLpgSQL_execstate *estate)
      * one already in the current transaction.  The EState is made a child of
      * TopTransactionContext so it will have the right lifespan.
      *
-     * Note that this path is never taken when executing a DO block; the
-     * required EState was already made by plpgsql_inline_handler.
+     * Note that this path is never taken when beginning a DO block; the
+     * required EState was already made by plpgsql_inline_handler.  However,
+     * if the DO block executes COMMIT or ROLLBACK, then we'll come here and
+     * make a shared EState to use for the rest of the DO block.  That's OK;
+     * see the comments above struct SimpleEcontextStackEntry.
      */
     if (estate->simple_eval_estate == NULL)
     {
@@ -8225,6 +8345,18 @@ plpgsql_create_econtext(PLpgSQL_execstate *estate)
     }

     /*
+     * Likewise for the simple-expression resource owner.
+     */
+    if (estate->simple_eval_resowner == NULL)
+    {
+        if (shared_simple_eval_resowner == NULL)
+            shared_simple_eval_resowner =
+                ResourceOwnerCreate(TopTransactionResourceOwner,
+                                    "PL/pgSQL simple expressions");
+        estate->simple_eval_resowner = shared_simple_eval_resowner;
+    }
+
+    /*
      * Create a child econtext for the current function.
      */
     estate->eval_econtext = CreateExprContext(estate->simple_eval_estate);
@@ -8270,29 +8402,40 @@ plpgsql_destroy_econtext(PLpgSQL_execstate *estate)
  * plpgsql_xact_cb --- post-transaction-commit-or-abort cleanup
  *
  * If a simple-expression EState was created in the current transaction,
- * it has to be cleaned up.
+ * it has to be cleaned up.  The same for the simple-expression resowner.
  */
 void
 plpgsql_xact_cb(XactEvent event, void *arg)
 {
     /*
-     * If we are doing a clean transaction shutdown, free the EState (so that
-     * any remaining resources will be released correctly). In an abort, we
-     * expect the regular abort recovery procedures to release everything of
-     * interest.
+     * If we are doing a clean transaction shutdown, free the EState and tell
+     * the resowner to release whatever plancache references it has, so that
+     * all remaining resources will be released correctly.  (We don't need to
+     * actually delete the resowner here; deletion of the
+     * TopTransactionResourceOwner will take care of that.)
+     *
+     * In an abort, we expect the regular abort recovery procedures to release
+     * everything of interest, so just clear our pointers.
      */
-    if (event == XACT_EVENT_COMMIT || event == XACT_EVENT_PREPARE)
+    if (event == XACT_EVENT_COMMIT ||
+        event == XACT_EVENT_PARALLEL_COMMIT ||
+        event == XACT_EVENT_PREPARE)
     {
         simple_econtext_stack = NULL;

         if (shared_simple_eval_estate)
             FreeExecutorState(shared_simple_eval_estate);
         shared_simple_eval_estate = NULL;
+        if (shared_simple_eval_resowner)
+            ResourceOwnerReleaseAllPlanCacheRefs(shared_simple_eval_resowner);
+        shared_simple_eval_resowner = NULL;
     }
-    else if (event == XACT_EVENT_ABORT)
+    else if (event == XACT_EVENT_ABORT ||
+             event == XACT_EVENT_PARALLEL_ABORT)
     {
         simple_econtext_stack = NULL;
         shared_simple_eval_estate = NULL;
+        shared_simple_eval_resowner = NULL;
     }
 }

diff --git a/src/pl/plpgsql/src/pl_handler.c b/src/pl/plpgsql/src/pl_handler.c
index b434818..97cf849 100644
--- a/src/pl/plpgsql/src/pl_handler.c
+++ b/src/pl/plpgsql/src/pl_handler.c
@@ -262,7 +262,9 @@ plpgsql_call_handler(PG_FUNCTION_ARGS)
             retval = (Datum) 0;
         }
         else
-            retval = plpgsql_exec_function(func, fcinfo, NULL, !nonatomic);
+            retval = plpgsql_exec_function(func, fcinfo,
+                                           NULL, NULL,
+                                           !nonatomic);
     }
     PG_FINALLY();
     {
@@ -297,6 +299,7 @@ plpgsql_inline_handler(PG_FUNCTION_ARGS)
     PLpgSQL_function *func;
     FmgrInfo    flinfo;
     EState       *simple_eval_estate;
+    ResourceOwner simple_eval_resowner;
     Datum        retval;
     int            rc;

@@ -323,22 +326,33 @@ plpgsql_inline_handler(PG_FUNCTION_ARGS)
     flinfo.fn_oid = InvalidOid;
     flinfo.fn_mcxt = CurrentMemoryContext;

-    /* Create a private EState for simple-expression execution */
+    /*
+     * Create a private EState and resowner for simple-expression execution.
+     * Notice that these are NOT tied to transaction-level resources; they
+     * must survive any COMMIT/ROLLBACK the DO block executes, since we will
+     * unconditionally try to clean them up below.  (Hence, be wary of adding
+     * anything that could fail between here and the PG_TRY block.)
+     */
     simple_eval_estate = CreateExecutorState();
+    simple_eval_resowner =
+        ResourceOwnerCreate(NULL, "PL/pgSQL DO block simple expressions");

     /* And run the function */
     PG_TRY();
     {
-        retval = plpgsql_exec_function(func, fake_fcinfo, simple_eval_estate, codeblock->atomic);
+        retval = plpgsql_exec_function(func, fake_fcinfo,
+                                       simple_eval_estate,
+                                       simple_eval_resowner,
+                                       codeblock->atomic);
     }
     PG_CATCH();
     {
         /*
          * We need to clean up what would otherwise be long-lived resources
          * accumulated by the failed DO block, principally cached plans for
-         * statements (which can be flushed with plpgsql_free_function_memory)
-         * and execution trees for simple expressions, which are in the
-         * private EState.
+         * statements (which can be flushed by plpgsql_free_function_memory),
+         * execution trees for simple expressions, which are in the private
+         * EState, and cached-plan refcounts held by the private resowner.
          *
          * Before releasing the private EState, we must clean up any
          * simple_econtext_stack entries pointing into it, which we can do by
@@ -351,8 +365,10 @@ plpgsql_inline_handler(PG_FUNCTION_ARGS)
                            GetCurrentSubTransactionId(),
                            0, NULL);

-        /* Clean up the private EState */
+        /* Clean up the private EState and resowner */
         FreeExecutorState(simple_eval_estate);
+        ResourceOwnerReleaseAllPlanCacheRefs(simple_eval_resowner);
+        ResourceOwnerDelete(simple_eval_resowner);

         /* Function should now have no remaining use-counts ... */
         func->use_count--;
@@ -366,8 +382,10 @@ plpgsql_inline_handler(PG_FUNCTION_ARGS)
     }
     PG_END_TRY();

-    /* Clean up the private EState */
+    /* Clean up the private EState and resowner */
     FreeExecutorState(simple_eval_estate);
+    ResourceOwnerReleaseAllPlanCacheRefs(simple_eval_resowner);
+    ResourceOwnerDelete(simple_eval_resowner);

     /* Function should now have no remaining use-counts ... */
     func->use_count--;
diff --git a/src/pl/plpgsql/src/plpgsql.h b/src/pl/plpgsql/src/plpgsql.h
index 69df330..1af2595 100644
--- a/src/pl/plpgsql/src/plpgsql.h
+++ b/src/pl/plpgsql/src/plpgsql.h
@@ -231,12 +231,21 @@ typedef struct PLpgSQL_expr

     /* fields for "simple expression" fast-path execution: */
     Expr       *expr_simple_expr;    /* NULL means not a simple expr */
-    int            expr_simple_generation; /* plancache generation we checked */
     Oid            expr_simple_type;    /* result type Oid, if simple */
     int32        expr_simple_typmod; /* result typmod, if simple */
     bool        expr_simple_mutable;    /* true if simple expr is mutable */

     /*
+     * If the expression was ever determined to be simple, we remember its
+     * CachedPlanSource and CachedPlan here.  If expr_simple_plan_lxid matches
+     * current LXID, then we hold a refcount on expr_simple_plan in the
+     * current transaction.  Otherwise we need to get one before re-using it.
+     */
+    CachedPlanSource *expr_simple_plansource;    /* extracted from "plan" */
+    CachedPlan *expr_simple_plan;    /* extracted from "plan" */
+    LocalTransactionId expr_simple_plan_lxid;
+
+    /*
      * if expr is simple AND prepared in current transaction,
      * expr_simple_state and expr_simple_in_use are valid. Test validity by
      * seeing if expr_simple_lxid matches current LXID.  (If not,
@@ -1082,8 +1091,9 @@ typedef struct PLpgSQL_execstate
      */
     ParamListInfo paramLI;

-    /* EState to use for "simple" expression evaluation */
+    /* EState and resowner to use for "simple" expression evaluation */
     EState       *simple_eval_estate;
+    ResourceOwner simple_eval_resowner;

     /* lookup table to use for executing type casts */
     HTAB       *cast_hash;
@@ -1268,6 +1278,7 @@ extern void _PG_init(void);
 extern Datum plpgsql_exec_function(PLpgSQL_function *func,
                                    FunctionCallInfo fcinfo,
                                    EState *simple_eval_estate,
+                                   ResourceOwner simple_eval_resowner,
                                    bool atomic);
 extern HeapTuple plpgsql_exec_trigger(PLpgSQL_function *func,
                                       TriggerData *trigdata);
diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index 5ff7824..2ec2301 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -126,6 +126,11 @@
  * namespaceUser is the userid the path has been computed for.
  *
  * Note: all data pointed to by these List variables is in TopMemoryContext.
+ *
+ * activePathGeneration is incremented whenever the effective values of
+ * activeSearchPath/activeCreationNamespace/activeTempCreationPending change.
+ * This can be used to quickly detect whether any change has happened since
+ * a previous examination of the search path state.
  */

 /* These variables define the actually active state: */
@@ -138,6 +143,9 @@ static Oid    activeCreationNamespace = InvalidOid;
 /* if true, activeCreationNamespace is wrong, it should be temp namespace */
 static bool activeTempCreationPending = false;

+/* current generation counter; make sure this is never zero */
+static uint64 activePathGeneration = 1;
+
 /* These variables are the values last derived from namespace_search_path: */

 static List *baseSearchPath = NIL;
@@ -3373,6 +3381,7 @@ GetOverrideSearchPath(MemoryContext context)
         schemas = list_delete_first(schemas);
     }
     result->schemas = schemas;
+    result->generation = activePathGeneration;

     MemoryContextSwitchTo(oldcxt);

@@ -3393,12 +3402,18 @@ CopyOverrideSearchPath(OverrideSearchPath *path)
     result->schemas = list_copy(path->schemas);
     result->addCatalog = path->addCatalog;
     result->addTemp = path->addTemp;
+    result->generation = path->generation;

     return result;
 }

 /*
  * OverrideSearchPathMatchesCurrent - does path match current setting?
+ *
+ * This is tested over and over in some common code paths, and in the typical
+ * scenario where the active search path seldom changes, it'll always succeed.
+ * We make that case fast by keeping a generation counter that is advanced
+ * whenever the active search path changes.
  */
 bool
 OverrideSearchPathMatchesCurrent(OverrideSearchPath *path)
@@ -3408,6 +3423,10 @@ OverrideSearchPathMatchesCurrent(OverrideSearchPath *path)

     recomputeNamespacePath();

+    /* Quick out if already known equal to active path. */
+    if (path->generation == activePathGeneration)
+        return true;
+
     /* We scan down the activeSearchPath to see if it matches the input. */
     lc = list_head(activeSearchPath);

@@ -3440,6 +3459,13 @@ OverrideSearchPathMatchesCurrent(OverrideSearchPath *path)
     }
     if (lc)
         return false;
+
+    /*
+     * Update path->generation so that future tests will return quickly, so
+     * long as the active search path doesn't change.
+     */
+    path->generation = activePathGeneration;
+
     return true;
 }

@@ -3510,6 +3536,14 @@ PushOverrideSearchPath(OverrideSearchPath *newpath)
     activeCreationNamespace = entry->creationNamespace;
     activeTempCreationPending = false;    /* XXX is this OK? */

+    /*
+     * We always increment activePathGeneration when pushing/popping an
+     * override path.  In current usage, these actions always change the
+     * effective path state, so there's no value in checking to see if it
+     * didn't change.
+     */
+    activePathGeneration++;
+
     MemoryContextSwitchTo(oldcxt);
 }

@@ -3551,6 +3585,9 @@ PopOverrideSearchPath(void)
         activeCreationNamespace = baseCreationNamespace;
         activeTempCreationPending = baseTempCreationPending;
     }
+
+    /* As above, the generation always increments. */
+    activePathGeneration++;
 }


@@ -3707,6 +3744,7 @@ recomputeNamespacePath(void)
     ListCell   *l;
     bool        temp_missing;
     Oid            firstNS;
+    bool        pathChanged;
     MemoryContext oldcxt;

     /* Do nothing if an override search spec is active. */
@@ -3814,18 +3852,31 @@ recomputeNamespacePath(void)
         oidlist = lcons_oid(myTempNamespace, oidlist);

     /*
-     * Now that we've successfully built the new list of namespace OIDs, save
-     * it in permanent storage.
+     * We want to detect the case where the effective value of the base search
+     * path variables didn't change.  As long as we're doing so, we can avoid
+     * copying the OID list unncessarily.
      */
-    oldcxt = MemoryContextSwitchTo(TopMemoryContext);
-    newpath = list_copy(oidlist);
-    MemoryContextSwitchTo(oldcxt);
+    if (baseCreationNamespace == firstNS &&
+        baseTempCreationPending == temp_missing &&
+        equal(oidlist, baseSearchPath))
+    {
+        pathChanged = false;
+    }
+    else
+    {
+        pathChanged = true;
+
+        /* Must save OID list in permanent storage. */
+        oldcxt = MemoryContextSwitchTo(TopMemoryContext);
+        newpath = list_copy(oidlist);
+        MemoryContextSwitchTo(oldcxt);

-    /* Now safe to assign to state variables. */
-    list_free(baseSearchPath);
-    baseSearchPath = newpath;
-    baseCreationNamespace = firstNS;
-    baseTempCreationPending = temp_missing;
+        /* Now safe to assign to state variables. */
+        list_free(baseSearchPath);
+        baseSearchPath = newpath;
+        baseCreationNamespace = firstNS;
+        baseTempCreationPending = temp_missing;
+    }

     /* Mark the path valid. */
     baseSearchPathValid = true;
@@ -3836,6 +3887,16 @@ recomputeNamespacePath(void)
     activeCreationNamespace = baseCreationNamespace;
     activeTempCreationPending = baseTempCreationPending;

+    /*
+     * Bump the generation only if something actually changed.  (Notice that
+     * what we compared to was the old state of the base path variables; so
+     * this does not deal with the situation where we have just popped an
+     * override path and restored the prior state of the base path.  Instead
+     * we rely on the override-popping logic to have bumped the generation.)
+     */
+    if (pathChanged)
+        activePathGeneration++;
+
     /* Clean up. */
     pfree(rawname);
     list_free(namelist);
@@ -4054,6 +4115,8 @@ AtEOXact_Namespace(bool isCommit, bool parallel)
         activeSearchPath = baseSearchPath;
         activeCreationNamespace = baseCreationNamespace;
         activeTempCreationPending = baseTempCreationPending;
+        /* Always bump generation --- see note in recomputeNamespacePath */
+        activePathGeneration++;
     }
 }

@@ -4109,6 +4172,8 @@ AtEOSubXact_Namespace(bool isCommit, SubTransactionId mySubid,
         overrideStack = list_delete_first(overrideStack);
         list_free(entry->searchPath);
         pfree(entry);
+        /* Always bump generation --- see note in recomputeNamespacePath */
+        activePathGeneration++;
     }

     /* Activate the next level down. */
@@ -4118,6 +4183,12 @@ AtEOSubXact_Namespace(bool isCommit, SubTransactionId mySubid,
         activeSearchPath = entry->searchPath;
         activeCreationNamespace = entry->creationNamespace;
         activeTempCreationPending = false;    /* XXX is this OK? */
+
+        /*
+         * It's probably unnecessary to bump generation here, but this should
+         * not be a performance-critical case, so better to be over-cautious.
+         */
+        activePathGeneration++;
     }
     else
     {
@@ -4125,6 +4196,12 @@ AtEOSubXact_Namespace(bool isCommit, SubTransactionId mySubid,
         activeSearchPath = baseSearchPath;
         activeCreationNamespace = baseCreationNamespace;
         activeTempCreationPending = baseTempCreationPending;
+
+        /*
+         * If we popped an override stack entry, then we already bumped the
+         * generation above.  If we did not, then the above assignments did
+         * nothing and we need not bump the generation.
+         */
     }
 }

@@ -4264,6 +4341,7 @@ InitializeSearchPath(void)
         activeSearchPath = baseSearchPath;
         activeCreationNamespace = baseCreationNamespace;
         activeTempCreationPending = baseTempCreationPending;
+        activePathGeneration++; /* pro forma */
     }
     else
     {
diff --git a/src/include/catalog/namespace.h b/src/include/catalog/namespace.h
index 3e3a192..2456c08 100644
--- a/src/include/catalog/namespace.h
+++ b/src/include/catalog/namespace.h
@@ -49,12 +49,17 @@ typedef enum TempNamespaceStatus

 /*
  *    Structure for xxxOverrideSearchPath functions
+ *
+ * The generation counter is private to namespace.c and shouldn't be touched
+ * by other code.  It can be initialized to zero if necessary (that means
+ * "not known equal to the current active path").
  */
 typedef struct OverrideSearchPath
 {
     List       *schemas;        /* OIDs of explicitly named schemas */
     bool        addCatalog;        /* implicitly prepend pg_catalog? */
     bool        addTemp;        /* implicitly prepend temp schema? */
+    uint64        generation;        /* for quick detection of equality to active */
 } OverrideSearchPath;

 /*

Re: plan cache overhead on plpgsql expression

From

Andres Freund

Date:

26 March 2020, 18:49:47

Hi,

On 2020-03-26 14:37:59 -0400, Tom Lane wrote:
> I wrote:
> > I had a thought about a possibly-cleaner way to do this.  We could invent
> > a resowner function, say ResourceOwnerReleaseAllPlanCacheRefs, that
> > explicitly releases all plancache pins it knows about.  So plpgsql
> > would not call the regular ResourceOwnerRelease entry point at all,
> > but just call that and then ResourceOwnerDelete, again relying on the
> > assertions therein to catch anything not released.
> 
> Here's a version that does it like that.  This does seem marginally
> nicer than the other way.  I have a feeling that at some point we'll
> want to expose resowners' contents more generally, but I'm not quite
> sure what the use-cases will be, so I don't want to design that now.

Yea, agreed with all of what you said in that paragraph.


> Testing that reminded me of the other regression test failure I'd seen
> when I first tried to do it: select_parallel.sql shows a WARNING about
> a plancache leak in a parallel worker process.  When I looked into the
> reason for that, it turned out that some cowboy has split
> XACT_EVENT_COMMIT into XACT_EVENT_COMMIT and
> XACT_EVENT_PARALLEL_COMMIT (where the latter is used in parallel
> workers) without bothering to fix the collateral damage to plpgsql.
> So plpgsql_xact_cb isn't doing any cleanup in parallel workers, and
> hasn't been for a couple of releases.

Ugh.


> The bad effects of that are probably limited given that the worker
> process will exit after committing, but I still think that that part
> of this patch is a bug fix that needs to be back-patched.

Ugh. Lucky that we don't register many things inside those resowners.


> (Just
> looking at what FreeExecutorState does, I wonder whether
> jit_release_context has any side-effects that are visible outside the
> process?  But I bet I can make a test case that shows issues even
> without JIT, based on the failure to call ExprContext shutdown
> callbacks.)

JIT doesn't currently have side-effects outside of the process. I really
want to add caching support, which'd presumably have problems due to
this, but it's not there yet... This could lead to leaking a fair bit of
memory over time otherwise.



>  /*
> + * CachedPlanAllowsSimpleValidityCheck: can we use CachedPlanIsSimplyValid?
> + *
> + * This function, together with CachedPlanIsSimplyValid, provides a fast path
> + * for revalidating "simple" generic plans.  The core requirement to be simple
> + * is that the plan must not require taking any locks, which translates to
> + * not touching any tables; this happens to match up well with an important
> + * use-case in PL/pgSQL.

Hm - is there currently sufficient guarantee that we absorb sinval
messages? Would still matter for types, functions, etc?


>  /*
> + * ResourceOwnerReleaseAllPlanCacheRefs
> + *        Release the plancache references (only) held by this owner.
> + *
> + * We might eventually add similar functions for other resource types,
> + * but for now, only this is needed.
> + */
> +void
> +ResourceOwnerReleaseAllPlanCacheRefs(ResourceOwner owner)
> +{
> +    ResourceOwner save;
> +    Datum        foundres;
> +
> +    save = CurrentResourceOwner;
> +    CurrentResourceOwner = owner;
> +    while (ResourceArrayGetAny(&(owner->planrefarr), &foundres))
> +    {
> +        CachedPlan *res = (CachedPlan *) DatumGetPointer(foundres);
> +
> +        ReleaseCachedPlan(res, true);
> +    }
> +    CurrentResourceOwner = save;
> +}

While it'd do a small bit unnecessary work, I do wonder if it'd be
better to use this code in ResourceOwnereReleaseInternal().


> --- a/src/pl/plpgsql/src/pl_exec.c
> +++ b/src/pl/plpgsql/src/pl_exec.c
> @@ -84,6 +84,13 @@ typedef struct
>   * has its own simple-expression EState, which is cleaned up at exit from
>   * plpgsql_inline_handler().  DO blocks still use the simple_econtext_stack,
>   * though, so that subxact abort cleanup does the right thing.
> + *
> + * (However, if a DO block executes COMMIT or ROLLBACK, then exec_stmt_commit
> + * or exec_stmt_rollback will unlink it from the DO's simple-expression EState
> + * and create a new shared EState that will be used thenceforth.  The original
> + * EState will be cleaned up when we get back to plpgsql_inline_handler.  This
> + * is a bit ugly, but it isn't worth doing better, since scenarios like this
> + * can't result in indefinite accumulation of state trees.)
>   */
>  typedef struct SimpleEcontextStackEntry
>  {
> @@ -96,6 +103,15 @@ static EState *shared_simple_eval_estate = NULL;
>  static SimpleEcontextStackEntry *simple_econtext_stack = NULL;
>  
>  /*
> + * In addition to the shared simple-eval EState, we have a shared resource
> + * owner that holds refcounts on the CachedPlans for any "simple" expressions
> + * we have evaluated in the current transaction.  This allows us to avoid
> + * continually grabbing and releasing a plan refcount when a simple expression
> + * is used over and over.
> + */
> +static ResourceOwner shared_simple_eval_resowner = NULL;

Perhaps add a reference to the new (appreciated, btw) DO comment above?


Greetings,

Andres Freund

Re: plan cache overhead on plpgsql expression

From

Tom Lane

Date:

26 March 2020, 19:05:20

Andres Freund <andres@anarazel.de> writes:
> On 2020-03-26 14:37:59 -0400, Tom Lane wrote:
>> + * This function, together with CachedPlanIsSimplyValid, provides a fast path
>> + * for revalidating "simple" generic plans.  The core requirement to be simple
>> + * is that the plan must not require taking any locks, which translates to
>> + * not touching any tables; this happens to match up well with an important
>> + * use-case in PL/pgSQL.

> Hm - is there currently sufficient guarantee that we absorb sinval
> messages? Would still matter for types, functions, etc?

There are potentially issues of that sort throughout the backend, not
just here, since we don't have any locking on types or functions.
I don't think it's this patch's job to address that.  In practice
I think we've thought about it and concluded that the cost/benefit
of introducing such locks just isn't promising:

* Generally speaking you can't do anything very interesting to a type
anyway, at least not with supported DDL.  The worst-case situation that
could materialize AFAIK is possibly evaluating slightly-stale constraints
for a domain.  (The typcache does have sinval invalidation for those
constraints, but I don't recall offhand how much we guarantee about
how quickly we'll notice updates.)

* For functions, you might execute a somewhat stale version of the
function body.  The bad effects there are pretty limited since a function
is defined by just one catalog row, unlike tables, so you can't see a
self-inconsistent version of it.

The amount of lock overhead that it would take to remove those edge
cases seems slightly staggering, so I doubt we'd ever do it.

> While it'd do a small bit unnecessary work, I do wonder if it'd be
> better to use this code in ResourceOwnereReleaseInternal().

When and if we refactor to expose this sort of thing more generally,
it might be worth doing it like that.  I don't think it helps much
right now.

> Perhaps add a reference to the new (appreciated, btw) DO comment above?

Can do.

Again, thanks for reviewing!

            regards, tom lane

Re: plan cache overhead on plpgsql expression

From

Tom Lane

Date:

26 March 2020, 20:59:49

Andres Freund <andres@anarazel.de> writes:
> On 2020-03-26 14:37:59 -0400, Tom Lane wrote:
>> Testing that reminded me of the other regression test failure I'd seen
>> when I first tried to do it: select_parallel.sql shows a WARNING about
>> a plancache leak in a parallel worker process.  When I looked into the
>> reason for that, it turned out that some cowboy has split
>> XACT_EVENT_COMMIT into XACT_EVENT_COMMIT and
>> XACT_EVENT_PARALLEL_COMMIT (where the latter is used in parallel
>> workers) without bothering to fix the collateral damage to plpgsql.
>> So plpgsql_xact_cb isn't doing any cleanup in parallel workers, and
>> hasn't been for a couple of releases.
>> The bad effects of that are probably limited given that the worker
>> process will exit after committing, but I still think that that part
>> of this patch is a bug fix that needs to be back-patched.

> Ugh. Lucky that we don't register many things inside those resowners.

Yeah.  I spent some time trying to produce a failure this way, and
concluded that it's pretty hard because most of the relevant callbacks
will be run during ExprContext shutdown, which is done during plpgsql
function exit.  In a non-transaction-abort situation, the simple EState
shouldn't have any live ExprContexts left at commit.  I did find a case
where a memory context callback attached to the EState's query context
doesn't get run when expected ... but it still gets run later, when the
whole memory context tree is destroyed.  So I can't demonstrate any
user-visible misbehavior in the core code.  But it still seems like a
prudent idea to back-patch a fix, in case I missed something or there is
some extension that's pushing the boundaries further.  It's definitely
not very cool that we're leaving behind a dangling static pointer to an
EState that won't exist once TopTransactionMemoryContext is gone.

I'll back-patch relevant parts of those comments about DO block
management, too.

            regards, tom lane

Re: plan cache overhead on plpgsql expression

From

Tom Lane

Date:

27 March 2020, 18:01:44

I wrote:
> Amit Langote <amitlangote09@gmail.com> writes:
>> One thing -- I don't get the division between
>> CachedPlanAllowsSimpleValidityCheck() and CachedPlanIsSimplyValid().
>> Maybe I am missing something, but could there not be just one
>> function, possibly using whether expr_simple_expr is set or not to
>> skip or do, resp., the checks that the former does?

> Well, we don't want to do the initial checks over again every time;
> we want the is-valid test to be as simple and fast as we can make it.
> I suppose we could have one function with a boolean flag saying "this is a
> recheck", but I don't find that idea to be any better than the way it is.

So after looking at the buildfarm results, I think you were on to
something.  The initial and recheck conditions actually have to be
a bit different, and the reason is that immediately after GetCachedPlan
has produced a plan, it's possible for plansource->is_valid to be false
even though the derived plan is marked valid.  (In the buildfarm, this
is happening because of CLOBBER_CACHE_ALWAYS or equivalent cache flushes;
in the real world it'd probably require sinval queue overflow to happen
while building the plan.)

What we want in this situation is to go ahead and use the derived plan,
and then rebuild next time; that's what the pre-existing code did and
it's really the only reasonable answer.  It might seem better to go
back and try to rebuild the plan right away, but that'd be an infinite
loop in a CLOBBER_CACHE_ALWAYS build.  Also, if we fail to use the
derived plan at all, that amounts to disabling the "simple expression"
optimization as a result of a chance sinval overflow.  That's bad from
a performance standpoint and it will also cause regression test output
changes (since, as you previously discovered, the simple-expression
path produces different CONTEXT messages for error cases --- maybe we
should change that, but I don't want to be forced into it).

The existing code structure can't support doing it like that, so we have
to refactor to make the initial check and the recheck be separate code.
Working on a patch for that now.

            regards, tom lane