Thread: cache invalidation for PL/pgsql functions

cache invalidation for PL/pgsql functions

From

Robert Haas

Date:

28 April 2015, 17:43:54

The following behavior surprised me, and a few other people at
EnterpriseDB, and one of our customers:

rhaas=# create table foo (a int);
CREATE TABLE
rhaas=# create or replace function test (x foo) returns int as $$begin
return x.b; end$$ language plpgsql;
CREATE FUNCTION
rhaas=# alter table foo add column b int;
ALTER TABLE
rhaas=# select test(null::foo);
ERROR:  record "x" has no field "b"
LINE 1: SELECT x.b              ^
QUERY:  SELECT x.b
CONTEXT:  PL/pgSQL function test(foo) line 1 at RETURN
rhaas=# \c
You are now connected to database "rhaas" as user "rhaas".
rhaas=# select test(null::foo);test
------

(1 row)

I hate to use the term "bug" for what somebody's probably going to
tell me is acceptable behavior, but that seems like a bug.  I guess
the root of the problem is that PL/plgsql's cache invalidation logic
only considers the pg_proc row's TID and xmin when deciding whether to
recompile.  For base types that's probably OK, but for composite
types, not so much.

Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: cache invalidation for PL/pgsql functions

From

Pavel Stehule

Date:

28 April 2015, 17:59:01

2015-04-28 19:43 GMT+02:00 Robert Haas <robertmhaas@gmail.com>:

The following behavior surprised me, and a few other people at
EnterpriseDB, and one of our customers:

rhaas=# create table foo (a int);
CREATE TABLE
rhaas=# create or replace function test (x foo) returns int as $$begin
return x.b; end$$ language plpgsql;
CREATE FUNCTION
rhaas=# alter table foo add column b int;
ALTER TABLE
rhaas=# select test(null::foo);
ERROR: record "x" has no field "b"
LINE 1: SELECT x.b
^
QUERY: SELECT x.b
CONTEXT: PL/pgSQL function test(foo) line 1 at RETURN
rhaas=# \c
You are now connected to database "rhaas" as user "rhaas".
rhaas=# select test(null::foo);
test
------

(1 row)

I hate to use the term "bug" for what somebody's probably going to
tell me is acceptable behavior, but that seems like a bug. I guess
the root of the problem is that PL/plgsql's cache invalidation logic
only considers the pg_proc row's TID and xmin when deciding whether to
recompile. For base types that's probably OK, but for composite
types, not so much.

Thoughts?

It is inconsistent - and I know so one bigger Czech companies, that use Postgres, had small outage, because they found this issue, when deployed new version of procedure.

The question is what is a cost of fixing

Regards

Pavel

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: cache invalidation for PL/pgsql functions

From

Jim Nasby

Date:

28 April 2015, 18:27:07

On 4/28/15 12:58 PM, Pavel Stehule wrote:
>     I hate to use the term "bug" for what somebody's probably going to
>     tell me is acceptable behavior, but that seems like a bug.  I guess
>     the root of the problem is that PL/plgsql's cache invalidation logic
>     only considers the pg_proc row's TID and xmin when deciding whether to
>     recompile.  For base types that's probably OK, but for composite
>     types, not so much.
>
>     Thoughts?
>
>
> It is inconsistent  - and I know so one bigger Czech companies, that use
> Postgres, had small outage, because they found this issue, when deployed
> new version of procedure.
>
> The question is what is a cost of fixing

We don't actually consider the argument types at all (pl_comp.c:169):
    /* We have a compiled function, but is it still valid? */    if (function->fn_xmin ==
HeapTupleHeaderGetRawXmin(procTup->t_data)&&        ItemPointerEquals(&function->fn_tid, &procTup->t_self))
 

Perhaps pg_depend protects from a base type changing.

This problem also exists for internal variables:

create table moo(m int);
create function t() returns text language plpgsql as $$declare m moo; 
begin m := null; return m.t; end$$;
select t(); -- Expected error
alter table moo add t text;
select t(); -- Unexpected error

So it seems the correct fix would be to remember the list of every xmin 
for every type we saw... unless there's some way to tie the proc into 
type sinval messages?
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

Re: cache invalidation for PL/pgsql functions

From

Merlin Moncure

Date:

28 April 2015, 19:41:40

On Tue, Apr 28, 2015 at 12:43 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> I hate to use the term "bug" for what somebody's probably going to
> tell me is acceptable behavior, but that seems like a bug.  I guess
> the root of the problem is that PL/plgsql's cache invalidation logic
> only considers the pg_proc row's TID and xmin when deciding whether to
> recompile.  For base types that's probably OK, but for composite
> types, not so much.

It was a missed case in the invalidation logic.   plpgsql was
deliberately modified to invalidate plans upon schema changes -- this
is a way to outsmart that system.   definitely a bug IMNSHO.

merlin

Re: cache invalidation for PL/pgsql functions

From

Tom Lane

Date:

28 April 2015, 19:59:16

Robert Haas <robertmhaas@gmail.com> writes:
> rhaas=# create table foo (a int);
> CREATE TABLE
> rhaas=# create or replace function test (x foo) returns int as $$begin
> return x.b; end$$ language plpgsql;
> CREATE FUNCTION
> rhaas=# alter table foo add column b int;
> ALTER TABLE
> rhaas=# select test(null::foo);
> ERROR:  record "x" has no field "b"
> LINE 1: SELECT x.b
>                ^
> QUERY:  SELECT x.b
> CONTEXT:  PL/pgSQL function test(foo) line 1 at RETURN

I believe that this was one of the cases I had in mind when I previously
proposed that we stop using PLPGSQL_DTYPE_ROW entirely for composite-type
variables, and make them use PLPGSQL_DTYPE_REC (that is, the same code
paths used for RECORD).

As I recall, that proposal was shot down with no investigation whatsoever,
on the grounds that it might possibly make some cases slower.
        regards, tom lane

Re: cache invalidation for PL/pgsql functions

From

Robert Haas

Date:

29 April 2015, 18:26:08

On Tue, Apr 28, 2015 at 3:59 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> rhaas=# create table foo (a int);
>> CREATE TABLE
>> rhaas=# create or replace function test (x foo) returns int as $$begin
>> return x.b; end$$ language plpgsql;
>> CREATE FUNCTION
>> rhaas=# alter table foo add column b int;
>> ALTER TABLE
>> rhaas=# select test(null::foo);
>> ERROR:  record "x" has no field "b"
>> LINE 1: SELECT x.b
>>                ^
>> QUERY:  SELECT x.b
>> CONTEXT:  PL/pgSQL function test(foo) line 1 at RETURN
>
> I believe that this was one of the cases I had in mind when I previously
> proposed that we stop using PLPGSQL_DTYPE_ROW entirely for composite-type
> variables, and make them use PLPGSQL_DTYPE_REC (that is, the same code
> paths used for RECORD).
>
> As I recall, that proposal was shot down with no investigation whatsoever,
> on the grounds that it might possibly make some cases slower.

I don't know whether that would help or not.  I was thinking about
whether PLs should be using CacheRegisterSyscacheCallback() to notice
when types that they care about change.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: cache invalidation for PL/pgsql functions

From

Marko Tiikkaja

Date:

01 May 2015, 13:09:46

On 2015-04-28 19:43, Robert Haas wrote:
> I guess
> the root of the problem is that PL/plgsql's cache invalidation logic
> only considers the pg_proc row's TID and xmin when deciding whether to
> recompile.  For base types that's probably OK, but for composite
> types, not so much.
>
> Thoughts?

We recently hit a similar case in our production environment.  What was 
annoying about it is that there didn't seem to be a way for the 
application to fix the issue by itself, short of reconnecting; even 
DISCARD ALL doesn't help.  If we can't fix the underlying issue, can we 
at least provide a way for apps to invalidate these caches themselves, 
for example in the form of a DISCARD option?

.m

Re: cache invalidation for PL/pgsql functions

From

Robert Haas

Date:

01 May 2015, 13:40:58

On Fri, May 1, 2015 at 9:09 AM, Marko Tiikkaja <marko@joh.to> wrote:
> On 2015-04-28 19:43, Robert Haas wrote:
>> I guess
>> the root of the problem is that PL/plgsql's cache invalidation logic
>> only considers the pg_proc row's TID and xmin when deciding whether to
>> recompile.  For base types that's probably OK, but for composite
>> types, not so much.
>>
>> Thoughts?
>
> We recently hit a similar case in our production environment.  What was
> annoying about it is that there didn't seem to be a way for the application
> to fix the issue by itself, short of reconnecting; even DISCARD ALL doesn't
> help.  If we can't fix the underlying issue, can we at least provide a way
> for apps to invalidate these caches themselves, for example in the form of a
> DISCARD option?

It's been discussed before and I am in favor of it.  However the
implementation is a bit challenging.  The DISCARD command doesn't know
what PLs may have decided to cache, nonwithstanding the fact that they
all cache basically the same stuff using basically the same method.  I
think the PL interface will need to be extended in some way to support
a new callback.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: cache invalidation for PL/pgsql functions

From

Fabrízio de Royes Mello

Date:

01 May 2015, 15:19:05

Em sexta-feira, 1 de maio de 2015, Robert Haas <robertmhaas@gmail.com> escreveu:

On Fri, May 1, 2015 at 9:09 AM, Marko Tiikkaja <marko@joh.to> wrote:
> On 2015-04-28 19:43, Robert Haas wrote:
>> I guess
>> the root of the problem is that PL/plgsql's cache invalidation logic
>> only considers the pg_proc row's TID and xmin when deciding whether to
>> recompile. For base types that's probably OK, but for composite
>> types, not so much.
>>
>> Thoughts?
>
> We recently hit a similar case in our production environment. What was
> annoying about it is that there didn't seem to be a way for the application
> to fix the issue by itself, short of reconnecting; even DISCARD ALL doesn't
> help. If we can't fix the underlying issue, can we at least provide a way
> for apps to invalidate these caches themselves, for example in the form of a
> DISCARD option?

It's been discussed before and I am in favor of it. However the
implementation is a bit challenging. The DISCARD command doesn't know
what PLs may have decided to cache, nonwithstanding the fact that they
all cache basically the same stuff using basically the same method. I
think the PL interface will need to be extended in some way to support
a new callback.

IMHO we need a way to DISCARD run a cleanup code for each installed extension. Maybe with a new option like DISCARD EXTENSIONS. So each extension could have and register your own cleanup code.

Regards,

Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL

>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello

>> Github: http://github.com/fabriziomello

Re: cache invalidation for PL/pgsql functions

From

Tom Lane

Date:

01 May 2015, 17:29:55

Robert Haas <robertmhaas@gmail.com> writes:
> On Fri, May 1, 2015 at 9:09 AM, Marko Tiikkaja <marko@joh.to> wrote:
>> We recently hit a similar case in our production environment.  What was
>> annoying about it is that there didn't seem to be a way for the application
>> to fix the issue by itself, short of reconnecting; even DISCARD ALL doesn't
>> help.  If we can't fix the underlying issue, can we at least provide a way
>> for apps to invalidate these caches themselves, for example in the form of a
>> DISCARD option?

> It's been discussed before and I am in favor of it.

I'm not.  We should fix the problem not expect applications to band-aid
around it.  This would be particularly ill-advised because there are so
many applications that just blindly do DISCARD ALL when changing contexts.

Having said that, I'm not sure that it's easy to get to a solution :-(
        regards, tom lane

Re: cache invalidation for PL/pgsql functions

From

Merlin Moncure

Date:

04 May 2015, 14:58:37

On Fri, May 1, 2015 at 12:29 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Fri, May 1, 2015 at 9:09 AM, Marko Tiikkaja <marko@joh.to> wrote:
>>> We recently hit a similar case in our production environment.  What was
>>> annoying about it is that there didn't seem to be a way for the application
>>> to fix the issue by itself, short of reconnecting; even DISCARD ALL doesn't
>>> help.  If we can't fix the underlying issue, can we at least provide a way
>>> for apps to invalidate these caches themselves, for example in the form of a
>>> DISCARD option?
>
>> It's been discussed before and I am in favor of it.
>
> I'm not.  We should fix the problem not expect applications to band-aid
> around it.  This would be particularly ill-advised because there are so
> many applications that just blindly do DISCARD ALL when changing contexts.

The most common real world manifestation of this problem (having to
DISCARD to invalidate plans) is when using schema partitioned data
with pl/pgsql functions.  Attempting to hide everything under DISCARD
is trading one problem for another:  it's going to be hard for users
of tools like pgbouncer (the main users of DISCARD) to correctly judge
the performance trade-offs and this statement's workings is already
pretty arcane.

merlin