Thread: cache invalidation for PL/pgsql functions
The following behavior surprised me, and a few other people at EnterpriseDB, and one of our customers: rhaas=# create table foo (a int); CREATE TABLE rhaas=# create or replace function test (x foo) returns int as $$begin return x.b; end$$ language plpgsql; CREATE FUNCTION rhaas=# alter table foo add column b int; ALTER TABLE rhaas=# select test(null::foo); ERROR: record "x" has no field "b" LINE 1: SELECT x.b ^ QUERY: SELECT x.b CONTEXT: PL/pgSQL function test(foo) line 1 at RETURN rhaas=# \c You are now connected to database "rhaas" as user "rhaas". rhaas=# select test(null::foo);test ------ (1 row) I hate to use the term "bug" for what somebody's probably going to tell me is acceptable behavior, but that seems like a bug. I guess the root of the problem is that PL/plgsql's cache invalidation logic only considers the pg_proc row's TID and xmin when deciding whether to recompile. For base types that's probably OK, but for composite types, not so much. Thoughts? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
2015-04-28 19:43 GMT+02:00 Robert Haas <robertmhaas@gmail.com>:
The following behavior surprised me, and a few other people at
EnterpriseDB, and one of our customers:
rhaas=# create table foo (a int);
CREATE TABLE
rhaas=# create or replace function test (x foo) returns int as $$begin
return x.b; end$$ language plpgsql;
CREATE FUNCTION
rhaas=# alter table foo add column b int;
ALTER TABLE
rhaas=# select test(null::foo);
ERROR: record "x" has no field "b"
LINE 1: SELECT x.b
^
QUERY: SELECT x.b
CONTEXT: PL/pgSQL function test(foo) line 1 at RETURN
rhaas=# \c
You are now connected to database "rhaas" as user "rhaas".
rhaas=# select test(null::foo);
test
------
(1 row)
I hate to use the term "bug" for what somebody's probably going to
tell me is acceptable behavior, but that seems like a bug. I guess
the root of the problem is that PL/plgsql's cache invalidation logic
only considers the pg_proc row's TID and xmin when deciding whether to
recompile. For base types that's probably OK, but for composite
types, not so much.
Thoughts?
It is inconsistent - and I know so one bigger Czech companies, that use Postgres, had small outage, because they found this issue, when deployed new version of procedure.
The question is what is a cost of fixing
Regards
Pavel
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 4/28/15 12:58 PM, Pavel Stehule wrote: > I hate to use the term "bug" for what somebody's probably going to > tell me is acceptable behavior, but that seems like a bug. I guess > the root of the problem is that PL/plgsql's cache invalidation logic > only considers the pg_proc row's TID and xmin when deciding whether to > recompile. For base types that's probably OK, but for composite > types, not so much. > > Thoughts? > > > It is inconsistent - and I know so one bigger Czech companies, that use > Postgres, had small outage, because they found this issue, when deployed > new version of procedure. > > The question is what is a cost of fixing We don't actually consider the argument types at all (pl_comp.c:169): /* We have a compiled function, but is it still valid? */ if (function->fn_xmin == HeapTupleHeaderGetRawXmin(procTup->t_data)&& ItemPointerEquals(&function->fn_tid, &procTup->t_self)) Perhaps pg_depend protects from a base type changing. This problem also exists for internal variables: create table moo(m int); create function t() returns text language plpgsql as $$declare m moo; begin m := null; return m.t; end$$; select t(); -- Expected error alter table moo add t text; select t(); -- Unexpected error So it seems the correct fix would be to remember the list of every xmin for every type we saw... unless there's some way to tie the proc into type sinval messages? -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
On Tue, Apr 28, 2015 at 12:43 PM, Robert Haas <robertmhaas@gmail.com> wrote: > I hate to use the term "bug" for what somebody's probably going to > tell me is acceptable behavior, but that seems like a bug. I guess > the root of the problem is that PL/plgsql's cache invalidation logic > only considers the pg_proc row's TID and xmin when deciding whether to > recompile. For base types that's probably OK, but for composite > types, not so much. It was a missed case in the invalidation logic. plpgsql was deliberately modified to invalidate plans upon schema changes -- this is a way to outsmart that system. definitely a bug IMNSHO. merlin
Robert Haas <robertmhaas@gmail.com> writes: > rhaas=# create table foo (a int); > CREATE TABLE > rhaas=# create or replace function test (x foo) returns int as $$begin > return x.b; end$$ language plpgsql; > CREATE FUNCTION > rhaas=# alter table foo add column b int; > ALTER TABLE > rhaas=# select test(null::foo); > ERROR: record "x" has no field "b" > LINE 1: SELECT x.b > ^ > QUERY: SELECT x.b > CONTEXT: PL/pgSQL function test(foo) line 1 at RETURN I believe that this was one of the cases I had in mind when I previously proposed that we stop using PLPGSQL_DTYPE_ROW entirely for composite-type variables, and make them use PLPGSQL_DTYPE_REC (that is, the same code paths used for RECORD). As I recall, that proposal was shot down with no investigation whatsoever, on the grounds that it might possibly make some cases slower. regards, tom lane
On Tue, Apr 28, 2015 at 3:59 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> rhaas=# create table foo (a int); >> CREATE TABLE >> rhaas=# create or replace function test (x foo) returns int as $$begin >> return x.b; end$$ language plpgsql; >> CREATE FUNCTION >> rhaas=# alter table foo add column b int; >> ALTER TABLE >> rhaas=# select test(null::foo); >> ERROR: record "x" has no field "b" >> LINE 1: SELECT x.b >> ^ >> QUERY: SELECT x.b >> CONTEXT: PL/pgSQL function test(foo) line 1 at RETURN > > I believe that this was one of the cases I had in mind when I previously > proposed that we stop using PLPGSQL_DTYPE_ROW entirely for composite-type > variables, and make them use PLPGSQL_DTYPE_REC (that is, the same code > paths used for RECORD). > > As I recall, that proposal was shot down with no investigation whatsoever, > on the grounds that it might possibly make some cases slower. I don't know whether that would help or not. I was thinking about whether PLs should be using CacheRegisterSyscacheCallback() to notice when types that they care about change. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2015-04-28 19:43, Robert Haas wrote: > I guess > the root of the problem is that PL/plgsql's cache invalidation logic > only considers the pg_proc row's TID and xmin when deciding whether to > recompile. For base types that's probably OK, but for composite > types, not so much. > > Thoughts? We recently hit a similar case in our production environment. What was annoying about it is that there didn't seem to be a way for the application to fix the issue by itself, short of reconnecting; even DISCARD ALL doesn't help. If we can't fix the underlying issue, can we at least provide a way for apps to invalidate these caches themselves, for example in the form of a DISCARD option? .m
On Fri, May 1, 2015 at 9:09 AM, Marko Tiikkaja <marko@joh.to> wrote: > On 2015-04-28 19:43, Robert Haas wrote: >> I guess >> the root of the problem is that PL/plgsql's cache invalidation logic >> only considers the pg_proc row's TID and xmin when deciding whether to >> recompile. For base types that's probably OK, but for composite >> types, not so much. >> >> Thoughts? > > We recently hit a similar case in our production environment. What was > annoying about it is that there didn't seem to be a way for the application > to fix the issue by itself, short of reconnecting; even DISCARD ALL doesn't > help. If we can't fix the underlying issue, can we at least provide a way > for apps to invalidate these caches themselves, for example in the form of a > DISCARD option? It's been discussed before and I am in favor of it. However the implementation is a bit challenging. The DISCARD command doesn't know what PLs may have decided to cache, nonwithstanding the fact that they all cache basically the same stuff using basically the same method. I think the PL interface will need to be extended in some way to support a new callback. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Em sexta-feira, 1 de maio de 2015, Robert Haas <robertmhaas@gmail.com> escreveu:
On Fri, May 1, 2015 at 9:09 AM, Marko Tiikkaja <marko@joh.to> wrote:
> On 2015-04-28 19:43, Robert Haas wrote:
>> I guess
>> the root of the problem is that PL/plgsql's cache invalidation logic
>> only considers the pg_proc row's TID and xmin when deciding whether to
>> recompile. For base types that's probably OK, but for composite
>> types, not so much.
>>
>> Thoughts?
>
> We recently hit a similar case in our production environment. What was
> annoying about it is that there didn't seem to be a way for the application
> to fix the issue by itself, short of reconnecting; even DISCARD ALL doesn't
> help. If we can't fix the underlying issue, can we at least provide a way
> for apps to invalidate these caches themselves, for example in the form of a
> DISCARD option?
It's been discussed before and I am in favor of it. However the
implementation is a bit challenging. The DISCARD command doesn't know
what PLs may have decided to cache, nonwithstanding the fact that they
all cache basically the same stuff using basically the same method. I
think the PL interface will need to be extended in some way to support
a new callback.
IMHO we need a way to DISCARD run a cleanup code for each installed extension. Maybe with a new option like DISCARD EXTENSIONS. So each extension could have and register your own cleanup code.
Regards,
--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello
Robert Haas <robertmhaas@gmail.com> writes: > On Fri, May 1, 2015 at 9:09 AM, Marko Tiikkaja <marko@joh.to> wrote: >> We recently hit a similar case in our production environment. What was >> annoying about it is that there didn't seem to be a way for the application >> to fix the issue by itself, short of reconnecting; even DISCARD ALL doesn't >> help. If we can't fix the underlying issue, can we at least provide a way >> for apps to invalidate these caches themselves, for example in the form of a >> DISCARD option? > It's been discussed before and I am in favor of it. I'm not. We should fix the problem not expect applications to band-aid around it. This would be particularly ill-advised because there are so many applications that just blindly do DISCARD ALL when changing contexts. Having said that, I'm not sure that it's easy to get to a solution :-( regards, tom lane
On Fri, May 1, 2015 at 12:29 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Fri, May 1, 2015 at 9:09 AM, Marko Tiikkaja <marko@joh.to> wrote: >>> We recently hit a similar case in our production environment. What was >>> annoying about it is that there didn't seem to be a way for the application >>> to fix the issue by itself, short of reconnecting; even DISCARD ALL doesn't >>> help. If we can't fix the underlying issue, can we at least provide a way >>> for apps to invalidate these caches themselves, for example in the form of a >>> DISCARD option? > >> It's been discussed before and I am in favor of it. > > I'm not. We should fix the problem not expect applications to band-aid > around it. This would be particularly ill-advised because there are so > many applications that just blindly do DISCARD ALL when changing contexts. The most common real world manifestation of this problem (having to DISCARD to invalidate plans) is when using schema partitioned data with pl/pgsql functions. Attempting to hide everything under DISCARD is trading one problem for another: it's going to be hard for users of tools like pgbouncer (the main users of DISCARD) to correctly judge the performance trade-offs and this statement's workings is already pretty arcane. merlin