Thread: Invoking SQL function while doing CREATE OR REPLACE on it

Invoking SQL function while doing CREATE OR REPLACE on it

From
"Nagendra Mahesh (namahesh)"
Date:
I have a Postgres 14.4 cluster (AWS Aurora) to which I connect from my application using JDBC.

I use liquibase for schema management - not only tables, but also a bunch of SQL stored procedures and functions.
Basically,there is one liquibase changeSet that runs last and executes a set of SQL files which contain stored
proceduresand functions. 

CREATE OR REPLACE FUNCTION bar(arg1 text, arg2 text) RETURNS record LANGUAGE "plpgsql" AS '
BEGIN
   // function body
END;
';

These functions / procedures are replaced ONLY when there is a change in one / more SQL files which are part of this
changeSet.(runOnChange: true). 

Whenever I do a rolling deployment of my application (say, with a change in the function body of bar()), liquibase will
executethe CREATE OR REPLACE FUNCTION bar() as part of a transaction. 

In the few milliseconds while bar() is being replaced, there are other ongoing transactions (from other replicas of my
application)which are continuously trying to invoke bar(). 

Only in this tiny time window, few transactions fail with the following error:

ERROR: function bar(arg1 => text, arg2 => text) does not exist
  Hint: No function matches the given name and argument types. You might need to add explicit type casts.
Position: 4 : errorCode = 42883

I don't want any of these transactions to fail (we do not have any proper way of re-trying them from the application
layer).
This is seen as affecting availability.
However, it is acceptable for these transactions to BLOCK (for a few hundred ms) while the SQL function body is being
replaced,and then proceed with invocation. 

Is there a way to safely modify a stored function / procedure in PostgreSQL while that function / procedure is being
invokedcontinuously by multiple transactions? 

Thanks!


Re: Invoking SQL function while doing CREATE OR REPLACE on it

From
Erik Wienhold
Date:
> On 03/05/2023 20:17 CEST Nagendra Mahesh (namahesh) <namahesh@cisco.com> wrote:
>
> I have a Postgres 14.4 cluster (AWS Aurora) to which I connect from my
> application using JDBC.
>
> I use liquibase for schema management - not only tables, but also a bunch of
> SQL stored procedures and functions. Basically, there is one liquibase
> changeSet that runs last and executes a set of SQL files which contain stored
> procedures and functions.
>
> CREATE OR REPLACE FUNCTION bar(arg1 text, arg2 text) RETURNS record LANGUAGE "plpgsql" AS '
> BEGIN
>    // function body
> END;
> ';
>
> These functions / procedures are replaced ONLY when there is a change in one /
> more SQL files which are part of this changeSet. (runOnChange: true).
>
> Whenever I do a rolling deployment of my application (say, with a change in
> the function body of bar()), liquibase will execute the CREATE OR REPLACE FUNCTION bar()
> as part of a transaction.
>
> In the few milliseconds while bar() is being replaced, there are other ongoing
> transactions (from other replicas of my application) which are continuously
> trying to invoke bar().
>
> Only in this tiny time window, few transactions fail with the following error:
>
> ERROR: function bar(arg1 => text, arg2 => text) does not exist
>   Hint: No function matches the given name and argument types. You might need to add explicit type casts.
> Position: 4 : errorCode = 42883

CREATE OR REPLACE FUNCTION should be atomic and cannot change the function
signature.  I don't see how a function cannot exist at some point in this case.

Are you sure that Liquibase is not dropping the function before re-creating it?
If Liquibase drops and re-creates the function in separate transactions, the
transactions trying to execute that function may find it dropped when using the
read committed isolation level.

There's also a race condition bug in v14.4 that may be relevant.  It got fixed
in v14.5.  See "Fix race condition when checking transaction visibility" in
https://www.postgresql.org/docs/14/release-14-5.html.

--
Erik



Re: Invoking SQL function while doing CREATE OR REPLACE on it

From
Tom Lane
Date:
Erik Wienhold <ewie@ewie.name> writes:
> On 03/05/2023 20:17 CEST Nagendra Mahesh (namahesh) <namahesh@cisco.com> wrote:
>> I have a Postgres 14.4 cluster (AWS Aurora) to which I connect from my
>> application using JDBC.

Aurora uses, I believe, some other storage engine entirely than
community Postgres has.

>> Only in this tiny time window, few transactions fail with the following error:
>> ERROR: function bar(arg1 => text, arg2 => text) does not exist
>> Hint: No function matches the given name and argument types. You might need to add explicit type casts.

> There's also a race condition bug in v14.4 that may be relevant.  It got fixed
> in v14.5.  See "Fix race condition when checking transaction visibility" in
> https://www.postgresql.org/docs/14/release-14-5.html.

That race could easily explain this symptom: during the update, there
are two versions of the function's pg_proc row, and it could be that
both of them appear "dead" to an onlooker transaction if one of them
is inspected during the race window.  Then the onlooker would find
no live version of the row and report that it doesn't exist.

But having said that, it's not clear to me whether Aurora's storage
engine shares this bug with community PG, or you're seeing some
independent bug of theirs that happens to have a similar symptom.
It's even less clear whether AWS would have applied the fix yet if it
is a shared bug.  You really need to discuss this with AWS support.

            regards, tom lane



Re: Invoking SQL function while doing CREATE OR REPLACE on it

From
"Nagendra Mahesh (namahesh)"
Date:
Thanks for the insight, Erik.
Turns out, it was indeed liquibase which was:
- dropping the functions in one transaction (inadvertently as the result of a DROP TYPE CASCADE)
- immediately followed by the next transaction which did a CREATE OR REPLACE FUNCTION which recreated these functions

The tiny gap in time after the first transaction commit, before the second transaction could commit was when the
functionactually ceased to exist. 
So, the invocations during that time failed.

Thanks again.

From: Erik Wienhold <ewie@ewie.name>
Sent: Thursday, May 4, 2023 2:33 AM
To: Nagendra Mahesh (namahesh) <namahesh@cisco.com>; pgsql-general@postgresql.org <pgsql-general@postgresql.org>
Subject: Re: Invoking SQL function while doing CREATE OR REPLACE on it
 
> On 03/05/2023 20:17 CEST Nagendra Mahesh (namahesh) <namahesh@cisco.com> wrote:
>
> I have a Postgres 14.4 cluster (AWS Aurora) to which I connect from my
> application using JDBC.
>
> I use liquibase for schema management - not only tables, but also a bunch of
> SQL stored procedures and functions. Basically, there is one liquibase
> changeSet that runs last and executes a set of SQL files which contain stored
> procedures and functions.
>
> CREATE OR REPLACE FUNCTION bar(arg1 text, arg2 text) RETURNS record LANGUAGE "plpgsql" AS '
> BEGIN
>    // function body
> END;
> ';
>
> These functions / procedures are replaced ONLY when there is a change in one /
> more SQL files which are part of this changeSet. (runOnChange: true).
>
> Whenever I do a rolling deployment of my application (say, with a change in
> the function body of bar()), liquibase will execute the CREATE OR REPLACE FUNCTION bar()
> as part of a transaction.
>
> In the few milliseconds while bar() is being replaced, there are other ongoing
> transactions (from other replicas of my application) which are continuously
> trying to invoke bar().
>
> Only in this tiny time window, few transactions fail with the following error:
>
> ERROR: function bar(arg1 => text, arg2 => text) does not exist
>   Hint: No function matches the given name and argument types. You might need to add explicit type casts.
> Position: 4 : errorCode = 42883

CREATE OR REPLACE FUNCTION should be atomic and cannot change the function
signature.  I don't see how a function cannot exist at some point in this case.

Are you sure that Liquibase is not dropping the function before re-creating it?
If Liquibase drops and re-creates the function in separate transactions, the
transactions trying to execute that function may find it dropped when using the
read committed isolation level.

There's also a race condition bug in v14.4 that may be relevant.  It got fixed
in v14.5.  See "Fix race condition when checking transaction visibility" in
https://www.postgresql.org/docs/14/release-14-5.html.

--
Erik