Thread: Converting plpgsql to use DTYPE_REC for named composite types
Those with long memories will recall that for some time now I've been arguing that plpgsql should be changed to treat all composite-type variables (both "record" and named composite types) via its DTYPE_REC code paths, instead of the current situation where it handles variables of named composite types via its DTYPE_ROW code paths. DTYPE_ROW is great for what it was meant for, which is to allow multiple target variables in usages like "SELECT ... INTO a, b, c" to be represented by a single PLpgSQL_datum. It's not great for composite-type variables. There are two really fundamental problems with doing things that way: * DTYPE_ROW cannot represent a simple NULL value of the composite datum; the closest it can manage is to set the component variables to nulls, which is equivalent to ROW(NULL, NULL, ...), which is not the same thing as a simple NULL. We've had complaints about this before, eg: https://www.postgresql.org/message-id/flat/52A9010D.3070202%40ultimeth.com https://www.postgresql.org/message-id/flat/20120920210519.241290%40gmx.com * If the composite type changes, say by adding or renaming a column, plpgsql cannot hope to cope with that without fully recompiling the function, since its symbol table (list of PLpgSQL_datums) would have to change. Currently it doesn't even try. We've had complaints about that too: https://www.postgresql.org/message-id/flat/CAL870DWGgkr7W6ZW%3DjGeqE4bHi0E%3DTLwjcQx95C9pYAVL3%3DU%3DQ%40mail.gmail.com https://www.postgresql.org/message-id/flat/CA%2BTgmoYDf7dkXhKtk7u_YnAfSt47SgK5J8gD8C1JfSiouU194g%40mail.gmail.com https://www.postgresql.org/message-id/flat/CAFj8pRC8_-Vppe_sx7%2Bjn-4UQ_YVXGdhWP5O0rtmv6qZxShmFg%40mail.gmail.com A slightly lesser problem is that we've been discouraged from adding features like allowing composite-typed variables to be given initializers or marked CONSTANT because we'd have to fix two completely different code paths, doubling the work needed. This also applies to the issue of fixing plpgsql to support domains over composites correctly, which is what got me interested in doing something about it now. Hence, attached is a proposed patch to convert plpgsql to use DTYPE_REC for all user-declared composite-typed variables. DTYPE_ROW remains, but is used only for the purpose of collecting lists of target variables. Since the main objection that's been raised to this change in the past is possible performance loss in some cases, I've gone to considerable trouble to try to minimize such losses. Work remains to be done for most of the feature issues mentioned above, though this does fix the issue of allowing composite-typed variables to be simple NULLs. (I did yield to the temptation to allow plpgsql functions to take arguments declared as just "record", since there seems no good reason why that's still disallowed.) I believe I've gotten things to the point where this is acceptable from a performance standpoint. Testing various microbenchmarks on variables declared as named composites, some things are faster and some are slower, but nothing seems to get more than circa 5% worse. The same benchmarks on variables declared as "record" are uniformly improvements from HEAD, by significant margins ranging up to 2X faster. The worst issues I've noted are with trivial trigger functions, where there's a noticeable increase in startup overhead. I have some basically-independent patches that can buy that back, but since this patch is more than large enough, I'll post those as a separate thread. The core idea of the patch is to introduce an implementation of composite values as "expanded objects", extending the infrastructure that we used to improve performance of plpgsql array variables in commit 1dc5ebc90 and follow-on patches. So far, only plpgsql has been taught about such objects --- later it might be interesting to extend some of the core operations such as FieldSelect to deal with them explicitly. My plan is that expandedrecord.c will grow the ability to store values of domains-over-composite, including the ability to apply domain constraint checks during assignments. That's not there yet, though some comments make reference to it, and a few bits of code are ready for it. Also, this expands the idea I had in commit 687f096ea to get the typcache to assign unique-within-a-process numbers to different versions of a composite type, so that dependent code can easily recognize that a change has happened. Now, the numbers are unique within a process across all composite types, rather than being just unique per type. Since they're 64-bit counters there seems no risk of wraparound within the lifespan of a backend process. I added a bunch of new regression test cases. Those are mainly meant to ensure adequate test coverage of the new code. Running those cases on HEAD shows no behavioral changes except the expected ones around handling of composite NULLs and the addition of RECORD as an allowed argument type. I'll stick this into the January commitfest, but I'd like to get it reviewed and committed pretty soon, because there are follow-on patches that need to get done in time for v11 --- in particular, we need to close out the lack of plpgsql support for domains-over-composite. regards, tom lane
Attachment
Hi
I'll stick this into the January commitfest, but I'd like to get it
reviewed and committed pretty soon, because there are follow-on patches
that need to get done in time for v11 --- in particular, we need to close
out the lack of plpgsql support for domains-over-composite.
I didn't checked code - just I did some performance tests and I am thinking so performance is very good.
Master's record type has 50% speed of row type in my test. Patched has +/- same speed.
I see very small slowdown for row type .. about 3% but I think so it is acceptable - I tested some worst case.
Unfortunately - it breaks and very breaks all plpgsql related extensions - pldebug, plprofiler, plpgsql_check. On second hand, there are only few extensions of this kind.
Regards
Pavel
regards, tom lane
Hi
2017-12-29 9:56 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>:
HiI'll stick this into the January commitfest, but I'd like to get it
reviewed and committed pretty soon, because there are follow-on patches
that need to get done in time for v11 --- in particular, we need to close
out the lack of plpgsql support for domains-over-composite.I didn't checked code - just I did some performance tests and I am thinking so performance is very good.Master's record type has 50% speed of row type in my test. Patched has +/- same speed.I see very small slowdown for row type .. about 3% but I think so it is acceptable - I tested some worst case.Unfortunately - it breaks and very breaks all plpgsql related extensions - pldebug, plprofiler, plpgsql_check. On second hand, there are only few extensions of this kind.
I checked the code:
Interesting part from test:
alter table mutable drop column f1;
alter table mutable add column f1 float8;
-- currently, this fails due to cached plan for "r.f1 + 1" expression
select sillyaddone(42);
ERROR: type of parameter 4 (double precision) does not match that when preparing the plan (integer)
CONTEXT: PL/pgSQL function sillyaddone(integer) line 1 at RETURN
alter table mutable add column f1 float8;
-- currently, this fails due to cached plan for "r.f1 + 1" expression
select sillyaddone(42);
ERROR: type of parameter 4 (double precision) does not match that when preparing the plan (integer)
CONTEXT: PL/pgSQL function sillyaddone(integer) line 1 at RETURN
In this case, can we invalidate plan cache? It can decrease a risk of runtime issues when tables are altered.
Because PLPGSQL_NSTYPE_ROW is removed, then "switch" statement is maybe useless
if (ns != NULL && nnames == 2)
{
switch (ns->itemtype)
{
case PLPGSQL_NSTYPE_REC:
{
/*
* words 1/2 are a record name, so third word could be
* a field in this record.
*/
PLpgSQL_rec *rec;
PLpgSQL_recfield *new;
rec = (PLpgSQL_rec *) (plpgsql_Datums[ns->itemno]);
new = plpgsql_build_recfield(rec, word3);
wdatum->datum = (PLpgSQL_datum *) new;
wdatum->ident = NULL;
wdatum->quoted = false; /* not used */
wdatum->idents = idents;
return true;
}
default:
break;
}
}
}
{
switch (ns->itemtype)
{
case PLPGSQL_NSTYPE_REC:
{
/*
* words 1/2 are a record name, so third word could be
* a field in this record.
*/
PLpgSQL_rec *rec;
PLpgSQL_recfield *new;
rec = (PLpgSQL_rec *) (plpgsql_Datums[ns->itemno]);
new = plpgsql_build_recfield(rec, word3);
wdatum->datum = (PLpgSQL_datum *) new;
wdatum->ident = NULL;
wdatum->quoted = false; /* not used */
wdatum->idents = idents;
return true;
}
default:
break;
}
}
}
should be reduced
if (ns != NULL && nnames == 2 && ns->itemtype == PLPGSQL_NSTYPE_REC)
{
/*
* words 1/2 are a record name, so third word could be
* a field in this record.
*/
PLpgSQL_rec *rec;
PLpgSQL_recfield *new;
rec = (PLpgSQL_rec *) (plpgsql_Datums[ns->itemno]);
new = plpgsql_build_recfield(rec, word3);
wdatum->datum = (PLpgSQL_datum *) new;
wdatum->ident = NULL;
wdatum->quoted = false; /* not used */
wdatum->idents = idents;
return true;
}
{
/*
* words 1/2 are a record name, so third word could be
* a field in this record.
*/
PLpgSQL_rec *rec;
PLpgSQL_recfield *new;
rec = (PLpgSQL_rec *) (plpgsql_Datums[ns->itemno]);
new = plpgsql_build_recfield(rec, word3);
wdatum->datum = (PLpgSQL_datum *) new;
wdatum->ident = NULL;
wdatum->quoted = false; /* not used */
wdatum->idents = idents;
return true;
}
why is in exec_assign_value still case for PLPGSQL_DTYPE_ROW ?
Regards
Pavel
RegardsPavelregards, tom lane
Pavel Stehule <pavel.stehule@gmail.com> writes: > Interesting part from test: > alter table mutable drop column f1; > alter table mutable add column f1 float8; > -- currently, this fails due to cached plan for "r.f1 + 1" expression > select sillyaddone(42); > ERROR: type of parameter 4 (double precision) does not match that when > preparing the plan (integer) > CONTEXT: PL/pgSQL function sillyaddone(integer) line 1 at RETURN > In this case, can we invalidate plan cache? That's something I expect we can improve in followon patches, but it seems a bit outside the scope of this patch (which is mighty big already). > Because PLPGSQL_NSTYPE_ROW is removed, then "switch" statement is maybe > useless I'd just as soon leave it as a switch, for possible future expansion. > why is in exec_assign_value still case for PLPGSQL_DTYPE_ROW ? Take it out and see ;-). The whole point of having DTYPE_ROW at all is to support stuff like "SELECT ... INTO a,b,c". regards, tom lane
2017-12-29 18:38 GMT+01:00 Tom Lane <tgl@sss.pgh.pa.us>:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> Interesting part from test:
> alter table mutable drop column f1;
> alter table mutable add column f1 float8;
> -- currently, this fails due to cached plan for "r.f1 + 1" expression
> select sillyaddone(42);
> ERROR: type of parameter 4 (double precision) does not match that when
> preparing the plan (integer)
> CONTEXT: PL/pgSQL function sillyaddone(integer) line 1 at RETURN
> In this case, can we invalidate plan cache?
That's something I expect we can improve in followon patches, but
it seems a bit outside the scope of this patch (which is mighty
big already).
ok
> Because PLPGSQL_NSTYPE_ROW is removed, then "switch" statement is maybe
> useless
I'd just as soon leave it as a switch, for possible future expansion.
I don't think so there will be some cases - but it is not a issue
> why is in exec_assign_value still case for PLPGSQL_DTYPE_ROW ?
Take it out and see ;-). The whole point of having DTYPE_ROW
at all is to support stuff like "SELECT ... INTO a,b,c".
ok
Regards
Pavel
regards, tom lane
I wrote: > I'll stick this into the January commitfest, but I'd like to get it > reviewed and committed pretty soon, because there are follow-on patches > that need to get done in time for v11 --- in particular, we need to close > out the lack of plpgsql support for domains-over-composite. I hacked on the domain-support problem a bit and it worked out well, so attached is a revised patch that incorporates that. This caused me to revise some assumptions about what expandedrecord.c's internal invariants ought to be, so it's probably better to look at this as a new patch rather than a diff from v1. Mostly this is changes internal to expandedrecord.c, but I cleaned up some code in plpgsql that tests for domain-ness, and I also added support in ExecEvalFieldSelect for extracting a field directly from an expanded record without flattening it into a tuple first. It hadn't been clear that that was worth the trouble before, but it definitely does come up while applying domain constraints. For example, having that fast path makes about a 2X speed difference in a test like this: create type two_ints as (f1 int, f2 int); create domain ordered_ints as two_ints check((value).f1 <= (value).f2); \timing on do $$ declare d ordered_ints; begin for i in 1..3000000 loop d.f2 := i; d.f1 := i; end loop; end$$; There are still a couple of soft spots having to do with enforcing domain constraints against null composite values, e.g. if there's a constraint that would reject a null value we don't notice that at DECLARE time. I think there's not much point in being strict about that until we have support for initializers for composite variables. Which is definitely worth doing but it seems like it could be a separate patch. The patches in <11986.1514407114@sss.pgh.pa.us> still apply over this with just some line-number offsets, so I won't post a new version of those. regards, tom lane
Attachment
I wrote: > I hacked on the domain-support problem a bit and it worked out well, > so attached is a revised patch that incorporates that. This caused > me to revise some assumptions about what expandedrecord.c's internal > invariants ought to be, so it's probably better to look at this as > a new patch rather than a diff from v1. Oh, I'd not looked at the documentation angle of this. Here's a short add-on patch which just adds the fact that "record" is now a valid argument type, and removes a no-longer-correct claim that system columns are always inaccessible in row variables. The documentation draws a distinction between "record" and "row" variables which is now rather artificial so far as the code is concerned. But it's not totally pointless either, since it's still true that a variable declared "record" will mutate its type in a way that a named-composite-type variable will not. I'm inclined to leave that text as is; anyone think differently? regards, tom lane diff --git a/doc/src/sgml/plpgsql.sgml b/doc/src/sgml/plpgsql.sgml index 7d23ed4..86d28fb 100644 *** a/doc/src/sgml/plpgsql.sgml --- b/doc/src/sgml/plpgsql.sgml *************** *** 123,129 **** and they can return a result of any of these types. They can also accept or return any composite type (row type) specified by name. It is also possible to declare a <application>PL/pgSQL</application> ! function as returning <type>record</type>, which means that the result is a row type whose columns are determined by specification in the calling query, as discussed in <xref linkend="queries-tablefunctions"/>. </para> --- 123,131 ---- and they can return a result of any of these types. They can also accept or return any composite type (row type) specified by name. It is also possible to declare a <application>PL/pgSQL</application> ! function as accepting <type>record</type>, which means that any ! composite type will do as input, or ! as returning <type>record</type>, which means that the result is a row type whose columns are determined by specification in the calling query, as discussed in <xref linkend="queries-tablefunctions"/>. </para> *************** user_id users.user_id%TYPE; *** 672,685 **** </para> <para> - Only the user-defined columns of a table row are accessible in a - row-type variable, not the OID or other system columns (because the - row could be from a view). The fields of the row type inherit the - table's field size or precision for data types such as - <type>char(<replaceable>n</replaceable>)</type>. - </para> - - <para> Here is an example of using composite types. <structname>table1</structname> and <structname>table2</structname> are existing tables having at least the mentioned fields: --- 674,679 ----
2017-12-30 0:16 GMT+01:00 Tom Lane <tgl@sss.pgh.pa.us>:
I wrote:
> I'll stick this into the January commitfest, but I'd like to get it
> reviewed and committed pretty soon, because there are follow-on patches
> that need to get done in time for v11 --- in particular, we need to close
> out the lack of plpgsql support for domains-over-composite.
I hacked on the domain-support problem a bit and it worked out well,
so attached is a revised patch that incorporates that. This caused
me to revise some assumptions about what expandedrecord.c's internal
invariants ought to be, so it's probably better to look at this as
a new patch rather than a diff from v1.
Mostly this is changes internal to expandedrecord.c, but I cleaned up
some code in plpgsql that tests for domain-ness, and I also added support
in ExecEvalFieldSelect for extracting a field directly from an expanded
record without flattening it into a tuple first. It hadn't been clear
that that was worth the trouble before, but it definitely does come up
while applying domain constraints. For example, having that fast path
makes about a 2X speed difference in a test like this:
create type two_ints as (f1 int, f2 int);
create domain ordered_ints as two_ints check((value).f1 <= (value).f2);
\timing on
do $$
declare d ordered_ints;
begin
for i in 1..3000000 loop
d.f2 := i;
d.f1 := i;
end loop;
end$$;
There are still a couple of soft spots having to do with enforcing
domain constraints against null composite values, e.g. if there's
a constraint that would reject a null value we don't notice that
at DECLARE time. I think there's not much point in being strict
about that until we have support for initializers for composite
variables. Which is definitely worth doing but it seems like it
could be a separate patch.
The patches in <11986.1514407114@sss.pgh.pa.us > still apply over this
with just some line-number offsets, so I won't post a new version
of those.
all test passed on my comp too.
I think so these patches can be useful for schema variables too.
Regards
Pavel
regards, tom lane
Here's a version of this rebased up to HEAD, fixing a couple of trivial merge conflicts and incorporating the docs delta I posted separately. (I'd supposed this patch was still OK because the patch tester said so, but I now see that the tester was only testing the docs delta :-(.) regards, tom lane
Attachment
Pavel Stehule <pavel.stehule@gmail.com> writes: > 2017-12-30 0:16 GMT+01:00 Tom Lane <tgl@sss.pgh.pa.us>: >>> I'll stick this into the January commitfest, but I'd like to get it >>> reviewed and committed pretty soon, because there are follow-on patches >>> that need to get done in time for v11 --- in particular, we need to close >>> out the lack of plpgsql support for domains-over-composite. > all test passed on my comp too. Is anyone planning on doing further review of this patch than Pavel already did? If not, I'd like to go ahead and push it, since there's still followup stuff that I'd like to address for v11. As noted upthread, there are places where plpgsql fails to enforce DOMAIN NOT NULL constraints, and I think I had some other squishy things in my notes. regards, tom lane