Re: UNNEST with multiple args, and TABLE with multiple funcs - Mailing list pgsql-hackers
From | Boszormenyi Zoltan |
---|---|
Subject | Re: UNNEST with multiple args, and TABLE with multiple funcs |
Date | |
Msg-id | 52124690.8050309@cybertec.at Whole thread Raw |
In response to | UNNEST with multiple args, and TABLE with multiple funcs (Andrew Gierth <andrew@tao11.riddles.org.uk>) |
Responses |
Re: UNNEST with multiple args, and TABLE with multiple funcs
|
List | pgsql-hackers |
Hi, 2013-08-13 15:54 keltezéssel, Andrew Gierth írta: > Summary: > > This patch implements a method for expanding multiple SRFs in parallel > that does not have the surprising LCM behaviour of SRFs-in-select-list. > (Functions returning fewer rows are padded with nulls instead.) > > It then uses this method combined with a parse-time hack to implement > the (intended to be) spec-conforming behaviour of UNNEST with multiple > parameters, including flattening of composite results. > > The upshot is that given a table like this: > > postgres=# select * from t1; > a | b | c > ---------------+-------------------+---------------------------------------------- > {11,12,13} | {wombat} | > {5,10} | {foo,bar} | {"(123,xyzzy)","(456,plugh)","(789,plover)"} > {21,31,41,51} | {fred,jim,sheila} | {"(111,xyzzy)","(222,plugh)"} > (3 rows) > > (where column "c" is an array of a composite type with 2 cols, "x" and "y") > > You can do this: > > postgres=# select u.* from t1, unnest(a,b,c) with ordinality as u; > ?column? | ?column? | x | y | ordinality > ----------+----------+-----+--------+------------ > 11 | wombat | | | 1 > 12 | | | | 2 > 13 | | | | 3 > 5 | foo | 123 | xyzzy | 1 > 10 | bar | 456 | plugh | 2 > | | 789 | plover | 3 > 21 | fred | 111 | xyzzy | 1 > 31 | jim | 222 | plugh | 2 > 41 | sheila | | | 3 > 51 | | | | 4 > (10 rows) > > Or for an example of general combination of functions: > > postgres=# select * from table(generate_series(10,20,5), unnest(array['fred','jim'])); > ?column? | ?column? > ----------+---------- > 10 | fred > 15 | jim > 20 | > (3 rows) > > Implementation Details: > > The spec syntax for table function calls, <table function derived table> > in <table reference>, looks like TABLE(func(args...)) AS ... > > This patch implements that, plus an extension: it allows multiple > functions, TABLE(func1(...), func2(...), func3(...)) [WITH ORDINALITY] > and defines this as meaning that the functions are to be evaluated in > parallel. > > This is implemented by changing RangeFunction, function RTEs, and the > FunctionScan node to take lists of function calls rather than a single > function. The calling convention for SRFs is completely unchanged; each > function returns its own rows (or a tuplestore in materialize mode) just > as before, and FunctionScan combines the results into a single output > tuple (keeping track of which functions are exhausted in order to > correctly fill in nulls on a backwards scan). > > Then, a hack in the parser converts unnest(...) appearing as a > func_table (and only there) into a list of unnest() calls, one for each > parameter. So > > select ... from unnest(a,b,c) > > is converted to > > select ... from TABLE(unnest(a),unnest(b),unnest(c)) > > and if unnest appears as part of an existing list inside TABLE(), it's > expanded to multiple entries there too. > > This parser hackery is of course somewhat ugly. But given the objective > of implementing the spec's unnest syntax, it seems to be the least ugly > of the possible approaches. (The hard part of doing it any other way > would be generating the description of the result type; composite array > parameters expand into multiple result columns.) Harder maybe but it may still be cleaner in the long run. > Overall, it's my intention here to remove as many as feasible of the old > reasons why one might use an SRF in the select list. Indeed, it's a big nail in the coffin for SRFs-in-targetlist. Having WITH ORDINALITY and this feature, I would vote for removing SRF-in-targetlist and call the release PostgreSQL 10.0. > This should also > address the points that Josh brought up in discussion of ORDINALITY > regarding use of SRF-in-select to unnest multiple arrays. > > (As a side issue, this patch also sets up pathkeys for ordinality along > the lines of a patch I suggested to Greg a while back in response to > his.) > > Current patch status: > > This is a first working cut: no docs, no tests, not enough comments, the > deparse logic probably needs more work (it deparses correctly but the > formatting may be suboptimal). However all the functionality is believed > to be in place. With this last paragraph in mind, I am trying a little review. * Is the patch in a patch format which has context? (eg: context diff format) Yes. * Does it apply cleanly to the current git master? Applies with some offsets on a few files but without fuzz. * Does it include reasonable tests, necessary doc patches, etc? No, as told by the patch author. * Does the patch actually implement what it's supposed to do? Yes. * Do we want that? Yes. * Do we already have it? No. * Does it follow SQL spec, or the community-agreed behavior? The SQL spec says these: In 7.6 <table reference> <table primary> ::= ... | <table function derived table> [ AS ] <correlation name> [ <left paren> <derived column list> <right paren> ] also later in the same section: <table function derived table> ::= TABLE <left paren> <collection value expression> <right paren> In 6.26 <value expression> <collection value expression> ::= <array value expression> | <multiset value expression> In 6.36 <array value expression> <array value expression> ::= <array concatenation> | <array primary> <array concatenation> ::= <array value expression 1> <concatenation operator> <array primary> <array value expression 1> ::= <array value expression> <array primary> ::= <array value function> | <value expression primary> 6.3 <value expression primary> <value expression primary> ::= <parenthesized value expression> | <nonparenthesized value expression primary> <parenthesized value expression> ::= <left paren> <value expression> <right paren> <nonparenthesized value expression primary> ::= <unsigned value specification> | <column reference> | <set function specification> | <window function> | <nested window function> | <scalar subquery> | <case expression> | <cast specification> | <field reference> | <subtype treatment> | <methodinvocation> | <static method invocation> | <new specification> | <attribute or method reference> | <reference resolution> | <collection value constructor> | <array element reference> | <multiset element reference> | <next value expression> | <routine invocation> collection value constructor> ::= | <array value constructor> | <multiset value constructor> So, the FROM TABLE(...) AS (...) syntax is a big can of worms and I haven't even quoted <multiset value expression>. As far as I can tell, these should also be allowed but isn't: zozo=# select * from table('a'::text) as x; ERROR: syntax error at or near "'a'" LINE 1: select * from table('a'::text) as x; ^ zozo=# select x.* from t1, table(t1.a) as x; ERROR: syntax error at or near ")" LINE 1: select x.* from t1, table(t1.a) as x; ^ zozo=# select x.* from table((6)) as x(a int4); ERROR: syntax error at or near "(" LINE 1: select x.* from table((6)) as x(a int4); ^ zozo=# select x.* from table(values (6)) as x(a int4); ERROR: syntax error at or near "(" LINE 1: select x.* from table(values (6)) as x(a int4); ^ zozo=# select x.* from table(values(6)) as x(a int4); ERROR: syntax error at or near "(" LINE 1: select x.* from table(values(6)) as x(a int4); ^ What the patch implements is only the last choice for <nonparenthesized value expression primary>: <routine invocation> When you add documentation, it would be nice to mention it. Also, the grammar extension is a start for adding all the other standard choices for TABLE(). * Does it include pg_dump support (if applicable)? n/a * Are there dangers? I can't see any. 8-) * Have all the bases been covered? My previous comments about the TABLE() syntax says it all. You can interpret it either way. :-) * Does the feature work as advertised? Yes. * Are there corner cases the author has failed to consider? I don't know. * Are there any assertion failures or crashes? No. * Does the patch slow down simple tests? No. * If it claims to improve performance, does it? It certainly improves writing queries, as functions inside unnest() get processed in one scan. * Does it slow down other things? I don't think so. * Does it follow the project coding guidelines? Yes. * Are there portability issues? No. * Will it work on Windows/BSD etc? It should, the code uses standard internal PostgreSQL APIs and extends them. No new system call. * Are the comments sufficient and accurate? According to the author, no. * Does it do what it says, correctly? Yes. * Does it produce compiler warnings? No. * Can you make it crash? No. * Is everything done in a way that fits together coherently with other features/modules? I think so * Are there interdependencies that can cause problems? I don't know. Best regards, Zoltán Böszörményi -- ---------------------------------- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/
pgsql-hackers by date: