Thread: best practice for || set of rows --> function --> set of rows
Hello Dear List,
this is a re-post with a more clear title and rewrite.
I have a design issue :this is a re-post with a more clear title and rewrite.
What is best practice for input?
_give table name as input || so no order unless I use view, doesn't work with CTE and all.
_give array and use unnest/arrayagg || bad performance with big input
_give refcursor || non-convenient because of transaction and require 2 separate queries.
Is there another way?
I would like to use it in a single sql statement, which leaves only array, which will perform bad (I may have hundreds of k of rows as input).
Also the function need all the row in input to work, and can't operate row by row.
It would be like an aggregate, but returning several rows.
Of course I read all the doc I could find, but doc doesn't give best practice !
Thank you very much for helping, I am in a stalemate now, and can't progress further.
Below is the original message, giving details over what the function do.
Cheers,
Cheers,
Rémi-C
I wrote a plpgsql function to compute union of time range that works :
[1,4]U[3,8]U[12,14]U[16,18] ---> [1,8]U[12,14]U[16,18]
It works on multiple rows.
My issue is a design issue :
I want to work on set of row and return set of row.
I am aware I could take as input/output array of range but I don't want
(memory/casting cost).
Currently the function takes a cursor on a table and output a setof record.
I would like that the function can blend in multiple subqueries smoothly, as
WITH (
first query to get range),
(query computing union
),
(query using computed union
) etc.
Currently I have to execute 2 sql statment :
create cursor on ranges;
WITH (function to compute union)
,
(query...)
The only kind of function taking set of record as input I know of is
aggregate function, but it returns only one row and the output of union can
take multiple row.
Any insight would be greatly appreciated.
Cheers,
Rémi-C
On Tue, Sep 17, 2013 at 2:06 AM, Rémi Cura <remi.cura@gmail.com> wrote: > The only kind of function taking set of record as input I know of is > aggregate function, but it returns only one row and the output of union can > take multiple row. This may or may not help (I suggest posting a more complete example of what you are unable to do): If your output list of records is (quite) small, you may be able to get away with using an aggregate function. CREATE OR REPLACE append3 (anyarray, anyelement) RETURNS anyarray AS $$ SELECT CASE WHEN $1 IS NULL THEN ARRAY[$2] WHEN array_upper($1,1) >= 3 THEN $1 ELSE $1 || $2 END; $$ LANGUAGE SQL; CREATE AGGREGATE agg_append3(anyelement) (SFUNC = append3, STYPE = anyarray); CREATE TABLE foo(a int, b text); INSERT INTO foo SELECT s, s::text FROM generate_series(1,10) s; WITH data AS (SELECT unnest(agg_append3(f)) AS d FROM foo f) SELECT (d).* FROM data; a | b ---+--- 1 | 1 2 | 2 3 | 3 User defined aggregates can be defined over window function partitions: SELECT a, agg_append3(f) OVER (PARTITION BY a % 2 ORDER BY a DESC) AS d FROM foo f; a | d ----+----------------------------- 10 | {"(10,10)"} 8 | {"(10,10)","(8,8)"} 6 | {"(10,10)","(8,8)","(6,6)"} 4 | {"(10,10)","(8,8)","(6,6)"} 2 | {"(10,10)","(8,8)","(6,6)"} 9 | {"(9,9)"} 7 | {"(9,9)","(7,7)"} 5 | {"(9,9)","(7,7)","(5,5)"} 3 | {"(9,9)","(7,7)","(5,5)"} 1 | {"(9,9)","(7,7)","(5,5)"} merlin
remi.cura wrote > What is best practice for input? There is none; you have options because different scenarios require different solutions. > I see 3 solutions : > _give table name as input || so no order unless I use view, doesn't work > with CTE and all. > _give array and use unnest/arrayagg || bad performance with big input > _give refcursor || non-convenient because of transaction and require 2 > separate queries. To generalize: Input Types: -indirect (refcursor or table name) -direct (parameters w/ data - including arrays) Output Types: -Scalar -Set-Of -Indirect (refcursor, table name) Note that you can standardize on a table name and use the indirect output to communicate other information - like what subset of the table do you want to consider. One thought is to populate an input staging table using some kind of "transaction id"; run a function providing it the "transaction id" upon which it should operate; have said function populate an output table using the same transaction id. In effect you cache both the input and output data and then have your application query those caches (mainly the output cache) to obtain its results. Much more knowledge of the architecture in which the problem needs to operate, and the problem itself, is needed to make reasonable suggestions (as opposed to mere thought starters). I find the statement "non-convenient because of transaction and require 2 separate queries" to be utter nonsense at face value but again that stems from not knowing what limitations you are facing. You can embed "order" information into a table and I am unsure why it would not work with a CTE. I imagine something like: <not tested for syntax but the idea holds and is possible> WITH pop ( SELECT populate_table() AS trans_id ) , SELECT * FROM process_table ( SELECT trans_id FROM pop ) ; HTH David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/best-practice-for-set-of-rows-function-set-of-rows-tp5771189p5771265.html Sent from the PostgreSQL - general mailing list archive at Nabble.com.