Thread: best practice for || set of rows --> function --> set of rows

best practice for || set of rows --> function --> set of rows

From

Rémi Cura

Date:

17 September 2013, 07:06:46

Hello Dear List,
this is a re-post with a more clear title and rewrite.

I have a design issue :

My function works on ordered set of rows and returns ordered set of rows (less thant input). I want to use it in standard sql query (in WITH for example).

What is best practice for input?

I see 3 solutions :
_give table name as input || so no order unless I use view, doesn't work with CTE and all.

_give array and use unnest/arrayagg || bad performance with big input

_give refcursor || non-convenient because of transaction and require 2 separate queries.

Is there another way?

I would like to use it in a single sql statement, which leaves only array, which will perform bad (I may have hundreds of k of rows as input).

Also the function need all the row in input to work, and can't operate row by row.

It would be like an aggregate, but returning several rows.

Of course I read all the doc I could find, but doc doesn't give best practice !

Thank you very much for helping, I am in a stalemate now, and can't progress further.

Below is the original message, giving details over what the function do.

Cheers,

Rémi-C

I wrote a plpgsql function to compute union of time range that works :

[1,4]U[3,8]U[12,14]U[16,18] ---> [1,8]U[12,14]U[16,18]

It works on multiple rows.

My issue is a design issue :
I want to work on set of row and return set of row.
I am aware I could take as input/output array of range but I don't want
(memory/casting cost).

Currently the function takes a cursor on a table and output a setof record.

I would like that the function can blend in multiple subqueries smoothly, as

WITH (
first query to get range),
(query computing union
),
(query using computed union
) etc.

Currently I have to execute 2 sql statment :

create cursor on ranges;

WITH (function to compute union)
,
(query...)

The only kind of function taking set of record as input I know of is
aggregate function, but it returns only one row and the output of union can
take multiple row.

Any insight would be greatly appreciated.

Cheers,

Rémi-C

Re: best practice for || set of rows --> function --> set of rows

From

Merlin Moncure

Date:

17 September 2013, 13:11:43

On Tue, Sep 17, 2013 at 2:06 AM, Rémi Cura <remi.cura@gmail.com> wrote:
> The only kind of function taking set of record as input I know of is
> aggregate function, but it returns only one row and the output of union can
> take multiple row.

This may or may not help (I suggest posting a more complete example of
what you are unable to do):

If your output list of records is (quite) small, you may be able to
get away with using an aggregate function.

CREATE OR REPLACE append3 (anyarray, anyelement) RETURNS anyarray AS
$$
  SELECT CASE
    WHEN $1 IS NULL THEN ARRAY[$2]
    WHEN array_upper($1,1) >= 3 THEN $1
    ELSE $1 || $2
  END;
$$ LANGUAGE SQL;

CREATE AGGREGATE agg_append3(anyelement) (SFUNC = append3, STYPE = anyarray);

CREATE TABLE foo(a int, b text);

INSERT INTO foo SELECT s, s::text FROM generate_series(1,10) s;

WITH data AS (SELECT unnest(agg_append3(f)) AS d FROM foo f) SELECT
(d).* FROM data;
 a | b
---+---
 1 | 1
 2 | 2
 3 | 3

User defined aggregates can be defined over window function partitions:

SELECT a, agg_append3(f) OVER (PARTITION BY a % 2 ORDER BY a DESC) AS
d FROM foo f;
 a  |              d
----+-----------------------------
 10 | {"(10,10)"}
  8 | {"(10,10)","(8,8)"}
  6 | {"(10,10)","(8,8)","(6,6)"}
  4 | {"(10,10)","(8,8)","(6,6)"}
  2 | {"(10,10)","(8,8)","(6,6)"}
  9 | {"(9,9)"}
  7 | {"(9,9)","(7,7)"}
  5 | {"(9,9)","(7,7)","(5,5)"}
  3 | {"(9,9)","(7,7)","(5,5)"}
  1 | {"(9,9)","(7,7)","(5,5)"}

merlin

Re: best practice for || set of rows --> function --> set of rows

From

David Johnston

Date:

17 September 2013, 14:50:50

remi.cura wrote
> What is best practice for input?

There is none; you have options because different scenarios require
different solutions.


> I see 3 solutions :
> _give table name as input || so no order unless I use view, doesn't work
> with CTE and all.
> _give array and use unnest/arrayagg || bad performance with big input
> _give refcursor || non-convenient because of transaction and require 2
> separate queries.

To generalize:

Input Types:
-indirect (refcursor or table name)
-direct (parameters w/ data - including arrays)

Output Types:
-Scalar
-Set-Of
-Indirect (refcursor, table name)

Note that you can standardize on a table name and use the indirect output to
communicate other information - like what subset of the table do you want to
consider.

One thought is to populate an input staging table using some kind of
"transaction id"; run a function providing it the "transaction id" upon
which it should operate; have said function populate an output table using
the same transaction id.  In effect you cache both the input and output data
and then have your application query those caches (mainly the output cache)
to obtain its results.

Much more knowledge of the architecture in which the problem needs to
operate, and the problem itself, is needed to make reasonable suggestions
(as opposed to mere thought starters).

I find the statement "non-convenient because of transaction and require 2
separate queries" to be utter nonsense at face value but again that stems
from not knowing what limitations you are facing.

You can embed "order" information into a table and I am unsure why it would
not work with a CTE.  I imagine something like:

<not tested for syntax but the idea holds and is possible>

WITH pop ( SELECT populate_table() AS trans_id )
, SELECT * FROM process_table ( SELECT trans_id FROM pop )
;

HTH

David J.





--
View this message in context:
http://postgresql.1045698.n5.nabble.com/best-practice-for-set-of-rows-function-set-of-rows-tp5771189p5771265.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.