Thread: New function normal_rand_array function to contrib/tablefunc.

New function normal_rand_array function to contrib/tablefunc.

From
Andy Fan
Date:
Here is a new function which could produce an array of numbers with a
controllable array length and duplicated elements in these arrays. I
used it when working with gin index, and I think it would be helpful for
others as well, so here it is.

select * from normal_rand_array(5, 10, 1.8::numeric, 3.5::numeric);
               normal_rand_array               
-----------------------------------------------
 {3.3,2.3,2.7,3.2,2.0,2.7,3.4,2.7,2.3,2.9}
 {3.3,1.8,2.9,3.4,2.0,1.8,2.0,3.5,2.8,2.5}
 {2.1,1.9,2.3,1.9,2.5,2.7,2.4,2.9,1.8}
 {2.3,2.5,2.4,2.7,2.7,2.3,2.9,3.3,3.3,1.9,3.5}
 {2.8,3.4,2.7,1.8,3.3,2.3,2.2,3.5,2.6,2.5}
(5 rows)

select * from normal_rand_array(5, 10, 1.8::int4, 3.5::int4);
          normal_rand_array          
-------------------------------------
 {3,2,2,3,4,2}
 {2,4,2,3,3,3,3,2,2,3,3,2,3,2}
 {2,4,3}
 {4,2,3,4,2,4,2,2,3,4,3,3,2,4,4,2,3}
 {4,3,3,4,3,3,4,2,4}
(5 rows)

the 5 means it needs to produce 5 rows in total and the 10 is the
average array length, and 1.8 is the minvalue for the random function
and 3.5 is the maxvalue. 

-- 
Best Regards
Andy Fan


Attachment

Re: New function normal_rand_array function to contrib/tablefunc.

From
Stepan Neretin
Date:
It looks useful, for example, it can be used in sorting tests to make them more interesting. I just have one question.
Whyare you using SRF_IS_FIRST CALL and not _PG_init?
 
Best regards, Stepan Neretin.

Re: New function normal_rand_array function to contrib/tablefunc.

From
Jim Jones
Date:
Hi Andy

On 08.06.24 08:05, Andy Fan wrote:
> Here is a new function which could produce an array of numbers with a
> controllable array length and duplicated elements in these arrays. I
> used it when working with gin index, and I think it would be helpful for
> others as well, so here it is.
>
> select * from normal_rand_array(5, 10, 1.8::numeric, 3.5::numeric);
>                normal_rand_array               
> -----------------------------------------------
>  {3.3,2.3,2.7,3.2,2.0,2.7,3.4,2.7,2.3,2.9}
>  {3.3,1.8,2.9,3.4,2.0,1.8,2.0,3.5,2.8,2.5}
>  {2.1,1.9,2.3,1.9,2.5,2.7,2.4,2.9,1.8}
>  {2.3,2.5,2.4,2.7,2.7,2.3,2.9,3.3,3.3,1.9,3.5}
>  {2.8,3.4,2.7,1.8,3.3,2.3,2.2,3.5,2.6,2.5}
> (5 rows)
>
> select * from normal_rand_array(5, 10, 1.8::int4, 3.5::int4);
>           normal_rand_array          
> -------------------------------------
>  {3,2,2,3,4,2}
>  {2,4,2,3,3,3,3,2,2,3,3,2,3,2}
>  {2,4,3}
>  {4,2,3,4,2,4,2,2,3,4,3,3,2,4,4,2,3}
>  {4,3,3,4,3,3,4,2,4}
> (5 rows)
>
> the 5 means it needs to produce 5 rows in total and the 10 is the
> average array length, and 1.8 is the minvalue for the random function
> and 3.5 is the maxvalue. 
>

When either minval or maxval exceeds int4 the function cannot be
executed/found

SELECT * FROM normal_rand_array(5, 10, 8, 42::bigint);

ERROR:  function normal_rand_array(integer, integer, integer, bigint)
does not exist
LINE 1: SELECT * FROM normal_rand_array(5, 10, 8, 42::bigint);
                      ^
HINT:  No function matches the given name and argument types. You might
need to add explicit type casts.
---

SELECT * FROM normal_rand_array(5, 10, 8::bigint, 42);

ERROR:  function normal_rand_array(integer, integer, bigint, integer)
does not exist
LINE 1: SELECT * FROM normal_rand_array(5, 10, 8::bigint, 42);
                      ^
HINT:  No function matches the given name and argument types. You might
need to add explicit type casts.
---

However, when both are int8 it works fine:

SELECT * FROM normal_rand_array(5, 10, 8::bigint, 42::bigint);

                normal_rand_array                 
--------------------------------------------------
 {29,38,31,10,23,39,9,32}
 {8,39,19,31,29,15,17,15,36,20,33,19}
 {15,18,42,19}
 {16,31,33,11,14,20,24,9,12,17,22,42,41,24,11,41}
 {15,11,36,8,28,37}
(5 rows)
---

Is it the expected behaviour?

In some cases the function returns an empty array. Is it also expected?

SELECT count(*)
FROM normal_rand_array(100000, 10, 8, 42) i
WHERE array_length(i,1) IS NULL;

 count
-------
  4533
(1 row)


In both cases, perhaps mentioning these behaviors in the docs would
avoid some confusion.

Thanks!

Best,

-- 
Jim




Re: New function normal_rand_array function to contrib/tablefunc.

From
Dean Rasheed
Date:
On Tue, 2 Jul 2024 at 11:18, Jim Jones <jim.jones@uni-muenster.de> wrote:
>
> When either minval or maxval exceeds int4 the function cannot be
> executed/found
>
> SELECT * FROM normal_rand_array(5, 10, 8, 42::bigint);
>
> ERROR:  function normal_rand_array(integer, integer, integer, bigint)
> does not exist
> LINE 1: SELECT * FROM normal_rand_array(5, 10, 8, 42::bigint);
>                       ^
> HINT:  No function matches the given name and argument types. You might
> need to add explicit type casts.
>

This could be solved by defining separate functions for each supported
type, rather than one function with type anyelement. Internally, they
could continue to share common code, of course.

> In some cases the function returns an empty array. Is it also expected?
>

Perhaps it would be useful to have separate minimum and maximum array
length arguments, rather than a mean array length argument.

Actually, I find the current behaviour somewhat counterintuitive. Only
after reading the source code did I realise what it's actually doing,
which is this:

Row 1: array of random length in range [0, meanarraylen]
Row 2: array of length 2*meanarraylen - length of previous array
Row 3: array of random length in range [0, meanarraylen]
Row 4: array of length 2*meanarraylen - length of previous array
...

That's far from obvious (it's certainly not documented) and I don't
think it's a particularly good way of achieving a specified mean array
length, because only half the lengths are random.

One thing that's definitely needed is more documentation. It should
have its own new subsection, like the other tablefunc functions.

Something else that confused me is why this function is called
normal_rand_array(). The underlying random functions that it's calling
return random values from a uniform distribution, not a normal
distribution. Arrays of normally distributed random values might also
be useful, but that's not what this patch is doing.

Also, the function accepts float8 minval and maxval arguments, and
then simply ignores them and returns random float8 values in the range
[0,1), which is highly counterintuitive.

My suggestion would be to mirror the signatures of the core random()
functions more closely, and have this:

1). rand_array(numvals int, minlen int, maxlen int)
    returns setof float8[]

2). rand_array(numvals int, minlen int, maxlen int,
               minval int, maxval int)
    returns setof int[]

3). rand_array(numvals int, minlen int, maxlen int,
               minval bigint, maxval bigint)
    returns setof bigint[]

4). rand_array(numvals int, minlen int, maxlen int,
               minval numeric, maxval numeric)
    returns setof numeric[]

Also, I'd recommend giving the function arguments names in SQL, like
this, since that makes them easier to use.

Something else that's not obvious is that this patch is relying on the
core random functions, which means that it's using the same PRNG state
as those functions. That's probably OK, but it should be documented,
because it's different from tablefunc's normal_rand() function, which
uses pg_global_prng_state. In particular, what this means is that
calling setseed() will affect the output of these new functions, but
not normal_rand(). Incidentally, that makes writing tests much simpler
-- just call setseed() first and the output will be repeatable.

Regards,
Dean



Re: New function normal_rand_array function to contrib/tablefunc.

From
Andy Fan
Date:
Dean Rasheed <dean.a.rasheed@gmail.com> writes:

> My suggestion would be to mirror the signatures of the core random()
> functions more closely, and have this:
>
> 1). rand_array(numvals int, minlen int, maxlen int)
>     returns setof float8[]
>
> 2). rand_array(numvals int, minlen int, maxlen int,
>                minval int, maxval int)
>     returns setof int[]
>
> 3). rand_array(numvals int, minlen int, maxlen int,
>                minval bigint, maxval bigint)
>     returns setof bigint[]
>
> 4). rand_array(numvals int, minlen int, maxlen int,
>                minval numeric, maxval numeric)
>     returns setof numeric[]

this is indeed a more clean and correct APIs, I will use the above ones
in the next version. Thanks for the suggestion. 

It is just not clear to me how verbose the document should to be, and
where the document should be, tablefunc.sgml, the comment above the
function or the places just the code?  In many cases you provides above
or below are just implementation details, not the API declaimed purpose. 

> Something else that's not obvious is that this patch is relying on the
> core random functions, which means that it's using the same PRNG state
> as those functions. That's probably OK, but it should be documented,
> because it's different from tablefunc's normal_rand() function, which
> uses pg_global_prng_state.

My above question applies to this comment. 

> In particular, what this means is that
> calling setseed() will affect the output of these new functions, but
> not normal_rand(). Incidentally, that makes writing tests much simpler
> -- just call setseed() first and the output will be repeatable.

Good to know this user case. for example, should this be documented?

>> In some cases the function returns an empty array. Is it also expected?
>>
>
> Perhaps it would be useful to have separate minimum and maximum array
> length arguments, rather than a mean array length argument.

I'm not sure which one is better, but main user case of this function
for testing pupose, so it I think minimum and maximum array length is
good for me too.

> 
> Actually, I find the current behaviour somewhat counterintuitive. Only
> after reading the source code did I realise what it's actually doing,
> which is this:
>
> Row 1: array of random length in range [0, meanarraylen]
> Row 2: array of length 2*meanarraylen - length of previous array
> Row 3: array of random length in range [0, meanarraylen]
> Row 4: array of length 2*meanarraylen - length of previous array
> ...
>
> That's far from obvious (it's certainly not documented) and I don't
> think it's a particularly good way of achieving a specified mean array
> length, because only half the lengths are random.

I'm not sure how does this matter in real user case.

> One thing that's definitely needed is more documentation. It should
> have its own new subsection, like the other tablefunc functions.

is the documentaion for the '2*meanarraylen - lastarraylen'?

and What is new subsection, do you mean anything wrong in
'tablefunc.sgml', I did have some issue to run 'make html', but the
error exists before my patch, so I change the document carefully without
testing it.  do you know how to fix the below error in 'make html'?


$/usr/bin/xsltproc --nonet --path . --path . --stringparam pg.version '18devel'  stylesheet.xsl postgres-full.xml

I/O error : Attempt to load network entity http://docbook.sourceforge.net/release/xsl/current/xhtml/chunk.xsl
warning: failed to load external entity "http://docbook.sourceforge.net/release/xsl/current/xhtml/chunk.xsl"
compilation error: file stylesheet.xsl line 6 element import
xsl:import : unable to load http://docbook.sourceforge.net/release/xsl/current/xhtml/chunk.xsl
I/O error : Attempt to load network entity http://docbook.sourceforge.net/release/xsl/current/common/entities.ent
stylesheet-html-common.xsl:4: warning: failed to load external entity
"http://docbook.sourceforge.net/release/xsl/current/common/entities.ent"
%common.entities;
                 ^
stylesheet-html-common.xsl:124: parser error : Entity 'primary' not defined
                          translate(substring(&primary;, 1, 1),


> Something else that confused me is why this function is called
> normal_rand_array(). The underlying random functions that it's calling
> return random values from a uniform distribution, not a normal
> distribution. Arrays of normally distributed random values might also
> be useful, but that's not what this patch is doing.

OK, you are right, your new names should be better. 

> Also, the function accepts float8 minval and maxval arguments, and
> then simply ignores them and returns random float8 values in the range
> [0,1), which is highly counterintuitive.

This is a obvious bug and it only exists in float8 case IIUC, will fix
it in the next version.


-- 
Best Regards
Andy Fan




Re: New function normal_rand_array function to contrib/tablefunc.

From
Andy Fan
Date:
Andy Fan <zhihuifan1213@163.com> writes:


(just noticed this reply is sent to Jim privately, re-sent it to
public.)

> Hi Jim,
>
>>
>> When either minval or maxval exceeds int4 the function cannot be
>> executed/found
>>
>> SELECT * FROM normal_rand_array(5, 10, 8, 42::bigint);
>>
>> ERROR:  function normal_rand_array(integer, integer, integer, bigint)
>> does not exist
>> LINE 1: SELECT * FROM normal_rand_array(5, 10, 8, 42::bigint);
>>                       ^
>> HINT:  No function matches the given name and argument types. You might
>> need to add explicit type casts.
>> ---
>>
>> SELECT * FROM normal_rand_array(5, 10, 8::bigint, 42);
>>
>> ERROR:  function normal_rand_array(integer, integer, bigint, integer)
>> does not exist
>> LINE 1: SELECT * FROM normal_rand_array(5, 10, 8::bigint, 42);
>>                       ^
>> HINT:  No function matches the given name and argument types. You might
>> need to add explicit type casts.
>> ---
>
>>
>> However, when both are int8 it works fine:
>
> I defined the function as below:
>
> postgres=# \df normal_rand_array
>                                         List of functions
>  Schema |       Name        | Result data type |           Argument data types            | Type
> --------+-------------------+------------------+------------------------------------------+------
>  public | normal_rand_array | SETOF anyarray   | integer, integer, anyelement, anyelement | func
> (1 row)
>
> so it is required that the 3nd and 4th argument should have the same
> data type, that's why your first 2 test case failed and the third one
> works.  and I also think we should not add a test case / document for
> this since the behavior of 'anyelement' system.
>

This issue can be fixed with the new API defined suggested by Dean.

>>
>> SELECT * FROM normal_rand_array(5, 10, 8::bigint, 42::bigint);
>>
>>                 normal_rand_array                 
>> --------------------------------------------------
>>  {29,38,31,10,23,39,9,32}
>>  {8,39,19,31,29,15,17,15,36,20,33,19}
>>  {15,18,42,19}
>>  {16,31,33,11,14,20,24,9,12,17,22,42,41,24,11,41}
>>  {15,11,36,8,28,37}
>> (5 rows)
>> ---
>>
>> Is it the expected behaviour?
>
> Yes, see the above statements.
>
>>
>> In some cases the function returns an empty array. Is it also expected?
>>
>> SELECT count(*)
>> FROM normal_rand_array(100000, 10, 8, 42) i
>> WHERE array_length(i,1) IS NULL;
>>
>>  count
>> -------
>>   4533
>> (1 row)
>
> Yes, by design I think it is a feature which could generate [] case
> which should be used a special case for testing, and at the
> implementation side, the [] means the length is 0 which is caused by I
> choose the 'len' by random [0 .. len * 2], so 0 is possible and doesn't
> confict with the declared behavior.
>
>> In both cases, perhaps mentioning these behaviors in the docs would
>> avoid some confusion.
>
> hmm, It doesn't take some big effort to add them, but I'm feeling that
> would make the document a bit of too verbose/detailed.
>
> Sorry for the late respone!

--
Best Regards
Andy Fan




Re: New function normal_rand_array function to contrib/tablefunc.

From
Dean Rasheed
Date:
On Wed, 17 Jul 2024 at 07:29, Andy Fan <zhihuifan1213@163.com> wrote:
>
> It is just not clear to me how verbose the document should to be, and
> where the document should be, tablefunc.sgml, the comment above the
> function or the places just the code?  In many cases you provides above
> or below are just implementation details, not the API declaimed purpose.
>
> > Something else that's not obvious is that this patch is relying on the
> > core random functions, which means that it's using the same PRNG state
> > as those functions. That's probably OK, but it should be documented,
> > because it's different from tablefunc's normal_rand() function, which
> > uses pg_global_prng_state.
>
> My above question applies to this comment.
>
> > One thing that's definitely needed is more documentation. It should
> > have its own new subsection, like the other tablefunc functions.
>

I was really referring to the SGML docs. Try to follow the style used
for the existing functions in tablefunc.sgml -- so in addition to
adding the row to the table at the top, also add one or more sections
further down the page to give more details, and example output.
Something like this:

https://www.postgresql.org/docs/current/tablefunc.html#TABLEFUNC-FUNCTIONS-NORMAL-RAND

That would be a good place to mention that setseed() can be used to
produce repeatable results.


> I did have some issue to run 'make html', but the
> error exists before my patch, so I change the document carefully without
> testing it.  do you know how to fix the below error in 'make html'?
>
> $/usr/bin/xsltproc --nonet --path . --path . --stringparam pg.version '18devel'  stylesheet.xsl postgres-full.xml
>
> I/O error : Attempt to load network entity http://docbook.sourceforge.net/release/xsl/current/xhtml/chunk.xsl
> warning: failed to load external entity "http://docbook.sourceforge.net/release/xsl/current/xhtml/chunk.xsl"
> compilation error: file stylesheet.xsl line 6 element import
> xsl:import : unable to load http://docbook.sourceforge.net/release/xsl/current/xhtml/chunk.xsl
> I/O error : Attempt to load network entity http://docbook.sourceforge.net/release/xsl/current/common/entities.ent
> stylesheet-html-common.xsl:4: warning: failed to load external entity
"http://docbook.sourceforge.net/release/xsl/current/common/entities.ent"
> %common.entities;
>                  ^
> stylesheet-html-common.xsl:124: parser error : Entity 'primary' not defined
>                           translate(substring(&primary;, 1, 1),
>

This looks like you're missing a required package. Try installing
docbook-xsl or docbook-xsl-stylesheets or something similar (the
package name varies depending on your distro).

Regards,
Dean



Re: New function normal_rand_array function to contrib/tablefunc.

From
Japin Li
Date:
On Tue, 27 Aug 2024 at 16:43, Andy Fan <zhihuifan1213@163.com> wrote:
> Andy Fan <zhihuifan1213@163.com> writes:
>
>>>> My suggestion would be to mirror the signatures of the core random()
>>>> functions more closely, and have this:
>>>>
>>>> 1). rand_array(numvals int, minlen int, maxlen int)
>>>>     returns setof float8[]
>>>>
>> ..>
>>>> 4). rand_array(numvals int, minlen int, maxlen int,
>>>>                minval numeric, maxval numeric)
>>>>     returns setof numeric[]
>>
>>> this is indeed a more clean and correct APIs, I will use the above ones
>>> in the next version. Thanks for the suggestion.
>>
>> I followed your suggestion in the new attached version. They are not
>> only some cleaner APIs for user and but also some cleaner implementation
>> in core, Thank for this suggestion as well.
>
> A new version is attached, nothing changed except replace
> PG_GETARG_INT16 with PG_GETARG_INT32. PG_GETARG_INT16 is a copy-paste
> error.
>

Thanks for updating the patch. Here are some comments.

+    if (minlen >= maxlen)
+        ereport(ERROR,
+                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                 errmsg("minlen must be greater than maxlen.")));

There error message should be "minlen must be smaller than maxlen", right?

+    if (minlen < 0)
+        ereport(ERROR,
+                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                 errmsg("minlen and maxlen must be greater than zero.")));

Here the minlen might be zero, so the error message is incorrect.
How about use "minlen must be greater than or equal to zero"?

-- 
Regrads,
Japin Li



Re: New function normal_rand_array function to contrib/tablefunc.

From
Japin Li
Date:
On Wed, 28 Aug 2024 at 12:27, Andy Fan <zhihuifan1213@163.com> wrote:
> Japin Li <japinli@hotmail.com> writes:
>
>
>> Thanks for updating the patch. Here are some comments.
>>
>> +    if (minlen >= maxlen)
>> +        ereport(ERROR,
>> +                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>> +                 errmsg("minlen must be greater than maxlen.")));
>>
>> There error message should be "minlen must be smaller than maxlen", right?
>>
>> +    if (minlen < 0)
>> +        ereport(ERROR,
>> +                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>> +                 errmsg("minlen and maxlen must be greater than zero.")));
>>
>> Here the minlen might be zero, so the error message is incorrect.
>> How about use "minlen must be greater than or equal to zero"?
>
> Yes, you are right. A new version is attached, thanks for checking! 
>

Nitpick, the minlen is smaller than maxlen, so the maxlen cannot be zero.

After giving it some more thought, it would also be helpful if maxlen is
equal to minlen.

For example, I want have each row has four items, I can use the following

SELECT rand_array(10, 4, 4, 50::int, 80::int);

OTOH, I find the range bound uses "less than or equal to", how about
replacing "smaller" with "less"?

-- 
Regrads,
Japin Li

diff --git a/contrib/tablefunc/tablefunc.c b/contrib/tablefunc/tablefunc.c
index b24c70d538..69b276b285 100644
--- a/contrib/tablefunc/tablefunc.c
+++ b/contrib/tablefunc/tablefunc.c
@@ -306,10 +306,10 @@ rand_array_internal(FunctionCallInfo fcinfo, Oid datatype)
                 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                  errmsg("number of rows cannot be negative")));
 
-    if (minlen >= maxlen)
+    if (minlen > maxlen)
         ereport(ERROR,
                 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-                 errmsg("minlen must be smaller than maxlen.")));
+                 errmsg("minlen must be less than or equal to maxlen.")));
 
     if (minlen < 0)
         ereport(ERROR,

Re: New function normal_rand_array function to contrib/tablefunc.

From
Dean Rasheed
Date:
On Thu, 29 Aug 2024 at 05:39, Andy Fan <zhihuifan1213@163.com> wrote:
>
> Yes, that's a valid usage. the new vesion is attached.  I have changed
> the the commit entry [1] from "Waiting on Author" to "Needs review".
>

Note that this needs a rebase, following commit 4681ad4b2f.

Here are a few more review comments:


1). In the tests:

+select setseed(0.8);
+select rand_array(10, 0, 3, 50::int, 80::int);
+select setseed(0.8);
+select rand_array(10, 0, 3, 50::bigint, 80::bigint);
+select setseed(0.8);
+select rand_array(10, 0, 3, 50::float8, 80::float8);
+select setseed(0.8);
+select rand_array(10, 0, 3, 50::float4, 80::float4);
+select setseed(0.8);
+select rand_array(10, 0, 3, 50::numeric, 80::numeric);

this should really have a comment block to distinguish these new tests
from the preceeding normal_rand() tests.


2). It's worth also having tests for the error cases.


3). It's only necessary to call setseed() once to get a reproducible
set of results from then on.


4). I'd use setseed(0) or setseed(0.5), since those values are exactly
representable as double precision values, unlike 0.8, ensuring that it
works the same on all platforms. It might not matter, but why take the
risk?


5). The float8 case still has minimum and maximum value arguments that
it just ignores. It should either not take those arguments, or it
should respect them, and try to return float8 values in the requested
range. I suspect that trying to return  float8 values in the requested
range would be hard, if not impossible, due to the inexact nature of
double precision arithmetic. That's why I suggested earlier having the
float8 function not take minval/maxval arguments, and just return
values in the range 0 to 1.


6). This new function:

+static Datum
+rand_array_internal(FunctionCallInfo fcinfo, Oid datatype)
+{

should have a comment block. In particular, it should document what
its inputs and outputs are.


7). This code:

+    int num_tuples = PG_GETARG_INT32(0);
+    int minlen = PG_GETARG_INT32(1);
+    int maxlen = PG_GETARG_INT32(2);
+    Datum minval = PG_GETARG_DATUM(3),
+        maxval = PG_GETARG_DATUM(4);
+    rand_array_fctx *fctx;
+
+    if (datatype == INT4OID)
+        random_fn_oid = F_RANDOM_INT4_INT4;
+    else if (datatype == INT8OID)
+        random_fn_oid = F_RANDOM_INT8_INT8;
+    else if    (datatype == FLOAT8OID)
+        random_fn_oid = F_RANDOM_;
+    else if (datatype == NUMERICOID)
+        random_fn_oid = F_RANDOM_NUMERIC_NUMERIC;
+    else
+        elog(ERROR, "unsupported type %d for rand_array function.",
+             datatype);
+
+    if (num_tuples < 0)
+        ereport(ERROR,
+                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                 errmsg("number of rows cannot be negative")));
+
+    if (minlen > maxlen)
+        ereport(ERROR,
+                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                 errmsg("minlen must not be greater than maxlen.")));
+
+    if (minlen < 0)
+        ereport(ERROR,
+                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                 errmsg("minlen and maxlen must be greater than or
equal to zero.")));

should be inside the "if (SRF_IS_FIRSTCALL())" block. There's no point
repeating all those checks for every call.


8). I think it would be neater to code the "if (datatype == INT4OID)
... else if ..." as a switch statement.


9). I would swap the last 2 bound checks round, and simplify the error
messages as follows:

    if (minlen < 0)
        ereport(ERROR,
                errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                errmsg("minlen must be greater than or equal to zero"));

    if (maxlen < minlen)
        ereport(ERROR,
                errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                errmsg("maxlen must be greater than or equal to minlen"));

Also note that primary error messages should not end with a period.
See https://www.postgresql.org/docs/current/error-style-guide.html


10). In this error:

+        elog(ERROR, "unsupported type %d for rand_array function.",
+             datatype);

"datatype" is of type Oid, which is unsigned, and so it should use
"%u" not "%d". Also, as above, it should not end with a period, so it
should be:

+        elog(ERROR, "unsupported type %u for rand_array function",
+             datatype);


11). If the float8 case is made to not have minval/maxval arguments, this code:

+    Datum minval = PG_GETARG_DATUM(3),
+        maxval = PG_GETARG_DATUM(4);

and the FunctionCallInfo setup code needs to be different for the
float8 case, in order to avoid reading and setting undefined
arguments. Perhaps introduce a "need_val_bounds" boolean variable,
based on the datatype.


12). This code:

+        random_len_fcinfo->args[0].value = minlen;
+        random_len_fcinfo->args[1].value = maxlen;

should really be:

+        random_len_fcinfo->args[0].value = Int32GetDatum(minlen);
+        random_len_fcinfo->args[1].value = Int32GetDatum(maxlen);

It amounts to the same thing, but documents the fact that it's
converting an integer to a Datum.


13). These new functions are significantly under-documented,
especially when compared to all the other functions on
https://www.postgresql.org/docs/current/tablefunc.html

They really should have their own subsection, along the same lines as
"F.41.1.1. Normal_rand", explaining what the functions do, what their
arguments mean, and giving a couple of usage examples.

Regards,
Dean



Re: New function normal_rand_array function to contrib/tablefunc.

From
Andy Fan
Date:
> 10). In this error:
>
> +        elog(ERROR, "unsupported type %d for rand_array function.",
> +             datatype);
>
> "datatype" is of type Oid, which is unsigned, and so it should use
> "%u" not "%d". Also, as above, it should not end with a period, so it
> should be:
>
> +        elog(ERROR, "unsupported type %u for rand_array function",
> +             datatype);

I remember my IDE could detect such issue before, but failed this
time. I just checked more today and it is still failing. Looks the
checker is not stable and I can't find out the reason so far.

main.c

#include <stdio.h>

int main(int argc, char *argv[])
{
    unsigned int i = 0;
    int x = 2;
    printf("i = %d\n", i);
    printf("i = %u\n", x);    
    return 0;
}

All the following commands succeed without any warnings.

clang -O0 -g main.c -o main -Wall -Wformat
gcc -g main.c -o main -Wall -Wformat

scan-build clang -g main.c -o main -Wall -Wformat
cppcheck main.c

clang: 18.1.6
gcc: 13.3.0

Only "cppcheck --enable=all main.c" catch the warnning.

Any hints on this will be appreicated.

-- 
Best Regards
Andy Fan




Re: New function normal_rand_array function to contrib/tablefunc.

From
Dean Rasheed
Date:
On Sat, 26 Oct 2024 at 01:51, Andy Fan <zhihuifan1213@163.com> wrote:
>
> > 10). In this error:
> >
> > +        elog(ERROR, "unsupported type %d for rand_array function.",
> > +             datatype);
> >
> > "datatype" is of type Oid, which is unsigned, and so it should use
> > "%u" not "%d". Also, as above, it should not end with a period, so it
> > should be:
> >
> > +        elog(ERROR, "unsupported type %u for rand_array function",
> > +             datatype);
>
> All the following commands succeed without any warnings.
>
> clang -O0 -g main.c -o main -Wall -Wformat
> gcc -g main.c -o main -Wall -Wformat
>

This can be detected in gcc with -Wformat plus -Wformat-signedness flags.

I see that this has been discussed before (e.g., [1]), but it doesn't
look like those patches were committed, and there are still many such
warnings, if you try compiling postgres with those flags.

[1]
https://www.postgresql.org/message-id/flat/CA%2BhUKGJ7EQm9extQAgrFZNNUKqRT8Vv5t1tKqA-5nEcYn0%2BwNA%40mail.gmail.com

I don't know if anyone has any plans to pick up that work again, but
in any case, it seems wise to not add more.

Regards,
Dean



Re: New function normal_rand_array function to contrib/tablefunc.

From
Andy Fan
Date:
Dean Rasheed <dean.a.rasheed@gmail.com> writes:

> On Sat, 26 Oct 2024 at 01:51, Andy Fan <zhihuifan1213@163.com> wrote:
>>
>> > 10). In this error:
>> >
>> > +        elog(ERROR, "unsupported type %d for rand_array function.",
>> > +             datatype);
>> >
>> > "datatype" is of type Oid, which is unsigned, and so it should use
>> > "%u" not "%d". Also, as above, it should not end with a period, so it
>> > should be:
>> >
>> > +        elog(ERROR, "unsupported type %u for rand_array function",
>> > +             datatype);
>>
>> All the following commands succeed without any warnings.
>>
>> clang -O0 -g main.c -o main -Wall -Wformat
>> gcc -g main.c -o main -Wall -Wformat
>>
>
> This can be detected in gcc with -Wformat plus -Wformat-signedness
> flags.

Yes, this one works. I didn't realize we have "-Wformat-signedness"
subsection after we already have "-Wformat". I have added this one into
my toolset.

For recording purpose, clang doesn't support this option until now. 

>
> I see that this has been discussed before (e.g., [1]), but it doesn't
> look like those patches were committed, and there are still many such
> warnings, if you try compiling postgres with those flags.

OK, Thanks for the information. Currently I add the "c/c++-gcc" checker
for my c file, it just trigger when I am writting a file. so the
warnings in other places probably doesn't bother me. 
>
> [1]
> https://www.postgresql.org/message-id/flat/CA%2BhUKGJ7EQm9extQAgrFZNNUKqRT8Vv5t1tKqA-5nEcYn0%2BwNA%40mail.gmail.com
>
> I don't know if anyone has any plans to pick up that work again.
I can take that very soon.

> but in any case, it seems wise to not add more.
Very true.

Thank you for all your helps on this series. 

-- 
Best Regards
Andy Fan




Re: New function normal_rand_array function to contrib/tablefunc.

From
Japin Li
Date:
On Fri, 01 Nov 2024 at 09:33, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> On Wed, 16 Oct 2024 at 08:43, Andy Fan <zhihuifan1213@163.com> wrote:
>>
>>  Thanks for the detailed feedback! Here is the rebased version.
>>
>
> I took another look at this and I think it's in reasonable shape.
>
> I'm attaching an update, rebasing it on top of 9be4e5d293.
>
> Also it was missing a required update to the meson.build file --
> that's the immediate cause of the other cfbot failures.
>
> The rest is just cosmetic tidying up, fixing indentation, tweaking
> comments, and the like. I also hacked on the docs a bit -- the
> synopsis only listed one of the new function signatures for some
> reason. After fixing that, I think it's sufficient to just list one
> usage example.
>

LGTM expect there is a warning when applying the patch.

Applying: tablefunc: Add rand_array() functions.
.git/rebase-apply/patch:475: trailing whitespace.
      rand_array
warning: 1 line adds whitespace errors.

The other looks good to me.

-- 
Regrads,
Japin Li



Re: New function normal_rand_array function to contrib/tablefunc.

From
Andy Fan
Date:
Dean Rasheed <dean.a.rasheed@gmail.com> writes:

> On Wed, 16 Oct 2024 at 08:43, Andy Fan <zhihuifan1213@163.com> wrote:
>>
>>  Thanks for the detailed feedback! Here is the rebased version.
>>
>
> I took another look at this and I think it's in reasonable shape.
>
> I'm attaching an update, rebasing it on top of 9be4e5d293.

Thank you Dean. 

> Also it was missing a required update to the meson.build file --
> that's the immediate cause of the other cfbot failures.

OK, great to know this one, I should have paid more attention to cfbot. 

> The rest is just cosmetic tidying up, fixing indentation, tweaking
> comments, and the like. I also hacked on the docs a bit -- the
> synopsis only listed one of the new function signatures for some
> reason. After fixing that, I think it's sufficient to just list one
> usage example.

That looks better now, more concise and expressive sentences.

-- 
Best Regards
Andy Fan




Re: New function normal_rand_array function to contrib/tablefunc.

From
Aleksander Alekseev
Date:
Hi everyone,

> >>  Thanks for the detailed feedback! Here is the rebased version.
> >>
> >
> > I took another look at this and I think it's in reasonable shape.
> >
> > I'm attaching an update, rebasing it on top of 9be4e5d293.
>
> Thank you Dean.
>
> > Also it was missing a required update to the meson.build file --
> > that's the immediate cause of the other cfbot failures.
>
> OK, great to know this one, I should have paid more attention to cfbot.
>
> > The rest is just cosmetic tidying up, fixing indentation, tweaking
> > comments, and the like. I also hacked on the docs a bit -- the
> > synopsis only listed one of the new function signatures for some
> > reason. After fixing that, I think it's sufficient to just list one
> > usage example.
>
> That looks better now, more concise and expressive sentences.

I played with patch v3. All in all it seems to be in good shape.

I wonder though whether tablefunc extension is the right place for the
function. To me it seems to be as useful as array_shuffle().

Personally I would name the function array_rand() in order to be
consistent with the rest of array_* functions [1].

I would also replace `minlen` and `maxlen` arguments with a single
`len` argument because the same result (although more slowly) can be
achieved like this:

SELECT trim_array(arr, random(0,3)) FROM rand_array(10, 3, 3, 50::int,
80::int) as arr;

This could be just a bike-shedding though. Does anyone feel necessary
to second any of these nitpicks?

[1]: https://www.postgresql.org/docs/current/functions-array.html

-- 
Best regards,
Aleksander Alekseev



Re: New function normal_rand_array function to contrib/tablefunc.

From
Aleksander Alekseev
Date:
Hi,

> I played with patch v3. All in all it seems to be in good shape.
>
> I wonder though whether tablefunc extension is the right place for the
> function. To me it seems to be as useful as array_shuffle().
>
> Personally I would name the function array_rand() in order to be
> consistent with the rest of array_* functions [1].
>
> I would also replace `minlen` and `maxlen` arguments with a single
> `len` argument because the same result (although more slowly) can be
> achieved like this:
>
> SELECT trim_array(arr, random(0,3)) FROM rand_array(10, 3, 3, 50::int,
> 80::int) as arr;
>
> This could be just a bike-shedding though. Does anyone feel necessary
> to second any of these nitpicks?
>
> [1]: https://www.postgresql.org/docs/current/functions-array.html

On second thought, is there really much value in returning a setof?
Any reason not to have an interface as simple and straightforward as
this:

```
=# SELECT array_random(1, 10, random(0, 3)) FROM generate_series( ... )
{5}
{1, 3, 8}
{7, 6}
...
```

? Or maybe I'm missing something?

-- 
Best regards,
Aleksander Alekseev



Re: New function normal_rand_array function to contrib/tablefunc.

From
Dean Rasheed
Date:
On Mon, 4 Nov 2024 at 14:46, Aleksander Alekseev
<aleksander@timescale.com> wrote:
>
> Any reason not to have an interface as simple and straightforward as
> this:
>
> =# SELECT array_random(1, 10, random(0, 3)) FROM generate_series( ... )
> {5}
> {1, 3, 8}
> {7, 6}
> ...
>

Yeah, that looks like a neater API.

Something that bothers me somewhat is that it's completely trivial for
the user to write such a function for themselves, so is it really
useful enough to include in core?

The other question is whether it's an array function or a random
function. I.e., should it be listed in "Table 9.55. Array Functions",
in which case the name array_random() makes sense, or should it be
listed in "Table 9.6. Random Functions", in which case it should
probably be called random_array(). I think the latter makes more
sense, since it's a function that generates random values, more
similar to the random(min, max) functions. Also I think it's more
useful if it shares the same PRNG, controlled by setseed(), and it
makes sense to group all such functions together.

Regards,
Dean



Re: New function normal_rand_array function to contrib/tablefunc.

From
Aleksander Alekseev
Date:
Hi Dean,

Thanks for your input.

> > Any reason not to have an interface as simple and straightforward as
> > this:
> >
> > =# SELECT array_random(1, 10, random(0, 3)) FROM generate_series( ... )
> > {5}
> > {1, 3, 8}
> > {7, 6}
> > ...
> >
>
> Yeah, that looks like a neater API.
>
> Something that bothers me somewhat is that it's completely trivial for
> the user to write such a function for themselves, so is it really
> useful enough to include in core?

I think it would be useful. Many users don't bother writing C
extensions for tasks like this. So at least our implementation is
going to be faster.

> The other question is whether it's an array function or a random
> function. I.e., should it be listed in "Table 9.55. Array Functions",
> in which case the name array_random() makes sense, or should it be
> listed in "Table 9.6. Random Functions", in which case it should
> probably be called random_array(). I think the latter makes more
> sense, since it's a function that generates random values, more
> similar to the random(min, max) functions. Also I think it's more
> useful if it shares the same PRNG, controlled by setseed(), and it
> makes sense to group all such functions together.

Good point. Personally I don't have a strong opinion on whether
random_array() or array_random() is preferable, for me either option
is OK. Perhaps we should cross-reference the function between two
sections of the documentation to make it easier to find.

-- 
Best regards,
Aleksander Alekseev



Re: New function normal_rand_array function to contrib/tablefunc.

From
Dean Rasheed
Date:
On Tue, 5 Nov 2024 at 15:23, Aleksander Alekseev
<aleksander@timescale.com> wrote:
>
> > > =# SELECT array_random(1, 10, random(0, 3)) FROM generate_series( ... )
> > > {5}
> > > {1, 3, 8}
> > > {7, 6}
> > > ...
> >
> > Yeah, that looks like a neater API.
> >
> > Something that bothers me somewhat is that it's completely trivial for
> > the user to write such a function for themselves, so is it really
> > useful enough to include in core?
>
> I think it would be useful. Many users don't bother writing C
> extensions for tasks like this. So at least our implementation is
> going to be faster.
>

If we are going to add such a function to core, then I think we should
make it consistent and at least as flexible as the other array
functions, and support multi-dimensional arrays with optional
non-default lower-bounds, like array_fill(). I.e., something like:

  random_array(min int, max int, dims int[] [, lbounds int[]]) -> int[]

  Returns an array filled with random values in the range min <= x <= max,
  having dimensions of the lengths specified by dims. The optional lbounds
  argument supplies lower-bound values for each dimension (which default
  to all 1).

Regards,
Dean



Re: New function normal_rand_array function to contrib/tablefunc.

From
Aleksander Alekseev
Date:
Hi,

> If we are going to add such a function to core, then I think we should
> make it consistent and at least as flexible as the other array
> functions, and support multi-dimensional arrays with optional
> non-default lower-bounds, like array_fill(). I.e., something like:
>
>   random_array(min int, max int, dims int[] [, lbounds int[]]) -> int[]
>
>   Returns an array filled with random values in the range min <= x <= max,
>   having dimensions of the lengths specified by dims. The optional lbounds
>   argument supplies lower-bound values for each dimension (which default
>   to all 1).

FWIW we have several array_* functions that deal only with the first
dimension of an array: array_shuffle(), array_sample() the recently
added array_reverse() [1], the currently discussed array_sort() [2].

I suggest implementing the function in several steps. First implement
random_array(min, max, len). It's straightforward and IMO will provide
the most value to the majority of the users.Then we can either add an
optional argument or a random_array_multidim() function. This can be
implemented and discussed as a separate patch. This approach would be
convenient for the author and the reviewers and also will allow us to
deliver value to the users sooner.

[1]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=49d6c7d8daba
[2]: https://commitfest.postgresql.org/50/5277/

--
Best regards,
Aleksander Alekseev



Re: New function normal_rand_array function to contrib/tablefunc.

From
Andy Fan
Date:
Hi,

Looks I miss some interesting dicussions in the recent days, the
pretty neat API random_array, random_array or array_random (I prefer the
random_array because of the setseed stuff as Dean said). These
dicussions absoluatly enrichs my API / decoument design experience. 

I'm still not sure if it should be put into core, I agree that 
random_array/array_random should be put into core if we go with this
way. It looks to me that the cost of putting them into core is just
it takes more OIDs in initdb, hence making catalog a bit of bigger, not 
sure the real matter of it.

As for if we should support random_array_multidim at the first step, as
a developer, I would like to try dim sutff since I have little
experience on this area, It is possible that this area bring some
other interesting experience for me as the things happen to me recently.


> I suggest implementing the function in several steps. First implement
> random_array(min, max, len). It's straightforward and IMO will provide
> the most value to the majority of the users.Then we can either add an
> optional argument or a random_array_multidim() function.

This may doubles the OID expence, every invariants needs 3 OIDs(int, bigint,
numeric). 

> This can be
> implemented and discussed as a separate patch. This approach would be
> convenient for the author and the reviewers and also will allow us to
> deliver value to the users sooner.

Appricated for your suggestion like this, Aleksander, I can understand
your kindly intention!

If no new ideas, I will defer to Dean's final decision for efficient
purpose.

-- 
Best Regards
Andy Fan




hi.

I don't want to disrupt this thread too much.
so I created a separate thread ([1]) for my attached patch.

[1] https://www.postgresql.org/message-id/CACJufxF8_VzCFRHRt4OHHF74QtB8tj5Z%3Ddjsy7Y31OHKG5s1-w%40mail.gmail.com