Thread: New function normal_rand_array function to contrib/tablefunc.
Here is a new function which could produce an array of numbers with a controllable array length and duplicated elements in these arrays. I used it when working with gin index, and I think it would be helpful for others as well, so here it is. select * from normal_rand_array(5, 10, 1.8::numeric, 3.5::numeric); normal_rand_array ----------------------------------------------- {3.3,2.3,2.7,3.2,2.0,2.7,3.4,2.7,2.3,2.9} {3.3,1.8,2.9,3.4,2.0,1.8,2.0,3.5,2.8,2.5} {2.1,1.9,2.3,1.9,2.5,2.7,2.4,2.9,1.8} {2.3,2.5,2.4,2.7,2.7,2.3,2.9,3.3,3.3,1.9,3.5} {2.8,3.4,2.7,1.8,3.3,2.3,2.2,3.5,2.6,2.5} (5 rows) select * from normal_rand_array(5, 10, 1.8::int4, 3.5::int4); normal_rand_array ------------------------------------- {3,2,2,3,4,2} {2,4,2,3,3,3,3,2,2,3,3,2,3,2} {2,4,3} {4,2,3,4,2,4,2,2,3,4,3,3,2,4,4,2,3} {4,3,3,4,3,3,4,2,4} (5 rows) the 5 means it needs to produce 5 rows in total and the 10 is the average array length, and 1.8 is the minvalue for the random function and 3.5 is the maxvalue. -- Best Regards Andy Fan
Attachment
It looks useful, for example, it can be used in sorting tests to make them more interesting. I just have one question. Whyare you using SRF_IS_FIRST CALL and not _PG_init? Best regards, Stepan Neretin.
Hi Andy On 08.06.24 08:05, Andy Fan wrote: > Here is a new function which could produce an array of numbers with a > controllable array length and duplicated elements in these arrays. I > used it when working with gin index, and I think it would be helpful for > others as well, so here it is. > > select * from normal_rand_array(5, 10, 1.8::numeric, 3.5::numeric); > normal_rand_array > ----------------------------------------------- > {3.3,2.3,2.7,3.2,2.0,2.7,3.4,2.7,2.3,2.9} > {3.3,1.8,2.9,3.4,2.0,1.8,2.0,3.5,2.8,2.5} > {2.1,1.9,2.3,1.9,2.5,2.7,2.4,2.9,1.8} > {2.3,2.5,2.4,2.7,2.7,2.3,2.9,3.3,3.3,1.9,3.5} > {2.8,3.4,2.7,1.8,3.3,2.3,2.2,3.5,2.6,2.5} > (5 rows) > > select * from normal_rand_array(5, 10, 1.8::int4, 3.5::int4); > normal_rand_array > ------------------------------------- > {3,2,2,3,4,2} > {2,4,2,3,3,3,3,2,2,3,3,2,3,2} > {2,4,3} > {4,2,3,4,2,4,2,2,3,4,3,3,2,4,4,2,3} > {4,3,3,4,3,3,4,2,4} > (5 rows) > > the 5 means it needs to produce 5 rows in total and the 10 is the > average array length, and 1.8 is the minvalue for the random function > and 3.5 is the maxvalue. > When either minval or maxval exceeds int4 the function cannot be executed/found SELECT * FROM normal_rand_array(5, 10, 8, 42::bigint); ERROR: function normal_rand_array(integer, integer, integer, bigint) does not exist LINE 1: SELECT * FROM normal_rand_array(5, 10, 8, 42::bigint); ^ HINT: No function matches the given name and argument types. You might need to add explicit type casts. --- SELECT * FROM normal_rand_array(5, 10, 8::bigint, 42); ERROR: function normal_rand_array(integer, integer, bigint, integer) does not exist LINE 1: SELECT * FROM normal_rand_array(5, 10, 8::bigint, 42); ^ HINT: No function matches the given name and argument types. You might need to add explicit type casts. --- However, when both are int8 it works fine: SELECT * FROM normal_rand_array(5, 10, 8::bigint, 42::bigint); normal_rand_array -------------------------------------------------- {29,38,31,10,23,39,9,32} {8,39,19,31,29,15,17,15,36,20,33,19} {15,18,42,19} {16,31,33,11,14,20,24,9,12,17,22,42,41,24,11,41} {15,11,36,8,28,37} (5 rows) --- Is it the expected behaviour? In some cases the function returns an empty array. Is it also expected? SELECT count(*) FROM normal_rand_array(100000, 10, 8, 42) i WHERE array_length(i,1) IS NULL; count ------- 4533 (1 row) In both cases, perhaps mentioning these behaviors in the docs would avoid some confusion. Thanks! Best, -- Jim
On Tue, 2 Jul 2024 at 11:18, Jim Jones <jim.jones@uni-muenster.de> wrote: > > When either minval or maxval exceeds int4 the function cannot be > executed/found > > SELECT * FROM normal_rand_array(5, 10, 8, 42::bigint); > > ERROR: function normal_rand_array(integer, integer, integer, bigint) > does not exist > LINE 1: SELECT * FROM normal_rand_array(5, 10, 8, 42::bigint); > ^ > HINT: No function matches the given name and argument types. You might > need to add explicit type casts. > This could be solved by defining separate functions for each supported type, rather than one function with type anyelement. Internally, they could continue to share common code, of course. > In some cases the function returns an empty array. Is it also expected? > Perhaps it would be useful to have separate minimum and maximum array length arguments, rather than a mean array length argument. Actually, I find the current behaviour somewhat counterintuitive. Only after reading the source code did I realise what it's actually doing, which is this: Row 1: array of random length in range [0, meanarraylen] Row 2: array of length 2*meanarraylen - length of previous array Row 3: array of random length in range [0, meanarraylen] Row 4: array of length 2*meanarraylen - length of previous array ... That's far from obvious (it's certainly not documented) and I don't think it's a particularly good way of achieving a specified mean array length, because only half the lengths are random. One thing that's definitely needed is more documentation. It should have its own new subsection, like the other tablefunc functions. Something else that confused me is why this function is called normal_rand_array(). The underlying random functions that it's calling return random values from a uniform distribution, not a normal distribution. Arrays of normally distributed random values might also be useful, but that's not what this patch is doing. Also, the function accepts float8 minval and maxval arguments, and then simply ignores them and returns random float8 values in the range [0,1), which is highly counterintuitive. My suggestion would be to mirror the signatures of the core random() functions more closely, and have this: 1). rand_array(numvals int, minlen int, maxlen int) returns setof float8[] 2). rand_array(numvals int, minlen int, maxlen int, minval int, maxval int) returns setof int[] 3). rand_array(numvals int, minlen int, maxlen int, minval bigint, maxval bigint) returns setof bigint[] 4). rand_array(numvals int, minlen int, maxlen int, minval numeric, maxval numeric) returns setof numeric[] Also, I'd recommend giving the function arguments names in SQL, like this, since that makes them easier to use. Something else that's not obvious is that this patch is relying on the core random functions, which means that it's using the same PRNG state as those functions. That's probably OK, but it should be documented, because it's different from tablefunc's normal_rand() function, which uses pg_global_prng_state. In particular, what this means is that calling setseed() will affect the output of these new functions, but not normal_rand(). Incidentally, that makes writing tests much simpler -- just call setseed() first and the output will be repeatable. Regards, Dean
Dean Rasheed <dean.a.rasheed@gmail.com> writes: > My suggestion would be to mirror the signatures of the core random() > functions more closely, and have this: > > 1). rand_array(numvals int, minlen int, maxlen int) > returns setof float8[] > > 2). rand_array(numvals int, minlen int, maxlen int, > minval int, maxval int) > returns setof int[] > > 3). rand_array(numvals int, minlen int, maxlen int, > minval bigint, maxval bigint) > returns setof bigint[] > > 4). rand_array(numvals int, minlen int, maxlen int, > minval numeric, maxval numeric) > returns setof numeric[] this is indeed a more clean and correct APIs, I will use the above ones in the next version. Thanks for the suggestion. It is just not clear to me how verbose the document should to be, and where the document should be, tablefunc.sgml, the comment above the function or the places just the code? In many cases you provides above or below are just implementation details, not the API declaimed purpose. > Something else that's not obvious is that this patch is relying on the > core random functions, which means that it's using the same PRNG state > as those functions. That's probably OK, but it should be documented, > because it's different from tablefunc's normal_rand() function, which > uses pg_global_prng_state. My above question applies to this comment. > In particular, what this means is that > calling setseed() will affect the output of these new functions, but > not normal_rand(). Incidentally, that makes writing tests much simpler > -- just call setseed() first and the output will be repeatable. Good to know this user case. for example, should this be documented? >> In some cases the function returns an empty array. Is it also expected? >> > > Perhaps it would be useful to have separate minimum and maximum array > length arguments, rather than a mean array length argument. I'm not sure which one is better, but main user case of this function for testing pupose, so it I think minimum and maximum array length is good for me too. > > Actually, I find the current behaviour somewhat counterintuitive. Only > after reading the source code did I realise what it's actually doing, > which is this: > > Row 1: array of random length in range [0, meanarraylen] > Row 2: array of length 2*meanarraylen - length of previous array > Row 3: array of random length in range [0, meanarraylen] > Row 4: array of length 2*meanarraylen - length of previous array > ... > > That's far from obvious (it's certainly not documented) and I don't > think it's a particularly good way of achieving a specified mean array > length, because only half the lengths are random. I'm not sure how does this matter in real user case. > One thing that's definitely needed is more documentation. It should > have its own new subsection, like the other tablefunc functions. is the documentaion for the '2*meanarraylen - lastarraylen'? and What is new subsection, do you mean anything wrong in 'tablefunc.sgml', I did have some issue to run 'make html', but the error exists before my patch, so I change the document carefully without testing it. do you know how to fix the below error in 'make html'? $/usr/bin/xsltproc --nonet --path . --path . --stringparam pg.version '18devel' stylesheet.xsl postgres-full.xml I/O error : Attempt to load network entity http://docbook.sourceforge.net/release/xsl/current/xhtml/chunk.xsl warning: failed to load external entity "http://docbook.sourceforge.net/release/xsl/current/xhtml/chunk.xsl" compilation error: file stylesheet.xsl line 6 element import xsl:import : unable to load http://docbook.sourceforge.net/release/xsl/current/xhtml/chunk.xsl I/O error : Attempt to load network entity http://docbook.sourceforge.net/release/xsl/current/common/entities.ent stylesheet-html-common.xsl:4: warning: failed to load external entity "http://docbook.sourceforge.net/release/xsl/current/common/entities.ent" %common.entities; ^ stylesheet-html-common.xsl:124: parser error : Entity 'primary' not defined translate(substring(&primary;, 1, 1), > Something else that confused me is why this function is called > normal_rand_array(). The underlying random functions that it's calling > return random values from a uniform distribution, not a normal > distribution. Arrays of normally distributed random values might also > be useful, but that's not what this patch is doing. OK, you are right, your new names should be better. > Also, the function accepts float8 minval and maxval arguments, and > then simply ignores them and returns random float8 values in the range > [0,1), which is highly counterintuitive. This is a obvious bug and it only exists in float8 case IIUC, will fix it in the next version. -- Best Regards Andy Fan
Andy Fan <zhihuifan1213@163.com> writes: (just noticed this reply is sent to Jim privately, re-sent it to public.) > Hi Jim, > >> >> When either minval or maxval exceeds int4 the function cannot be >> executed/found >> >> SELECT * FROM normal_rand_array(5, 10, 8, 42::bigint); >> >> ERROR: function normal_rand_array(integer, integer, integer, bigint) >> does not exist >> LINE 1: SELECT * FROM normal_rand_array(5, 10, 8, 42::bigint); >> ^ >> HINT: No function matches the given name and argument types. You might >> need to add explicit type casts. >> --- >> >> SELECT * FROM normal_rand_array(5, 10, 8::bigint, 42); >> >> ERROR: function normal_rand_array(integer, integer, bigint, integer) >> does not exist >> LINE 1: SELECT * FROM normal_rand_array(5, 10, 8::bigint, 42); >> ^ >> HINT: No function matches the given name and argument types. You might >> need to add explicit type casts. >> --- > >> >> However, when both are int8 it works fine: > > I defined the function as below: > > postgres=# \df normal_rand_array > List of functions > Schema | Name | Result data type | Argument data types | Type > --------+-------------------+------------------+------------------------------------------+------ > public | normal_rand_array | SETOF anyarray | integer, integer, anyelement, anyelement | func > (1 row) > > so it is required that the 3nd and 4th argument should have the same > data type, that's why your first 2 test case failed and the third one > works. and I also think we should not add a test case / document for > this since the behavior of 'anyelement' system. > This issue can be fixed with the new API defined suggested by Dean. >> >> SELECT * FROM normal_rand_array(5, 10, 8::bigint, 42::bigint); >> >> normal_rand_array >> -------------------------------------------------- >> {29,38,31,10,23,39,9,32} >> {8,39,19,31,29,15,17,15,36,20,33,19} >> {15,18,42,19} >> {16,31,33,11,14,20,24,9,12,17,22,42,41,24,11,41} >> {15,11,36,8,28,37} >> (5 rows) >> --- >> >> Is it the expected behaviour? > > Yes, see the above statements. > >> >> In some cases the function returns an empty array. Is it also expected? >> >> SELECT count(*) >> FROM normal_rand_array(100000, 10, 8, 42) i >> WHERE array_length(i,1) IS NULL; >> >> count >> ------- >> 4533 >> (1 row) > > Yes, by design I think it is a feature which could generate [] case > which should be used a special case for testing, and at the > implementation side, the [] means the length is 0 which is caused by I > choose the 'len' by random [0 .. len * 2], so 0 is possible and doesn't > confict with the declared behavior. > >> In both cases, perhaps mentioning these behaviors in the docs would >> avoid some confusion. > > hmm, It doesn't take some big effort to add them, but I'm feeling that > would make the document a bit of too verbose/detailed. > > Sorry for the late respone! -- Best Regards Andy Fan
On Wed, 17 Jul 2024 at 07:29, Andy Fan <zhihuifan1213@163.com> wrote: > > It is just not clear to me how verbose the document should to be, and > where the document should be, tablefunc.sgml, the comment above the > function or the places just the code? In many cases you provides above > or below are just implementation details, not the API declaimed purpose. > > > Something else that's not obvious is that this patch is relying on the > > core random functions, which means that it's using the same PRNG state > > as those functions. That's probably OK, but it should be documented, > > because it's different from tablefunc's normal_rand() function, which > > uses pg_global_prng_state. > > My above question applies to this comment. > > > One thing that's definitely needed is more documentation. It should > > have its own new subsection, like the other tablefunc functions. > I was really referring to the SGML docs. Try to follow the style used for the existing functions in tablefunc.sgml -- so in addition to adding the row to the table at the top, also add one or more sections further down the page to give more details, and example output. Something like this: https://www.postgresql.org/docs/current/tablefunc.html#TABLEFUNC-FUNCTIONS-NORMAL-RAND That would be a good place to mention that setseed() can be used to produce repeatable results. > I did have some issue to run 'make html', but the > error exists before my patch, so I change the document carefully without > testing it. do you know how to fix the below error in 'make html'? > > $/usr/bin/xsltproc --nonet --path . --path . --stringparam pg.version '18devel' stylesheet.xsl postgres-full.xml > > I/O error : Attempt to load network entity http://docbook.sourceforge.net/release/xsl/current/xhtml/chunk.xsl > warning: failed to load external entity "http://docbook.sourceforge.net/release/xsl/current/xhtml/chunk.xsl" > compilation error: file stylesheet.xsl line 6 element import > xsl:import : unable to load http://docbook.sourceforge.net/release/xsl/current/xhtml/chunk.xsl > I/O error : Attempt to load network entity http://docbook.sourceforge.net/release/xsl/current/common/entities.ent > stylesheet-html-common.xsl:4: warning: failed to load external entity "http://docbook.sourceforge.net/release/xsl/current/common/entities.ent" > %common.entities; > ^ > stylesheet-html-common.xsl:124: parser error : Entity 'primary' not defined > translate(substring(&primary;, 1, 1), > This looks like you're missing a required package. Try installing docbook-xsl or docbook-xsl-stylesheets or something similar (the package name varies depending on your distro). Regards, Dean
On Tue, 27 Aug 2024 at 16:43, Andy Fan <zhihuifan1213@163.com> wrote: > Andy Fan <zhihuifan1213@163.com> writes: > >>>> My suggestion would be to mirror the signatures of the core random() >>>> functions more closely, and have this: >>>> >>>> 1). rand_array(numvals int, minlen int, maxlen int) >>>> returns setof float8[] >>>> >> ..> >>>> 4). rand_array(numvals int, minlen int, maxlen int, >>>> minval numeric, maxval numeric) >>>> returns setof numeric[] >> >>> this is indeed a more clean and correct APIs, I will use the above ones >>> in the next version. Thanks for the suggestion. >> >> I followed your suggestion in the new attached version. They are not >> only some cleaner APIs for user and but also some cleaner implementation >> in core, Thank for this suggestion as well. > > A new version is attached, nothing changed except replace > PG_GETARG_INT16 with PG_GETARG_INT32. PG_GETARG_INT16 is a copy-paste > error. > Thanks for updating the patch. Here are some comments. + if (minlen >= maxlen) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("minlen must be greater than maxlen."))); There error message should be "minlen must be smaller than maxlen", right? + if (minlen < 0) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("minlen and maxlen must be greater than zero."))); Here the minlen might be zero, so the error message is incorrect. How about use "minlen must be greater than or equal to zero"? -- Regrads, Japin Li
On Wed, 28 Aug 2024 at 12:27, Andy Fan <zhihuifan1213@163.com> wrote: > Japin Li <japinli@hotmail.com> writes: > > >> Thanks for updating the patch. Here are some comments. >> >> + if (minlen >= maxlen) >> + ereport(ERROR, >> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), >> + errmsg("minlen must be greater than maxlen."))); >> >> There error message should be "minlen must be smaller than maxlen", right? >> >> + if (minlen < 0) >> + ereport(ERROR, >> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), >> + errmsg("minlen and maxlen must be greater than zero."))); >> >> Here the minlen might be zero, so the error message is incorrect. >> How about use "minlen must be greater than or equal to zero"? > > Yes, you are right. A new version is attached, thanks for checking! > Nitpick, the minlen is smaller than maxlen, so the maxlen cannot be zero. After giving it some more thought, it would also be helpful if maxlen is equal to minlen. For example, I want have each row has four items, I can use the following SELECT rand_array(10, 4, 4, 50::int, 80::int); OTOH, I find the range bound uses "less than or equal to", how about replacing "smaller" with "less"? -- Regrads, Japin Li diff --git a/contrib/tablefunc/tablefunc.c b/contrib/tablefunc/tablefunc.c index b24c70d538..69b276b285 100644 --- a/contrib/tablefunc/tablefunc.c +++ b/contrib/tablefunc/tablefunc.c @@ -306,10 +306,10 @@ rand_array_internal(FunctionCallInfo fcinfo, Oid datatype) (errcode(ERRCODE_INVALID_PARAMETER_VALUE), errmsg("number of rows cannot be negative"))); - if (minlen >= maxlen) + if (minlen > maxlen) ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), - errmsg("minlen must be smaller than maxlen."))); + errmsg("minlen must be less than or equal to maxlen."))); if (minlen < 0) ereport(ERROR,
On Thu, 29 Aug 2024 at 05:39, Andy Fan <zhihuifan1213@163.com> wrote: > > Yes, that's a valid usage. the new vesion is attached. I have changed > the the commit entry [1] from "Waiting on Author" to "Needs review". > Note that this needs a rebase, following commit 4681ad4b2f. Here are a few more review comments: 1). In the tests: +select setseed(0.8); +select rand_array(10, 0, 3, 50::int, 80::int); +select setseed(0.8); +select rand_array(10, 0, 3, 50::bigint, 80::bigint); +select setseed(0.8); +select rand_array(10, 0, 3, 50::float8, 80::float8); +select setseed(0.8); +select rand_array(10, 0, 3, 50::float4, 80::float4); +select setseed(0.8); +select rand_array(10, 0, 3, 50::numeric, 80::numeric); this should really have a comment block to distinguish these new tests from the preceeding normal_rand() tests. 2). It's worth also having tests for the error cases. 3). It's only necessary to call setseed() once to get a reproducible set of results from then on. 4). I'd use setseed(0) or setseed(0.5), since those values are exactly representable as double precision values, unlike 0.8, ensuring that it works the same on all platforms. It might not matter, but why take the risk? 5). The float8 case still has minimum and maximum value arguments that it just ignores. It should either not take those arguments, or it should respect them, and try to return float8 values in the requested range. I suspect that trying to return float8 values in the requested range would be hard, if not impossible, due to the inexact nature of double precision arithmetic. That's why I suggested earlier having the float8 function not take minval/maxval arguments, and just return values in the range 0 to 1. 6). This new function: +static Datum +rand_array_internal(FunctionCallInfo fcinfo, Oid datatype) +{ should have a comment block. In particular, it should document what its inputs and outputs are. 7). This code: + int num_tuples = PG_GETARG_INT32(0); + int minlen = PG_GETARG_INT32(1); + int maxlen = PG_GETARG_INT32(2); + Datum minval = PG_GETARG_DATUM(3), + maxval = PG_GETARG_DATUM(4); + rand_array_fctx *fctx; + + if (datatype == INT4OID) + random_fn_oid = F_RANDOM_INT4_INT4; + else if (datatype == INT8OID) + random_fn_oid = F_RANDOM_INT8_INT8; + else if (datatype == FLOAT8OID) + random_fn_oid = F_RANDOM_; + else if (datatype == NUMERICOID) + random_fn_oid = F_RANDOM_NUMERIC_NUMERIC; + else + elog(ERROR, "unsupported type %d for rand_array function.", + datatype); + + if (num_tuples < 0) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("number of rows cannot be negative"))); + + if (minlen > maxlen) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("minlen must not be greater than maxlen."))); + + if (minlen < 0) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("minlen and maxlen must be greater than or equal to zero."))); should be inside the "if (SRF_IS_FIRSTCALL())" block. There's no point repeating all those checks for every call. 8). I think it would be neater to code the "if (datatype == INT4OID) ... else if ..." as a switch statement. 9). I would swap the last 2 bound checks round, and simplify the error messages as follows: if (minlen < 0) ereport(ERROR, errcode(ERRCODE_INVALID_PARAMETER_VALUE), errmsg("minlen must be greater than or equal to zero")); if (maxlen < minlen) ereport(ERROR, errcode(ERRCODE_INVALID_PARAMETER_VALUE), errmsg("maxlen must be greater than or equal to minlen")); Also note that primary error messages should not end with a period. See https://www.postgresql.org/docs/current/error-style-guide.html 10). In this error: + elog(ERROR, "unsupported type %d for rand_array function.", + datatype); "datatype" is of type Oid, which is unsigned, and so it should use "%u" not "%d". Also, as above, it should not end with a period, so it should be: + elog(ERROR, "unsupported type %u for rand_array function", + datatype); 11). If the float8 case is made to not have minval/maxval arguments, this code: + Datum minval = PG_GETARG_DATUM(3), + maxval = PG_GETARG_DATUM(4); and the FunctionCallInfo setup code needs to be different for the float8 case, in order to avoid reading and setting undefined arguments. Perhaps introduce a "need_val_bounds" boolean variable, based on the datatype. 12). This code: + random_len_fcinfo->args[0].value = minlen; + random_len_fcinfo->args[1].value = maxlen; should really be: + random_len_fcinfo->args[0].value = Int32GetDatum(minlen); + random_len_fcinfo->args[1].value = Int32GetDatum(maxlen); It amounts to the same thing, but documents the fact that it's converting an integer to a Datum. 13). These new functions are significantly under-documented, especially when compared to all the other functions on https://www.postgresql.org/docs/current/tablefunc.html They really should have their own subsection, along the same lines as "F.41.1.1. Normal_rand", explaining what the functions do, what their arguments mean, and giving a couple of usage examples. Regards, Dean
> 10). In this error: > > + elog(ERROR, "unsupported type %d for rand_array function.", > + datatype); > > "datatype" is of type Oid, which is unsigned, and so it should use > "%u" not "%d". Also, as above, it should not end with a period, so it > should be: > > + elog(ERROR, "unsupported type %u for rand_array function", > + datatype); I remember my IDE could detect such issue before, but failed this time. I just checked more today and it is still failing. Looks the checker is not stable and I can't find out the reason so far. main.c #include <stdio.h> int main(int argc, char *argv[]) { unsigned int i = 0; int x = 2; printf("i = %d\n", i); printf("i = %u\n", x); return 0; } All the following commands succeed without any warnings. clang -O0 -g main.c -o main -Wall -Wformat gcc -g main.c -o main -Wall -Wformat scan-build clang -g main.c -o main -Wall -Wformat cppcheck main.c clang: 18.1.6 gcc: 13.3.0 Only "cppcheck --enable=all main.c" catch the warnning. Any hints on this will be appreicated. -- Best Regards Andy Fan
On Sat, 26 Oct 2024 at 01:51, Andy Fan <zhihuifan1213@163.com> wrote: > > > 10). In this error: > > > > + elog(ERROR, "unsupported type %d for rand_array function.", > > + datatype); > > > > "datatype" is of type Oid, which is unsigned, and so it should use > > "%u" not "%d". Also, as above, it should not end with a period, so it > > should be: > > > > + elog(ERROR, "unsupported type %u for rand_array function", > > + datatype); > > All the following commands succeed without any warnings. > > clang -O0 -g main.c -o main -Wall -Wformat > gcc -g main.c -o main -Wall -Wformat > This can be detected in gcc with -Wformat plus -Wformat-signedness flags. I see that this has been discussed before (e.g., [1]), but it doesn't look like those patches were committed, and there are still many such warnings, if you try compiling postgres with those flags. [1] https://www.postgresql.org/message-id/flat/CA%2BhUKGJ7EQm9extQAgrFZNNUKqRT8Vv5t1tKqA-5nEcYn0%2BwNA%40mail.gmail.com I don't know if anyone has any plans to pick up that work again, but in any case, it seems wise to not add more. Regards, Dean
Dean Rasheed <dean.a.rasheed@gmail.com> writes: > On Sat, 26 Oct 2024 at 01:51, Andy Fan <zhihuifan1213@163.com> wrote: >> >> > 10). In this error: >> > >> > + elog(ERROR, "unsupported type %d for rand_array function.", >> > + datatype); >> > >> > "datatype" is of type Oid, which is unsigned, and so it should use >> > "%u" not "%d". Also, as above, it should not end with a period, so it >> > should be: >> > >> > + elog(ERROR, "unsupported type %u for rand_array function", >> > + datatype); >> >> All the following commands succeed without any warnings. >> >> clang -O0 -g main.c -o main -Wall -Wformat >> gcc -g main.c -o main -Wall -Wformat >> > > This can be detected in gcc with -Wformat plus -Wformat-signedness > flags. Yes, this one works. I didn't realize we have "-Wformat-signedness" subsection after we already have "-Wformat". I have added this one into my toolset. For recording purpose, clang doesn't support this option until now. > > I see that this has been discussed before (e.g., [1]), but it doesn't > look like those patches were committed, and there are still many such > warnings, if you try compiling postgres with those flags. OK, Thanks for the information. Currently I add the "c/c++-gcc" checker for my c file, it just trigger when I am writting a file. so the warnings in other places probably doesn't bother me. > > [1] > https://www.postgresql.org/message-id/flat/CA%2BhUKGJ7EQm9extQAgrFZNNUKqRT8Vv5t1tKqA-5nEcYn0%2BwNA%40mail.gmail.com > > I don't know if anyone has any plans to pick up that work again. I can take that very soon. > but in any case, it seems wise to not add more. Very true. Thank you for all your helps on this series. -- Best Regards Andy Fan
On Fri, 01 Nov 2024 at 09:33, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > On Wed, 16 Oct 2024 at 08:43, Andy Fan <zhihuifan1213@163.com> wrote: >> >> Thanks for the detailed feedback! Here is the rebased version. >> > > I took another look at this and I think it's in reasonable shape. > > I'm attaching an update, rebasing it on top of 9be4e5d293. > > Also it was missing a required update to the meson.build file -- > that's the immediate cause of the other cfbot failures. > > The rest is just cosmetic tidying up, fixing indentation, tweaking > comments, and the like. I also hacked on the docs a bit -- the > synopsis only listed one of the new function signatures for some > reason. After fixing that, I think it's sufficient to just list one > usage example. > LGTM expect there is a warning when applying the patch. Applying: tablefunc: Add rand_array() functions. .git/rebase-apply/patch:475: trailing whitespace. rand_array warning: 1 line adds whitespace errors. The other looks good to me. -- Regrads, Japin Li
Dean Rasheed <dean.a.rasheed@gmail.com> writes: > On Wed, 16 Oct 2024 at 08:43, Andy Fan <zhihuifan1213@163.com> wrote: >> >> Thanks for the detailed feedback! Here is the rebased version. >> > > I took another look at this and I think it's in reasonable shape. > > I'm attaching an update, rebasing it on top of 9be4e5d293. Thank you Dean. > Also it was missing a required update to the meson.build file -- > that's the immediate cause of the other cfbot failures. OK, great to know this one, I should have paid more attention to cfbot. > The rest is just cosmetic tidying up, fixing indentation, tweaking > comments, and the like. I also hacked on the docs a bit -- the > synopsis only listed one of the new function signatures for some > reason. After fixing that, I think it's sufficient to just list one > usage example. That looks better now, more concise and expressive sentences. -- Best Regards Andy Fan
Hi everyone, > >> Thanks for the detailed feedback! Here is the rebased version. > >> > > > > I took another look at this and I think it's in reasonable shape. > > > > I'm attaching an update, rebasing it on top of 9be4e5d293. > > Thank you Dean. > > > Also it was missing a required update to the meson.build file -- > > that's the immediate cause of the other cfbot failures. > > OK, great to know this one, I should have paid more attention to cfbot. > > > The rest is just cosmetic tidying up, fixing indentation, tweaking > > comments, and the like. I also hacked on the docs a bit -- the > > synopsis only listed one of the new function signatures for some > > reason. After fixing that, I think it's sufficient to just list one > > usage example. > > That looks better now, more concise and expressive sentences. I played with patch v3. All in all it seems to be in good shape. I wonder though whether tablefunc extension is the right place for the function. To me it seems to be as useful as array_shuffle(). Personally I would name the function array_rand() in order to be consistent with the rest of array_* functions [1]. I would also replace `minlen` and `maxlen` arguments with a single `len` argument because the same result (although more slowly) can be achieved like this: SELECT trim_array(arr, random(0,3)) FROM rand_array(10, 3, 3, 50::int, 80::int) as arr; This could be just a bike-shedding though. Does anyone feel necessary to second any of these nitpicks? [1]: https://www.postgresql.org/docs/current/functions-array.html -- Best regards, Aleksander Alekseev
Hi, > I played with patch v3. All in all it seems to be in good shape. > > I wonder though whether tablefunc extension is the right place for the > function. To me it seems to be as useful as array_shuffle(). > > Personally I would name the function array_rand() in order to be > consistent with the rest of array_* functions [1]. > > I would also replace `minlen` and `maxlen` arguments with a single > `len` argument because the same result (although more slowly) can be > achieved like this: > > SELECT trim_array(arr, random(0,3)) FROM rand_array(10, 3, 3, 50::int, > 80::int) as arr; > > This could be just a bike-shedding though. Does anyone feel necessary > to second any of these nitpicks? > > [1]: https://www.postgresql.org/docs/current/functions-array.html On second thought, is there really much value in returning a setof? Any reason not to have an interface as simple and straightforward as this: ``` =# SELECT array_random(1, 10, random(0, 3)) FROM generate_series( ... ) {5} {1, 3, 8} {7, 6} ... ``` ? Or maybe I'm missing something? -- Best regards, Aleksander Alekseev
On Mon, 4 Nov 2024 at 14:46, Aleksander Alekseev <aleksander@timescale.com> wrote: > > Any reason not to have an interface as simple and straightforward as > this: > > =# SELECT array_random(1, 10, random(0, 3)) FROM generate_series( ... ) > {5} > {1, 3, 8} > {7, 6} > ... > Yeah, that looks like a neater API. Something that bothers me somewhat is that it's completely trivial for the user to write such a function for themselves, so is it really useful enough to include in core? The other question is whether it's an array function or a random function. I.e., should it be listed in "Table 9.55. Array Functions", in which case the name array_random() makes sense, or should it be listed in "Table 9.6. Random Functions", in which case it should probably be called random_array(). I think the latter makes more sense, since it's a function that generates random values, more similar to the random(min, max) functions. Also I think it's more useful if it shares the same PRNG, controlled by setseed(), and it makes sense to group all such functions together. Regards, Dean
Hi Dean, Thanks for your input. > > Any reason not to have an interface as simple and straightforward as > > this: > > > > =# SELECT array_random(1, 10, random(0, 3)) FROM generate_series( ... ) > > {5} > > {1, 3, 8} > > {7, 6} > > ... > > > > Yeah, that looks like a neater API. > > Something that bothers me somewhat is that it's completely trivial for > the user to write such a function for themselves, so is it really > useful enough to include in core? I think it would be useful. Many users don't bother writing C extensions for tasks like this. So at least our implementation is going to be faster. > The other question is whether it's an array function or a random > function. I.e., should it be listed in "Table 9.55. Array Functions", > in which case the name array_random() makes sense, or should it be > listed in "Table 9.6. Random Functions", in which case it should > probably be called random_array(). I think the latter makes more > sense, since it's a function that generates random values, more > similar to the random(min, max) functions. Also I think it's more > useful if it shares the same PRNG, controlled by setseed(), and it > makes sense to group all such functions together. Good point. Personally I don't have a strong opinion on whether random_array() or array_random() is preferable, for me either option is OK. Perhaps we should cross-reference the function between two sections of the documentation to make it easier to find. -- Best regards, Aleksander Alekseev
On Tue, 5 Nov 2024 at 15:23, Aleksander Alekseev <aleksander@timescale.com> wrote: > > > > =# SELECT array_random(1, 10, random(0, 3)) FROM generate_series( ... ) > > > {5} > > > {1, 3, 8} > > > {7, 6} > > > ... > > > > Yeah, that looks like a neater API. > > > > Something that bothers me somewhat is that it's completely trivial for > > the user to write such a function for themselves, so is it really > > useful enough to include in core? > > I think it would be useful. Many users don't bother writing C > extensions for tasks like this. So at least our implementation is > going to be faster. > If we are going to add such a function to core, then I think we should make it consistent and at least as flexible as the other array functions, and support multi-dimensional arrays with optional non-default lower-bounds, like array_fill(). I.e., something like: random_array(min int, max int, dims int[] [, lbounds int[]]) -> int[] Returns an array filled with random values in the range min <= x <= max, having dimensions of the lengths specified by dims. The optional lbounds argument supplies lower-bound values for each dimension (which default to all 1). Regards, Dean
Hi, > If we are going to add such a function to core, then I think we should > make it consistent and at least as flexible as the other array > functions, and support multi-dimensional arrays with optional > non-default lower-bounds, like array_fill(). I.e., something like: > > random_array(min int, max int, dims int[] [, lbounds int[]]) -> int[] > > Returns an array filled with random values in the range min <= x <= max, > having dimensions of the lengths specified by dims. The optional lbounds > argument supplies lower-bound values for each dimension (which default > to all 1). FWIW we have several array_* functions that deal only with the first dimension of an array: array_shuffle(), array_sample() the recently added array_reverse() [1], the currently discussed array_sort() [2]. I suggest implementing the function in several steps. First implement random_array(min, max, len). It's straightforward and IMO will provide the most value to the majority of the users.Then we can either add an optional argument or a random_array_multidim() function. This can be implemented and discussed as a separate patch. This approach would be convenient for the author and the reviewers and also will allow us to deliver value to the users sooner. [1]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=49d6c7d8daba [2]: https://commitfest.postgresql.org/50/5277/ -- Best regards, Aleksander Alekseev
Hi, Looks I miss some interesting dicussions in the recent days, the pretty neat API random_array, random_array or array_random (I prefer the random_array because of the setseed stuff as Dean said). These dicussions absoluatly enrichs my API / decoument design experience. I'm still not sure if it should be put into core, I agree that random_array/array_random should be put into core if we go with this way. It looks to me that the cost of putting them into core is just it takes more OIDs in initdb, hence making catalog a bit of bigger, not sure the real matter of it. As for if we should support random_array_multidim at the first step, as a developer, I would like to try dim sutff since I have little experience on this area, It is possible that this area bring some other interesting experience for me as the things happen to me recently. > I suggest implementing the function in several steps. First implement > random_array(min, max, len). It's straightforward and IMO will provide > the most value to the majority of the users.Then we can either add an > optional argument or a random_array_multidim() function. This may doubles the OID expence, every invariants needs 3 OIDs(int, bigint, numeric). > This can be > implemented and discussed as a separate patch. This approach would be > convenient for the author and the reviewers and also will allow us to > deliver value to the users sooner. Appricated for your suggestion like this, Aleksander, I can understand your kindly intention! If no new ideas, I will defer to Dean's final decision for efficient purpose. -- Best Regards Andy Fan
hi. I don't want to disrupt this thread too much. so I created a separate thread ([1]) for my attached patch. [1] https://www.postgresql.org/message-id/CACJufxF8_VzCFRHRt4OHHF74QtB8tj5Z%3Ddjsy7Y31OHKG5s1-w%40mail.gmail.com