Re: proposal - function string_to_table - Mailing list pgsql-hackers
From | Pavel Stehule |
---|---|
Subject | Re: proposal - function string_to_table |
Date | |
Msg-id | CAFj8pRD+W3YcNm72+VbMXH1dzi0m4e1BHppfpwjiDP6nVVM8og@mail.gmail.com Whole thread Raw |
In response to | Re: proposal - function string_to_table (Peter Smith <smithpb2250@gmail.com>) |
Responses |
Re: proposal - function string_to_table
|
List | pgsql-hackers |
Hi
čt 20. 8. 2020 v 4:07 odesílatel Peter Smith <smithpb2250@gmail.com> napsal:
Hi.
I have been looking at the patch: string_to_table-20200706-2.patch
Below are some review comments for your consideration.
====
COMMENT func.sgml (style)
+ <para>
+ splits string into table using supplied delimiter and
+ optional null string.
+ </para>
The format style of the short description is inconsistent with the
other functions.
e.g. Should start with Capital letter.
e.g. Should tag the parameter names properly
Something like:
<para>
Splits <parameter>string</parameter> into a table
using supplied <parameter>delimiter</parameter>
and optional null string <parameter>nullstr</parameter>.
</para>
done
====
COMMENT func.sgml (what does nullstr do)
The description does not sufficiently describe the purpose/behaviour
of the nullstr.
e.g. Firstly I thought that it meant if 2 consecutive delimiters were
encountered it would substitute this string as the row value. But it
is doing the opposite of what I guessed - if the extracted row value
is the same as nullstr then a NULL row is inserted instead.
done
====
COMMENT func.sgml (wrong sample output)
+<programlisting>xx
+yy,
+zz</programlisting>
This output is incorrect for the sample given. There is no "yy," in
the output because there is a 'yy' nullstr substitution.
Should be:
---
xx
NULL
zz
---
fixed
====
COMMENT func.sgml (related to regexp_split_to_table)
Because this new function is similar to the existing
regexp_split_to_table, perhaps they should cross-reference each other
so a reader of this documentation is made aware of the alternative
function?
I wrote new sentence with ref
====
COMMENT (test cases)
It is impossible to tell difference in the output between empty
strings and nulls currently, so maybe you can change all the tests to
have a form like below so they can be validated properly:
# select v, v IS NULL as "is null" from
string_to_table('a,b,*,c,d,',',','*') g(v);
v | is null
---+---------
a | f
b | f
| t
c | f
d | f
| f
(6 rows)
or maybe like this is even easier:
# select quote_nullable(string_to_table('a|*||c|d|','|','*'));
quote_nullable
----------------
'a'
NULL
''
'c'
'd'
''
(6 rows)
I prefer the first variant, it is clean. It is good idea, done
Something similar was already proposed before [1] but that never got
put into the test code.
[1] https://www.postgresql.org/message-id/CAFj8pRDSzDYmaS06dfMXBfbr8x%2B3xjDJxA5kbL3h8%2BeOGoRUcA%40mail.gmail.com
====
COMMENT (test cases)
There are multiple combinations of the parameters to this function and
MANY different results depending on different values they can take, so
the existing tests only cover a small sample.
I have attached a lot more test scenarios that you may want to include
for better test coverage. Everything seemed to work as expected.
ok, merged
PSA test-cases.pdf
====
COMMENT (accum_result)
+ Datum values[1];
+ bool nulls[1];
+
+ values[0] = PointerGetDatum(result_text);
+ nulls[0] = is_null;
Why not use variables instead of arrays with only 1 element?
Technically it is equivalent, but I think so using one element array is more correct, because function heap_form_tuple expects an array. Sure in C language there is no difference between pointer to value or pointer to array, but minimally the name of the argument "values" implies so argument is an array.
This pattern is used more times in Postgres. You can find a fragments where although we know so array has only one field, still we works with array
misc.c
hash.c
execTuples.c
but I can this code simplify little bit - I can use function tuplestore_putvalues(tupstore, tupdesc, values, nulls);
I see, so this code can be reduced more, and I don't need local variables, but I prefer to be consistent with other parts, and I feel better if I pass an array where the array is expected.
This is not extra important, and I can it change, just I think this variant is cleaner little bit
====
COMMENT (text_to_array_internal)
+ if (!tstate.astate)
+ PG_RETURN_ARRAYTYPE_P(construct_empty_array(TEXTOID));
Maybe the condition is more readable when expressed as:
if (tstate.astate == NULL)
done
new patch attached
Thank you for precious review
Regards
Pavel
====
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachment
pgsql-hackers by date: