Thread: add function argument names to regex* functions.

add function argument names to regex* functions.

From

jian he

Date:

27 December 2023, 16:53:13

Hi.
similar to [1], add function argument names to the following functions:
regexp_like, regexp_match,regexp_matches,regexp_replace,
regexp_substr,regexp_split_to_array,regexp_split_to_table,regexp_count

so I call these function in a different notation[2], like:

SELECT regexp_like(string=>'a'||CHR(10)||'d', pattern=>'a.d', flags:='n');
select regexp_match(string=>'abc',n pattern=>'(B)(c)', flags=>'i');
select regexp_matches(string=>'Programmer', pattern=>'(\w)(.*?\1)',
flags=>'ig');
SELECT regexp_replace(source=>'A PostgreSQL function',
pattern=>'a|e|i|o|u', replacement=>'X', start=>1, n=>4, flags=>'i');
SELECT regexp_substr(string=>'1234567890',
pattern=>'(123)(4(56)(78))', start=>1, n=>1, flags=>'i', subexpr=>4);
SELECT regexp_split_to_array(string=>'thE QUick bROWn FOx jUMPs ovEr
The lazy dOG', pattern=>'e', flags=>'i');

SELECT foo, length(foo)
FROM regexp_split_to_table(string=>'thE QUick bROWn FOx jUMPs ovEr The
lazy dOG', pattern=>'e',flags=>'i') AS foo;
SELECT regexp_count(string=>'ABCABCABCABC', pattern=>'Abc', start=>1,
flags=>'i');

In [3], except the above mentioned function, there is a "substring" function.
I want to refactor substring function argument names. it looks like:
Schema | Name | Result data type | Argument data
types | Type
------------+-----------+------------------+--------------------------------------------+------
pg_catalog | substring | bit | bits bit, "from" integer
| func
pg_catalog | substring | bit | bits bit, "from" integer,
"for" integer | func
pg_catalog | substring | bytea | bytes bytea, "from"
integer | func
pg_catalog | substring | bytea | bytes bytea, "from"
integer, "for" integer | func
pg_catalog | substring | text | string text, "from"
integer | func
pg_catalog | substring | text | string text, "from"
integer, "for" integer | func
pg_catalog | substring | text | string text, pattern text
| func
pg_catalog | substring | text | text, text, text
| func
(8 rows)

As you can see, the substring function argument names need an explicit
double quote,
which doesn't look good, so I gave up.

[1]https://www.postgresql.org/message-id/flat/877cw3jl8y.fsf@wibble.ilmari.org
[2]https://www.postgresql.org/docs/current/sql-syntax-calling-funcs.html#SQL-SYNTAX-CALLING-FUNCS
[3] https://www.postgresql.org/docs/current/functions-matching.html

Attachment

v1-0001-add-fu.pat

Re: add function argument names to regex* functions.

From

Peter Eisentraut

Date:

27 December 2023, 22:25:05

On 27.12.23 17:53, jian he wrote:
> similar to [1], add function argument names to the following functions:
> regexp_like, regexp_match,regexp_matches,regexp_replace,
> regexp_substr,regexp_split_to_array,regexp_split_to_table,regexp_count

Note that these functions are a quasi-standard that is shared with other 
SQL implementations.  It might be worth looking around if there are 
already other implementations of this idea.

Re: add function argument names to regex* functions.

From

jian he

Date:

28 December 2023, 03:28:42

On Thu, Dec 28, 2023 at 6:25 AM Peter Eisentraut <peter@eisentraut.org> wrote:
>
> On 27.12.23 17:53, jian he wrote:
> > similar to [1], add function argument names to the following functions:
> > regexp_like, regexp_match,regexp_matches,regexp_replace,
> > regexp_substr,regexp_split_to_array,regexp_split_to_table,regexp_count
>
> Note that these functions are a quasi-standard that is shared with other
> SQL implementations.  It might be worth looking around if there are
> already other implementations of this idea.
>

turns out people do like calling functions via explicitly mentioning
function argument names, example: [0]
There are no provisions for the argument names.

I looked around the oracle implementation in [1], and the oracle regex
related function argumentation name in [2]
I use the doc [3] syntax explanation and add the related function names.

Current, regex.* function syntax seems fine. but only parameter `N`
seems a little bit weird.
If we change the function's argument name, we also need to change
function syntax explanation in the doc; vise versa.

QUOTE:
The regexp_instr function returns the starting or ending position of
the N'th match of a POSIX regular expression pattern to a string, or
zero if there is no such match. It has the syntax regexp_instr(string,
pattern [, start [, N [, endoption [, flags [, subexpr ]]]]]). pattern
is searched for in string, normally from the beginning of the string,
but if the start parameter is provided then beginning from that
character index. If N is specified then the N'th match of the pattern
is located, otherwise the first match is located.
END OF QUOTE.

maybe we can change `N` to occurrence. but `occurrence` is kind of verbose.

[0] https://stackoverflow.com/questions/33387348/oracle-named-parameters-in-regular-functions
[1]
https://docs.oracle.com/en/database/oracle/oracle-database/23/sqlrf/REGEXP_SUBSTR.html#GUID-2903904D-455F-4839-A8B2-1731EF4BD099
[2] https://dbfiddle.uk/h_SBDEKi
[3] https://www.postgresql.org/docs/current/functions-matching.html#FUNCTIONS-POSIX-REGEXP

Re: add function argument names to regex* functions.

From

"Dian Fay"

Date:

01 January 2024, 18:05:45

On Wed Dec 27, 2023 at 10:28 PM EST, jian he wrote:
> On Thu, Dec 28, 2023 at 6:25 AM Peter Eisentraut <peter@eisentraut.org> wrote:
> >
> > On 27.12.23 17:53, jian he wrote:
> > > similar to [1], add function argument names to the following functions:
> > > regexp_like, regexp_match,regexp_matches,regexp_replace,
> > > regexp_substr,regexp_split_to_array,regexp_split_to_table,regexp_count
> >
> > Note that these functions are a quasi-standard that is shared with other
> > SQL implementations.  It might be worth looking around if there are
> > already other implementations of this idea.
> >
>
> turns out people do like calling functions via explicitly mentioning
> function argument names, example: [0]
> There are no provisions for the argument names.
>
> I looked around the oracle implementation in [1], and the oracle regex
> related function argumentation name in [2]
> I use the doc [3] syntax explanation and add the related function names.
>
> Current, regex.* function syntax seems fine. but only parameter `N`
> seems a little bit weird.
> If we change the function's argument name, we also need to change
> function syntax explanation in the doc; vise versa.
>
> QUOTE:
> The regexp_instr function returns the starting or ending position of
> the N'th match of a POSIX regular expression pattern to a string, or
> zero if there is no such match. It has the syntax regexp_instr(string,
> pattern [, start [, N [, endoption [, flags [, subexpr ]]]]]). pattern
> is searched for in string, normally from the beginning of the string,
> but if the start parameter is provided then beginning from that
> character index. If N is specified then the N'th match of the pattern
> is located, otherwise the first match is located.
> END OF QUOTE.
>
> maybe we can change `N` to occurrence. but `occurrence` is kind of verbose.
>
> [0] https://stackoverflow.com/questions/33387348/oracle-named-parameters-in-regular-functions
> [1]
https://docs.oracle.com/en/database/oracle/oracle-database/23/sqlrf/REGEXP_SUBSTR.html#GUID-2903904D-455F-4839-A8B2-1731EF4BD099
> [2] https://dbfiddle.uk/h_SBDEKi
> [3] https://www.postgresql.org/docs/current/functions-matching.html#FUNCTIONS-POSIX-REGEXP

I've been trying to use named arguments more diligently so expanding
support for built-in functions is welcome. The patch applies cleanly and
works as advertised.

I agree that the parameter name `n` is not ideal. For example, in
`regexp_replace` it's easy to misinterpret it as "make up to n
replacements". This has not been a problem when `n` only lives in the
documentation which explains exactly what it does, but that context is
not readily available in code expressing `n => 3`.

Another possibility is `index`, which is relatively short and not a
reserved keyword ^1. `position` is not as precise but would avoid the
conceptual overloading of ordinary indices.

1. https://www.postgresql.org/docs/current/sql-keywords-appendix.html

Re: add function argument names to regex* functions.

From

Peter Eisentraut

Date:

03 January 2024, 12:13:03

On 28.12.23 04:28, jian he wrote:
> I looked around the oracle implementation in [1], and the oracle regex
> related function argumentation name in [2]
> I use the doc [3] syntax explanation and add the related function names.
> 
> Current, regex.* function syntax seems fine. but only parameter `N`
> seems a little bit weird.
> If we change the function's argument name, we also need to change
> function syntax explanation in the doc; vise versa.

So, it looks like Oracle already has defined parameter names for these, 
so we should make ours match.

Re: add function argument names to regex* functions.

From

"Dian Fay"

Date:

03 January 2024, 23:05:47

> > Another possibility is `index`, which is relatively short and not a
> > reserved keyword ^1. `position` is not as precise but would avoid the
> > conceptual overloading of ordinary indices.
>
> I'm not a fan of "index" since that leaves the question of
> whether it's 0 or 1 based. "Position" is a bit better, but I think
> Jian's suggestion of "occurance" is best.

We do have precedent for one-based `index` in Postgres: array types are
1-indexed by default! "Occurrence" removes that ambiguity but it's long
and easy to misspell (I looked it up after typing it just now and it
_still_ feels off).

How's "instance"?

Re: add function argument names to regex* functions.

From

jian he

Date:

04 January 2024, 07:03:00

On Thu, Jan 4, 2024 at 7:26 AM Jim Nasby <jim.nasby@gmail.com> wrote:
>
> On 1/3/24 5:05 PM, Dian Fay wrote:
>
> Another possibility is `index`, which is relatively short and not a
> reserved keyword ^1. `position` is not as precise but would avoid the
> conceptual overloading of ordinary indices.
>
> I'm not a fan of "index" since that leaves the question of
> whether it's 0 or 1 based. "Position" is a bit better, but I think
> Jian's suggestion of "occurance" is best.
>
> We do have precedent for one-based `index` in Postgres: array types are
> 1-indexed by default! "Occurrence" removes that ambiguity but it's long
> and easy to misspell (I looked it up after typing it just now and it
> _still_ feels off).
>
> How's "instance"?
>
> Presumably someone referencing arguments by name would have just looked up the names via \df or whatever, so
presumablymisspelling wouldn't be a big issue. But I think "instance" is OK as well. 
>
> --
> Jim Nasby, Data Architect, Austin TX

regexp_instr: It has the syntax regexp_instr(string, pattern [, start
[, N [, endoption [, flags [, subexpr ]]]]])
oracle:
REGEXP_INSTR (source_char, pattern,  [, position [, occurrence [,
return_opt  [, match_param  [, subexpr ]]]]] )

"string" and "source_char" are almost the same descriptive, so maybe
there is no need to change.
"start" is better than "position", imho.
"return_opt" is better than "endoption", (maybe we need change, for
now I didn't)
"flags" cannot be changed to "match_param", given it quite everywhere
in functions-matching.html.

similarly for function regexp_replace, oracle using "repplace_string",
we use "replacement"(mentioned in the doc).
so I don't think we need to change to "repplace_string".

Based on how people google[0], I think `occurrence` is ok, even though
it's verbose.
to change from `N` to `occurrence`, we also need to change the doc,
that is why this patch is more larger.

[0]:
https://www.google.com/search?q=regex+nth+match&oq=regex+nth+match&gs_lcrp=EgZjaHJvbWUyBggAEEUYOTIGCAEQRRg8MgYIAhBFGDzSAQc2MThqMGo5qAIAsAIA&sourceid=chrome&ie=UTF-8

Attachment

v2-0001-add-function-argument-names-to-regex.-functions.patch

Re: add function argument names to regex* functions.

From

"Dian Fay"

Date:

08 January 2024, 00:44:55

On Thu Jan 4, 2024 at 2:03 AM EST, jian he wrote:
> On Thu, Jan 4, 2024 at 7:26 AM Jim Nasby <jim.nasby@gmail.com> wrote:
> >
> > On 1/3/24 5:05 PM, Dian Fay wrote:
> >
> > Another possibility is `index`, which is relatively short and not a
> > reserved keyword ^1. `position` is not as precise but would avoid the
> > conceptual overloading of ordinary indices.
> >
> > I'm not a fan of "index" since that leaves the question of
> > whether it's 0 or 1 based. "Position" is a bit better, but I think
> > Jian's suggestion of "occurance" is best.
> >
> > We do have precedent for one-based `index` in Postgres: array types are
> > 1-indexed by default! "Occurrence" removes that ambiguity but it's long
> > and easy to misspell (I looked it up after typing it just now and it
> > _still_ feels off).
> >
> > How's "instance"?
> >
> > Presumably someone referencing arguments by name would have just looked up the names via \df or whatever, so
presumablymisspelling wouldn't be a big issue. But I think "instance" is OK as well. 
> >
> > --
> > Jim Nasby, Data Architect, Austin TX
>
> regexp_instr: It has the syntax regexp_instr(string, pattern [, start
> [, N [, endoption [, flags [, subexpr ]]]]])
> oracle:
> REGEXP_INSTR (source_char, pattern,  [, position [, occurrence [,
> return_opt  [, match_param  [, subexpr ]]]]] )
>
> "string" and "source_char" are almost the same descriptive, so maybe
> there is no need to change.
> "start" is better than "position", imho.
> "return_opt" is better than "endoption", (maybe we need change, for
> now I didn't)
> "flags" cannot be changed to "match_param", given it quite everywhere
> in functions-matching.html.
>
> similarly for function regexp_replace, oracle using "repplace_string",
> we use "replacement"(mentioned in the doc).
> so I don't think we need to change to "repplace_string".
>
> Based on how people google[0], I think `occurrence` is ok, even though
> it's verbose.
> to change from `N` to `occurrence`, we also need to change the doc,
> that is why this patch is more larger.
>
>
> [0]:
https://www.google.com/search?q=regex+nth+match&oq=regex+nth+match&gs_lcrp=EgZjaHJvbWUyBggAEEUYOTIGCAEQRRg8MgYIAhBFGDzSAQc2MThqMGo5qAIAsAIA&sourceid=chrome&ie=UTF-8

The `regexp_replace` summary in table 9.10 is mismatched and still
specifies the first parameter name as `string` instead of `source`.
Since all the other functions use `string`, should `regexp_replace` do
the same or is this a case where an established "standard" diverges?

I noticed the original documentation for some of these functions is
rather disorganized; summaries explain `occurrence` without explaining
the prior `start` parameter, and detailed documentation in 9.7 is
usually a single paragraph per function running pell-mell through ifs
and buts without section headings, so entries in table 9.10 have to
reference the entire section 9.7.3 instead of their specific functions.
It's out of scope here, but should I bring this up on pgsql-docs?

Re: add function argument names to regex* functions.

From

jian he

Date:

08 January 2024, 14:26:34

On Mon, Jan 8, 2024 at 8:44 AM Dian Fay <di@nmfay.com> wrote:
>
> On Thu Jan 4, 2024 at 2:03 AM EST, jian he wrote:
> > On Thu, Jan 4, 2024 at 7:26 AM Jim Nasby <jim.nasby@gmail.com> wrote:
> > >
> > > On 1/3/24 5:05 PM, Dian Fay wrote:
> > >
> > > Another possibility is `index`, which is relatively short and not a
> > > reserved keyword ^1. `position` is not as precise but would avoid the
> > > conceptual overloading of ordinary indices.
> > >
> > > I'm not a fan of "index" since that leaves the question of
> > > whether it's 0 or 1 based. "Position" is a bit better, but I think
> > > Jian's suggestion of "occurance" is best.
> > >
> > > We do have precedent for one-based `index` in Postgres: array types are
> > > 1-indexed by default! "Occurrence" removes that ambiguity but it's long
> > > and easy to misspell (I looked it up after typing it just now and it
> > > _still_ feels off).
> > >
> > > How's "instance"?
> > >
> > > Presumably someone referencing arguments by name would have just looked up the names via \df or whatever, so
presumablymisspelling wouldn't be a big issue. But I think "instance" is OK as well. 
> > >
> > > --
> > > Jim Nasby, Data Architect, Austin TX
> >
> > regexp_instr: It has the syntax regexp_instr(string, pattern [, start
> > [, N [, endoption [, flags [, subexpr ]]]]])
> > oracle:
> > REGEXP_INSTR (source_char, pattern,  [, position [, occurrence [,
> > return_opt  [, match_param  [, subexpr ]]]]] )
> >
> > "string" and "source_char" are almost the same descriptive, so maybe
> > there is no need to change.
> > "start" is better than "position", imho.
> > "return_opt" is better than "endoption", (maybe we need change, for
> > now I didn't)
> > "flags" cannot be changed to "match_param", given it quite everywhere
> > in functions-matching.html.
> >
> > similarly for function regexp_replace, oracle using "repplace_string",
> > we use "replacement"(mentioned in the doc).
> > so I don't think we need to change to "repplace_string".
> >
> > Based on how people google[0], I think `occurrence` is ok, even though
> > it's verbose.
> > to change from `N` to `occurrence`, we also need to change the doc,
> > that is why this patch is more larger.
> >
> >
> > [0]:
https://www.google.com/search?q=regex+nth+match&oq=regex+nth+match&gs_lcrp=EgZjaHJvbWUyBggAEEUYOTIGCAEQRRg8MgYIAhBFGDzSAQc2MThqMGo5qAIAsAIA&sourceid=chrome&ie=UTF-8
>
> The `regexp_replace` summary in table 9.10 is mismatched and still
> specifies the first parameter name as `string` instead of `source`.
> Since all the other functions use `string`, should `regexp_replace` do
> the same or is this a case where an established "standard" diverges?
>

got it. Thanks for pointing it out.

in functions-matching.html
if I change <replaceable>source</replaceable> to
<replaceable>string</replaceable> then
there are no markup "string" and markup "string", it's kind of
slightly confusing.

So does the following refactored description of regexp_replace make sense:

     The <replaceable>string</replaceable> is returned unchanged if
     there is no match to the <replaceable>pattern</replaceable>.  If there is a
     match, the <replaceable>string</replaceable> is returned with the
     <replaceable>replacement</replaceable> string substituted for the matching
     substring.  The <replaceable>replacement</replaceable> string can contain
     <literal>\</literal><replaceable>n</replaceable>, where
<replaceable>n</replaceable> is 1
     through 9, to indicate that the source substring matching the
     <replaceable>n</replaceable>'th parenthesized subexpression of
the pattern should be
     inserted, and it can contain <literal>\&</literal> to indicate that the
     substring matching the entire pattern should be inserted.  Write
     <literal>\\</literal> if you need to put a literal backslash in
the replacement
     text.

> I noticed the original documentation for some of these functions is
> rather disorganized; summaries explain `occurrence` without explaining
> the prior `start` parameter, and detailed documentation in 9.7 is
> usually a single paragraph per function running pell-mell through ifs
> and buts without section headings, so entries in table 9.10 have to
> reference the entire section 9.7.3 instead of their specific functions.
> It's out of scope here, but should I bring this up on pgsql-docs?

I got it.
in Table 9.10. Other String Functions and Operators, if we can
reference the specific function would be great.
As for now, in the browser, you need to use Ctrl+F to find the
detailed explanation in 9.7.3.
you can just bring your suggested or patch to pgsql-hackers@postgresql.org.

Re: add function argument names to regex* functions.

From

"Dian Fay"

Date:

09 January 2024, 00:51:58

On Mon Jan 8, 2024 at 9:26 AM EST, jian he wrote:
> On Mon, Jan 8, 2024 at 8:44 AM Dian Fay <di@nmfay.com> wrote:
> > The `regexp_replace` summary in table 9.10 is mismatched and still
> > specifies the first parameter name as `string` instead of `source`.
> > Since all the other functions use `string`, should `regexp_replace` do
> > the same or is this a case where an established "standard" diverges?
>
> got it. Thanks for pointing it out.
>
> in functions-matching.html
> if I change <replaceable>source</replaceable> to
> <replaceable>string</replaceable> then
> there are no markup "string" and markup "string", it's kind of
> slightly confusing.
>
> So does the following refactored description of regexp_replace make sense:
>
>      The <replaceable>string</replaceable> is returned unchanged if
>      there is no match to the <replaceable>pattern</replaceable>.  If there is a
>      match, the <replaceable>string</replaceable> is returned with the
>      <replaceable>replacement</replaceable> string substituted for the matching
>      substring.  The <replaceable>replacement</replaceable> string can contain
>      <literal>\</literal><replaceable>n</replaceable>, where
> <replaceable>n</replaceable> is 1
>      through 9, to indicate that the source substring matching the
>      <replaceable>n</replaceable>'th parenthesized subexpression of
> the pattern should be
>      inserted, and it can contain <literal>\&</literal> to indicate that the
>      substring matching the entire pattern should be inserted.  Write
>      <literal>\\</literal> if you need to put a literal backslash in
> the replacement
>      text.

That change makes sense to me! I'll see about the section refactoring
after this lands.

Re: add function argument names to regex* functions.

From

jian he

Date:

10 January 2024, 14:18:00

On Tue, Jan 9, 2024 at 8:52 AM Dian Fay <di@nmfay.com> wrote:
>
> On Mon Jan 8, 2024 at 9:26 AM EST, jian he wrote:
> > On Mon, Jan 8, 2024 at 8:44 AM Dian Fay <di@nmfay.com> wrote:
> > > The `regexp_replace` summary in table 9.10 is mismatched and still
> > > specifies the first parameter name as `string` instead of `source`.
> > > Since all the other functions use `string`, should `regexp_replace` do
> > > the same or is this a case where an established "standard" diverges?
> >
> > got it. Thanks for pointing it out.
> >
> > in functions-matching.html
> > if I change <replaceable>source</replaceable> to
> > <replaceable>string</replaceable> then
> > there are no markup "string" and markup "string", it's kind of
> > slightly confusing.
> >
> > So does the following refactored description of regexp_replace make sense:
> >
> >      The <replaceable>string</replaceable> is returned unchanged if
> >      there is no match to the <replaceable>pattern</replaceable>.  If there is a
> >      match, the <replaceable>string</replaceable> is returned with the
> >      <replaceable>replacement</replaceable> string substituted for the matching
> >      substring.  The <replaceable>replacement</replaceable> string can contain
> >      <literal>\</literal><replaceable>n</replaceable>, where
> > <replaceable>n</replaceable> is 1
> >      through 9, to indicate that the source substring matching the
> >      <replaceable>n</replaceable>'th parenthesized subexpression of
> > the pattern should be
> >      inserted, and it can contain <literal>\&</literal> to indicate that the
> >      substring matching the entire pattern should be inserted.  Write
> >      <literal>\\</literal> if you need to put a literal backslash in
> > the replacement
> >      text.
>
> That change makes sense to me! I'll see about the section refactoring
> after this lands.

I put the changes into the new patch.

Attachment

v3-0001-add-function-argument-names-to-regex.-functions.patch

Re: add function argument names to regex* functions.

From

"Dian Fay"

Date:

11 January 2024, 03:57:33

On Wed Jan 10, 2024 at 9:18 AM EST, jian he wrote:
> On Tue, Jan 9, 2024 at 8:52 AM Dian Fay <di@nmfay.com> wrote:
> >
> > On Mon Jan 8, 2024 at 9:26 AM EST, jian he wrote:
> > > On Mon, Jan 8, 2024 at 8:44 AM Dian Fay <di@nmfay.com> wrote:
> > > > The `regexp_replace` summary in table 9.10 is mismatched and still
> > > > specifies the first parameter name as `string` instead of `source`.
> > > > Since all the other functions use `string`, should `regexp_replace` do
> > > > the same or is this a case where an established "standard" diverges?
> > >
> > > got it. Thanks for pointing it out.
> > >
> > > in functions-matching.html
> > > if I change <replaceable>source</replaceable> to
> > > <replaceable>string</replaceable> then
> > > there are no markup "string" and markup "string", it's kind of
> > > slightly confusing.
> > >
> > > So does the following refactored description of regexp_replace make sense:
> > >
> > >      The <replaceable>string</replaceable> is returned unchanged if
> > >      there is no match to the <replaceable>pattern</replaceable>.  If there is a
> > >      match, the <replaceable>string</replaceable> is returned with the
> > >      <replaceable>replacement</replaceable> string substituted for the matching
> > >      substring.  The <replaceable>replacement</replaceable> string can contain
> > >      <literal>\</literal><replaceable>n</replaceable>, where
> > > <replaceable>n</replaceable> is 1
> > >      through 9, to indicate that the source substring matching the
> > >      <replaceable>n</replaceable>'th parenthesized subexpression of
> > > the pattern should be
> > >      inserted, and it can contain <literal>\&</literal> to indicate that the
> > >      substring matching the entire pattern should be inserted.  Write
> > >      <literal>\\</literal> if you need to put a literal backslash in
> > > the replacement
> > >      text.
> >
> > That change makes sense to me! I'll see about the section refactoring
> > after this lands.
>
> I put the changes into the new patch.

Sorry, I missed one minor issue with v2. The replacement on lines
6027-6028 of func.sgml (originally "`n` rows if there are `n` matches")
is not needed and could be more confusing since the `n` represents a
number, not an argument to `regexp_matches`. I've built v3 and gone over
everything else one more time and it looks good.

Re: add function argument names to regex* functions.

From

Peter Eisentraut

Date:

18 January 2024, 08:17:11

On 10.01.24 15:18, jian he wrote:
> I put the changes into the new patch.

Reading back through the discussion, I wasn't quite able to interpret 
the resolution regarding Oracle compatibility.  From the patch, it looks 
like you chose not to adopt the parameter names from Oracle.  Was that 
your intention?

Re: add function argument names to regex* functions.

From

jian he

Date:

20 January 2024, 02:55:41

On Thu, Jan 18, 2024 at 4:17 PM Peter Eisentraut <peter@eisentraut.org> wrote:
>
> On 10.01.24 15:18, jian he wrote:
> > I put the changes into the new patch.
>
> Reading back through the discussion, I wasn't quite able to interpret
> the resolution regarding Oracle compatibility.  From the patch, it looks
> like you chose not to adopt the parameter names from Oracle.  Was that
> your intention?
>

per committee message:
https://git.postgresql.org/cgit/postgresql.git/commit/?id=6424337073589476303b10f6d7cc74f501b8d9d7
Even if the names are all the same, our function is still not the same
as oracle.

There is a documentation bug.
In [0], Table 9.25. Regular Expression Functions Equivalencies
regexp_replace function definition: regexp_replace(string, pattern, replacement)

In one of the <tip> section below, regexp_replace explains as
<<<<<
The regexp_replace function provides substitution of new text for
substrings that match POSIX regular expression patterns. It has the
syntax regexp_replace(source, pattern, replacement [, start [, N ]] [,
flags ]). (Notice that N cannot be specified unless start is, but
flags can be given in any case.)
<<<<<
So I changed the first argument of regexp_replace to "string". So
accordingly, the doc needs to change also, which I did.

another regex* function argument changes: from "N" to "occurences",  example:
    + If <replaceable>occurrence</replaceable> is specified
    + then the <replaceable>occurrence</replaceable>'th match of the pattern
    + is located,

but [2] says
Speaking of the "occurrence'th
occurrence" is just silly, not to mention long and easy to misspell."

summary:
adding function-named notation is my intention.
To make regex.* functions named-notation works, we need to add
proargnames to src/include/catalog/pg_proc.dat.
add proargnames also require changing the doc.
naming proargnames is a matter of taste now, So I only change 'N' to
'occurrence'.

[0] https://www.postgresql.org/docs/current/functions-matching.html
[1] https://www.postgresql.org/message-id/flat/fc160ee0-c843-b024-29bb-97b5da61971f%40darold.net

Re: add function argument names to regex* functions.

From

jian he

Date:

20 January 2024, 03:03:02

On Sat, Jan 20, 2024 at 10:55 AM jian he <jian.universality@gmail.com> wrote:
>
>
> another regex* function argument changes: from "N" to "occurences",  example:
>     + If <replaceable>occurrence</replaceable> is specified
>     + then the <replaceable>occurrence</replaceable>'th match of the pattern
>     + is located,
>
> but [2] says
> Speaking of the "occurrence'th
> occurrence" is just silly, not to mention long and easy to misspell."
>

sorry.
[2], The reference link is
https://www.postgresql.org/message-id/1567465.1627860115%40sss.pgh.pa.us

my previous post will link to the whole thread.

Re: add function argument names to regex* functions.

From

Tom Lane

Date:

02 April 2024, 20:45:33

jian he <jian.universality@gmail.com> writes:
> On Thu, Jan 18, 2024 at 4:17 PM Peter Eisentraut <peter@eisentraut.org> wrote:
>> Reading back through the discussion, I wasn't quite able to interpret
>> the resolution regarding Oracle compatibility.  From the patch, it looks
>> like you chose not to adopt the parameter names from Oracle.  Was that
>> your intention?

> per committee message:
> https://git.postgresql.org/cgit/postgresql.git/commit/?id=6424337073589476303b10f6d7cc74f501b8d9d7
> Even if the names are all the same, our function is still not the same
> as oracle.

The fact that there's minor discrepancies in the regex languages
doesn't seem to me to have a lot of bearing on whether we should
follow Oracle's choices of parameter names.

However, if we do follow Oracle, it seems like we should do that
consistently, which this patch doesn't.  For instance, per [1]
Oracle calls the arguments of regex_substr

    source_char,
    pattern,
    position,
    occurrence,
    match_param,
    subexpr

while we have

    string,
    pattern,
    start,
    N,
    flags,
    subexpr

The patch proposes to replace "N" with "occurrence" but not touch
the other discrepancies, which seems to me to be a pretty poor
choice.  "occurrence" is very long and difficult to spell correctly,
and if you're not following Oracle slavishly, exactly what is the
argument in its favor?  I quite agree that Oracle's other choices
aren't improvements over ours, but neither is that one.

On the whole my inclination would be to stick to the names we have
in the documentation.  There might be an argument for changing "N"
to something lower-case so you don't have to quote it; but if we do,
I'd go for, say, "count".

            regards, tom lane

[1]
https://docs.oracle.com/en/database/oracle/oracle-database/23/sqlrf/REGEXP_SUBSTR.html#GUID-2903904D-455F-4839-A8B2-1731EF4BD099

Re: add function argument names to regex* functions.

From

jian he

Date:

04 April 2024, 13:54:53

On Wed, Apr 3, 2024 at 4:45 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> jian he <jian.universality@gmail.com> writes:
> > On Thu, Jan 18, 2024 at 4:17 PM Peter Eisentraut <peter@eisentraut.org> wrote:
> >> Reading back through the discussion, I wasn't quite able to interpret
> >> the resolution regarding Oracle compatibility.  From the patch, it looks
> >> like you chose not to adopt the parameter names from Oracle.  Was that
> >> your intention?
>
> > per committee message:
> > https://git.postgresql.org/cgit/postgresql.git/commit/?id=6424337073589476303b10f6d7cc74f501b8d9d7
> > Even if the names are all the same, our function is still not the same
> > as oracle.
>
> The fact that there's minor discrepancies in the regex languages
> doesn't seem to me to have a lot of bearing on whether we should
> follow Oracle's choices of parameter names.
>
> However, if we do follow Oracle, it seems like we should do that
> consistently, which this patch doesn't.  For instance, per [1]
> Oracle calls the arguments of regex_substr
>
>         source_char,
>         pattern,
>         position,
>         occurrence,
>         match_param,
>         subexpr
>
> while we have
>
>         string,
>         pattern,
>         start,
>         N,
>         flags,
>         subexpr
>
> The patch proposes to replace "N" with "occurrence" but not touch
> the other discrepancies, which seems to me to be a pretty poor
> choice.  "occurrence" is very long and difficult to spell correctly,
> and if you're not following Oracle slavishly, exactly what is the
> argument in its favor?  I quite agree that Oracle's other choices
> aren't improvements over ours, but neither is that one.
>
> On the whole my inclination would be to stick to the names we have
> in the documentation.  There might be an argument for changing "N"
> to something lower-case so you don't have to quote it; but if we do,
> I'd go for, say, "count".
>

we have
---------------------------------------------------------------
The replacement string can contain \n, where n is 1 through 9, to
indicate that the source substring matching the n'th parenthesized
subexpression of the pattern should be inserted, and it can contain \&
to indicate that the substring matching the entire pattern should be
inserted.
----------------------------------------------------------------------------
in the regexp_replace explanation section.
changing "N" to lower-case would be misleading for regexp_replace?
so I choose "count".

By the way, I think the above  is so hard to comprehend.
I can only find related test in src/test/regress/sql/strings.sql are:
SELECT regexp_replace('1112223333', E'(\\d{3})(\\d{3})(\\d{4})',
E'(\\1) \\2-\\3');
SELECT regexp_replace('foobarrbazz', E'(.)\\1', E'X\\&Y', 'g');
SELECT regexp_replace('foobarrbazz', E'(.)\\1', E'X\\\\Y', 'g');

but these tests seem not friendly.
maybe we should have some simple examples to demonstrate the above paragraph.

Attachment

v4-0001-add-regex-functions-argument-names.patch

Re: add function argument names to regex* functions.

From

Robert Haas

Date:

15 May 2024, 18:46:10

On Thu, Apr 4, 2024 at 9:55 AM jian he <jian.universality@gmail.com> wrote:
> in the regexp_replace explanation section.
> changing "N" to lower-case would be misleading for regexp_replace?
> so I choose "count".

I don't see why that would be confusing for regexp_replace
specifically, but I think N => count is a reasonable change to make.
However, I don't think this quite works:

+     then the <replaceable>count</replaceable>'th match of the pattern

An English speaker is more likely to understand what is meant by
"N'th" than what is meant by "count'th". Even if they can guess, it's
kinda strange-looking. I think it needs to be rephrased somehow, but
I'm not sure exactly how.

> By the way, I think the above  is so hard to comprehend.
> I can only find related test in src/test/regress/sql/strings.sql are:
> SELECT regexp_replace('1112223333', E'(\\d{3})(\\d{3})(\\d{4})',
> E'(\\1) \\2-\\3');
> SELECT regexp_replace('foobarrbazz', E'(.)\\1', E'X\\&Y', 'g');
> SELECT regexp_replace('foobarrbazz', E'(.)\\1', E'X\\\\Y', 'g');
>
> but these tests seem not friendly.
> maybe we should have some simple examples to demonstrate the above paragraph.

Examples in the regression tests aren't meant as tests, not examples
for users to copy. If we want examples, those belong in the
documentation. However, I see that regexp_replace already has some
examples at https://www.postgresql.org/docs/current/functions-matching.html#FUNCTIONS-POSIX-REGEXP
so I'm not sure exactly what you think should be added.

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: add function argument names to regex* functions.

From

Robert Haas

Date:

15 May 2024, 18:50:02

On Wed, May 15, 2024 at 2:46 PM Robert Haas <robertmhaas@gmail.com> wrote:
> Examples in the regression tests aren't meant as tests, not examples
> for users to copy. If we want examples, those belong in the
> documentation. However, I see that regexp_replace already has some
> examples at https://www.postgresql.org/docs/current/functions-matching.html#FUNCTIONS-POSIX-REGEXP
> so I'm not sure exactly what you think should be added.

Woops. I should have said: Examples in the regression tests *are*
meant as tests...

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: add function argument names to regex* functions.

From

"David G. Johnston"

Date:

15 May 2024, 19:01:18

On Wed, May 15, 2024 at 11:46 AM Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Apr 4, 2024 at 9:55 AM jian he <jian.universality@gmail.com> wrote:
> in the regexp_replace explanation section.
> changing "N" to lower-case would be misleading for regexp_replace?
> so I choose "count".

I don't see why that would be confusing for regexp_replace
specifically, but I think N => count is a reasonable change to make.
However, I don't think this quite works:

+ then the <replaceable>count</replaceable>'th match of the pattern

An English speaker is more likely to understand what is meant by
"N'th" than what is meant by "count'th". Even if they can guess, it's
kinda strange-looking. I think it needs to be rephrased somehow, but
I'm not sure exactly how.

I think this confusion goes to show that replacing N with count doesn't work.

"replace_at" comes to mind as a better name.

By default, only the first match of the pattern is replaced. If replace_at is specified and greater than zero, then the first "replace_at - 1" matches are skipped before making a single replacement (i.e., the g flag is ignored when replace_at is specified.)

David J.

Re: add function argument names to regex* functions.

From

Robert Haas

Date:

15 May 2024, 19:07:15

On Wed, May 15, 2024 at 3:01 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:
> I think this confusion goes to show that replacing N with count doesn't work.
>
> "replace_at" comes to mind as a better name.

I do not agree with that at all. It shows that a literal
search-and-replace changing N to count does not work, but it does not
show that count is a bad name for the concept, and I don't think it
is. I believe that if I were reading the documentation, count would be
clearer to me than N, N would probably still be clear enough, and
replace_at wouldn't be clear at all. I'd expect replace_at to be a
character position or something, not an occurrence count.

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: add function argument names to regex* functions.

From

Tom Lane

Date:

15 May 2024, 19:10:08

Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Apr 4, 2024 at 9:55 AM jian he <jian.universality@gmail.com> wrote:
>> changing "N" to lower-case would be misleading for regexp_replace?
>> so I choose "count".

> I don't see why that would be confusing for regexp_replace
> specifically, but I think N => count is a reasonable change to make.
> However, I don't think this quite works:
> +     then the <replaceable>count</replaceable>'th match of the pattern

I think the origin of the problem here is not wanting to use "N"
as the actual name of the parameter, because then users would have
to double-quote it to write "regexp_replace(..., "N" => 42, ...)".

However ... is that really so awful?  It's still fewer keystrokes
than "count".  It's certainly a potential gotcha for users who've
not internalized when they need double quotes, but I think we
could largely address that problem just by making sure to provide
a documentation example that shows use of "N".

> An English speaker is more likely to understand what is meant by
> "N'th" than what is meant by "count'th".

+1 ... none of the proposals make that bit read more clearly
than it does now.

            regards, tom lane

Re: add function argument names to regex* functions.

From

Chapman Flack

Date:

15 May 2024, 19:22:59

On 05/15/24 15:07, Robert Haas wrote:
> is. I believe that if I were reading the documentation, count would be
> clearer to me than N, N would probably still be clear enough, and
> replace_at wouldn't be clear at all. I'd expect replace_at to be a
> character position or something, not an occurrence count.

You've said the magic word. In the analogous (but XQuery-based)
ISO standard regex functions, the argument that does that is identified
with the keyword OCCURRENCE.

What would be wrong with that, for consistency's sake?

Regards,
-Chap

Re: add function argument names to regex* functions.

From

"David G. Johnston"

Date:

15 May 2024, 19:24:22

On Wed, May 15, 2024 at 12:07 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, May 15, 2024 at 3:01 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:
> I think this confusion goes to show that replacing N with count doesn't work.
>
> "replace_at" comes to mind as a better name.

I do not agree with that at all. It shows that a literal
search-and-replace changing N to count does not work, but it does not
show that count is a bad name for the concept, and I don't think it
is. I believe that if I were reading the documentation, count would be
clearer to me than N, N would probably still be clear enough, and
replace_at wouldn't be clear at all. I'd expect replace_at to be a
character position or something, not an occurrence count.

The function replaces matches, not random characters. And if you are reading the documentation I find it implausible that the wording I suggested would cause one to think in terms of characters instead of matches.

If I choose not to read the documentation "count" seems like it behaves as a qualified "g". I don't want all matches replaced, I want the first "count" matches only replaced.

"occurrence" probably is the best choice but I agree the spelling issues are a big negative.

count - how many things there are. This isn't a count. I'd rather stick with N, at least it actually has the desired meaning as a pointer to an item in a list.

N - The label provides zero context as to what the number you place there is going to be used for. Labels ideally do more work than this especially if someone takes the time to spell them out. Otherwise why use "pattern" instead of "p".

David J.

Re: add function argument names to regex* functions.

From

Robert Haas

Date:

15 May 2024, 19:31:56

On Wed, May 15, 2024 at 3:23 PM Chapman Flack <jcflack@acm.org> wrote:
> What would be wrong with that, for consistency's sake?

It was proposed and rejected upthread, but that's not to say that I
necessarily endorse the reasons given.

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: add function argument names to regex* functions.

From

Robert Haas

Date:

15 May 2024, 19:52:42

On Wed, May 15, 2024 at 3:25 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:
> The function replaces matches, not random characters.  And if you are reading the documentation I find it implausible
thatthe wording I suggested would cause one to think in terms of characters instead of matches. 

I mean I just told you what my reaction to it was. If you find that
reaction "implausible" then I guess you think I was lying when I said
that?

> N - The label provides zero context as to what the number you place there is going to be used for.  Labels ideally do
morework than this especially if someone takes the time to spell them out.  Otherwise why use "pattern" instead of "p". 

I feel like you're attacking a straw man here. I never said that N was
my first choice; in fact, I said the opposite. But I do think that if
the documentation says, as it does, that the function is
regexp_replace(source, pattern, replacement, start, N, flags), a
reader who has some idea what a function called regexp_replace might
do will probably be able to guess what N is. It's probably also true
that if we changed "pattern" to "p" they would still be able to guess
that too, because there's nothing other than a pattern that you'd
expect to pass to a regexp-replacement function that starts with p,
but it would still be worse than what we have now.

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: add function argument names to regex* functions.

From

"David G. Johnston"

Date:

15 May 2024, 20:12:57

On Wed, May 15, 2024 at 12:52 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, May 15, 2024 at 3:25 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:
> The function replaces matches, not random characters. And if you are reading the documentation I find it implausible that the wording I suggested would cause one to think in terms of characters instead of matches.

I mean I just told you what my reaction to it was. If you find that
reaction "implausible" then I guess you think I was lying when I said
that?

You just broke my brain when you say that you read:

And then say:

I'd expect replace_at to be a character position or something, not an occurrence count.

I guess it isn't a claim you are lying, rather I simply don't follow your mental model of all this and in my mental model behind the proposal I don't believe the typical reader will become confused on that point. I guess that means I don't find you to be the typical reader, at least so far as this specific topic goes. But hey, maybe I'm the one in the minority. In either case we disagree and that was my main point.

David J.

Re: add function argument names to regex* functions.

From

"David G. Johnston"

Date:

15 May 2024, 20:17:57

On Wed, May 15, 2024 at 12:07 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, May 15, 2024 at 3:01 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:
> I think this confusion goes to show that replacing N with count doesn't work.
>
> "replace_at" comes to mind as a better name.
I'd expect replace_at to be a
character position or something, not an occurrence count.

I'll amend the name to: "replace_match"

I do now see that since the immediately preceding parameter, "start", deals with characters instead of matches that making it clear this parameter deals in matches in the name work. The singular 'match' has all the same benefits as 'at' plus this point of clarity.

David J.

Re: add function argument names to regex* functions.

From

Robert Haas

Date:

15 May 2024, 20:18:53

On Wed, May 15, 2024 at 4:13 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:
>
> You just broke my brain when you say that you read:
>
> By default, only the first match of the pattern is replaced.  If replace_at is specified and greater than zero, then
thefirst "replace_at - 1" matches are skipped before making a single replacement (i.e., the g flag is ignored when
replace_atis specified.) 
>
> And then say:
>
> I'd expect replace_at to be a character position or something, not an occurrence count.

Ah. What I meant was: if I just saw the parameter name, and not the
documentation, I believe that I would not correctly understand what it
did. I would have had to read the docs. Whereas I'm pretty sure at
some point years ago, I looked up these functions and I saw "N", and I
did understand what that did without needing it explained. If I had
seen "count" or "occurrence" I think I would have understood that
without further explanation, too.

So my point was: to me, N is more self-documenting than replace_at,
and less self-documenting than count or occurrence.

If your mileage varies on that point, so be it!

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: add function argument names to regex* functions.

From

"David G. Johnston"

Date:

15 May 2024, 20:24:40

On Wed, May 15, 2024 at 1:19 PM Robert Haas <robertmhaas@gmail.com> wrote:

So my point was: to me, N is more self-documenting than replace_at,
and less self-documenting than count or occurrence.

If your mileage varies on that point, so be it!

Maybe just "match" instead of "replace_match".

Reading this it strikes me that any of these parameter names can and probably should be read as having "replace" in front of them:

replace N

replace count

replace occurrence

replace match

Saying replace becomes redundant:

replace replace at

replace replace match

David J.

Re: add function argument names to regex* functions.

From

Chapman Flack

Date:

15 May 2024, 20:28:35

On 05/15/24 15:31, Robert Haas wrote:
> On Wed, May 15, 2024 at 3:23 PM Chapman Flack <jcflack@acm.org> wrote:
>> What would be wrong with [occurrence], for consistency's sake?
>
> It was proposed and rejected upthread, but that's not to say that I
> necessarily endorse the reasons given.

Apologies for not having read far enough up the thread before replying.

Having done so now, I guess I'd just offer one small point: the upthread
discussion did mention that 'occurrence' was used by Oracle, and asked
"if you're not following Oracle slavishly, exactly what is the argument
in its favor?".

Nothing else upthread seems to have mentioned that OCCURRENCE is the
exact keyword used in ISO SQL for the analogous argument in analogous
functions. Maybe that won't have any effect on the outcome either, but
it does seem worth getting into the thread.

Regards,
-Chap

Re: add function argument names to regex* functions.

From

Robert Haas

Date:

15 May 2024, 20:48:12

On Wed, May 15, 2024 at 4:25 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:
> Maybe just "match" instead of "replace_match".

Well, this is just turning into a bikeshedding exercise at this point.
We can generate names for this parameter all day long, but a bunch of
names none of which gets more than one vote is not really helping
anything.

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: add function argument names to regex* functions.

From

jian he

Date:

15 July 2024, 12:02:00

On Thu, May 16, 2024 at 3:10 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Thu, Apr 4, 2024 at 9:55 AM jian he <jian.universality@gmail.com> wrote:
> >> changing "N" to lower-case would be misleading for regexp_replace?
> >> so I choose "count".
>
> > I don't see why that would be confusing for regexp_replace
> > specifically, but I think N => count is a reasonable change to make.
> > However, I don't think this quite works:
> > +     then the <replaceable>count</replaceable>'th match of the pattern
>
> I think the origin of the problem here is not wanting to use "N"
> as the actual name of the parameter, because then users would have
> to double-quote it to write "regexp_replace(..., "N" => 42, ...)".
>
> However ... is that really so awful?  It's still fewer keystrokes
> than "count".  It's certainly a potential gotcha for users who've
> not internalized when they need double quotes, but I think we
> could largely address that problem just by making sure to provide
> a documentation example that shows use of "N".

done it this way. patch attached.

last example from

regexp_replace('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 3, 'i')
                                   A PostgrXSQL function

change to

regexp_replace(string=>'A PostgreSQL function', pattern=>'a|e|i|o|u',
replacement=>'X',start=>1, "N"=>3, flags=>'i');
                                   A PostgrXSQL function

but I am not 100% sure
                                    <lineannotation>A PostgrXSQL
function</lineannotation>
is in the right position.


also address Chapman Flack point:
correct me if i am wrong, but i don't think the ISO standard mandates
function argument names.
So we can choose the best function argument name for our purpose?

Attachment

v5-0001-add-regex-functions-argument-names-to-pg_proc.patch

Re: add function argument names to regex* functions.

From

Chapman Flack

Date:

15 July 2024, 14:46:06

On 07/15/24 08:02, jian he wrote:
> also address Chapman Flack point:
> correct me if i am wrong, but i don't think the ISO standard mandates
> function argument names.
> So we can choose the best function argument name for our purpose?

Ah, I may have mistaken which functions the patch meant to apply to.

These being the non-ISO regexp_* functions using POSIX expressions,
the ISO standard indeed says nothing about them.

In the ISO standard *_regex "functions", there are not really "function
argument names" mandated, because, like so many things in ISO SQL, they
have their own special syntax instead of being generic function calls:

TRANSLATE_REGEX('a|e|i|o|u' FLAG 'i' IN 'A PostgreSQL function'
 WITH 'X' FROM 1 OCCURRENCE 3);

Any choice to use similar argument names in the regexp_* functions would
be a matter of consistency with the analogous ISO functions, not anything
mandated.

Regards,
-Chap

Re: add function argument names to regex* functions.

From

Chapman Flack

Date:

15 July 2024, 14:54:55

On 07/15/24 10:46, Chapman Flack wrote:
> Ah, I may have mistaken which functions the patch meant to apply to.
> ...
> Any choice to use similar argument names in the regexp_* functions would
> be a matter of consistency with the analogous ISO functions, not anything
> mandated.

Or, looking back, I might have realized these were the non-ISO regexp_*
functions, but seen there was bikeshedding happening over the best name
to use for the occurrence argument, and merely suggested ISO's choice
OCCURRENCE for the analogous ISO functions, as a possible bikeshed
accelerator.

Regards,
-Chap

Re: add function argument names to regex* functions.

From

Tom Lane

Date:

18 July 2024, 21:48:24

jian he <jian.universality@gmail.com> writes:
> [ v5-0001-add-regex-functions-argument-names-to-pg_proc.patch ]

I'm not sure whether we've bikeshedded this to death yet, but
personally I'm content with the naming choices here (which basically
are those already shown in table 9.10).  However, while looking
at the patch I noticed a couple of issues, one small, the other
a bit bigger.

The small issue is that table 9.10 offers this syntax diagram
for regexp_replace:

regexp_replace ( string text, pattern text, replacement text [, start integer ] [, flags text ] ) → text

This implies that it's valid to write

    regexp_replace (string, pattern, replacement, start, flags)

but it is not: we have no function matching that signature.  I'm not
in a hurry to add one, either, for fear of ambiguity against the other
regexp_replace signature.  I think this needs to be broken into two
syntax diagrams:

regexp_replace ( string text, pattern text, replacement text [, start integer ] ) → text
regexp_replace ( string text, pattern text, replacement text [, flags text ] ) → text

The larger issue is that contrib/citext offers versions of some of
these functions that are meant to be drop-in replacements using
citext input.  Hence, we need to add the same parameter names to
those functions, or they'll fail to replace some calls.

            regards, tom lane

Re: add function argument names to regex* functions.

From

jian he

Date:

19 July 2024, 05:51:39

On Fri, Jul 19, 2024 at 5:48 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> jian he <jian.universality@gmail.com> writes:
> > [ v5-0001-add-regex-functions-argument-names-to-pg_proc.patch ]
>
> I'm not sure whether we've bikeshedded this to death yet, but
> personally I'm content with the naming choices here (which basically
> are those already shown in table 9.10).  However, while looking
> at the patch I noticed a couple of issues, one small, the other
> a bit bigger.
>
> The small issue is that table 9.10 offers this syntax diagram
> for regexp_replace:
>
> regexp_replace ( string text, pattern text, replacement text [, start integer ] [, flags text ] ) → text
>
> This implies that it's valid to write
>
>         regexp_replace (string, pattern, replacement, start, flags)
>
> but it is not: we have no function matching that signature.  I'm not
> in a hurry to add one, either, for fear of ambiguity against the other
> regexp_replace signature.  I think this needs to be broken into two
> syntax diagrams:
>
> regexp_replace ( string text, pattern text, replacement text [, start integer ] ) → text
> regexp_replace ( string text, pattern text, replacement text [, flags text ] ) → text

We can list  them separately.
regexp_replace(string, pattern, replacement [, start])
regexp_replace(string, pattern, replacement [, flags])
regexp_replace(string, pattern, replacement , start , N [, flags ]).

if both optional is not there then they are the same, list 2 potential
identical functions separately seems wrong?
so i choose 2 bracket with a vertical bar:

regexp_replace(string, pattern, replacement [[, start] | [, flags]]).

maybe less readable.

> The larger issue is that contrib/citext offers versions of some of
> these functions that are meant to be drop-in replacements using
> citext input.  Hence, we need to add the same parameter names to
> those functions, or they'll fail to replace some calls.
>

I first wanted to use alterfunction solve this, then found out it cannot,
later I found out CREATE OR REPLACE FUNCTION saved us.

citext module, these functions:
regexp_match()
regexp_matches()
regexp_replace()
regexp_split_to_array()
regexp_split_to_table()
were created in contrib/citext/citext--1.4.sql, we can add the CREATE
OR REPLACE FUNCTION to 1.4.sql.
but to avoid unintended consequences I just add these to the newly
created file citext--1.6--1.7.sql,
to make a version bump.

Attachment

v6-0001-add-regex-functions-argument-names.patch

Re: add function argument names to regex* functions.

From

Tom Lane

Date:

24 July 2024, 18:51:48

jian he <jian.universality@gmail.com> writes:
> On Fri, Jul 19, 2024 at 5:48 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The larger issue is that contrib/citext offers versions of some of
>> these functions that are meant to be drop-in replacements using
>> citext input.  Hence, we need to add the same parameter names to
>> those functions, or they'll fail to replace some calls.

> citext module, these functions:
> regexp_match()
> regexp_matches()
> regexp_replace()
> regexp_split_to_array()
> regexp_split_to_table()
> were created in contrib/citext/citext--1.4.sql, we can add the CREATE
> OR REPLACE FUNCTION to 1.4.sql.
> but to avoid unintended consequences I just add these to the newly
> created file citext--1.6--1.7.sql,
> to make a version bump.

Yes.  You *have to* do it like that, the shortcut is not an option,
because without an extension update script there is no way to
upgrade an existing installation to the new definition.  Basically,
once we ship a given release of an extension, that script is frozen
in amber.

I haven't heard any further bikeshedding on the argument names,
so I'll move forward with committing this soon.

            regards, tom lane

Re: add function argument names to regex* functions.

From

Peter Eisentraut

Date:

25 July 2024, 13:42:29

On 15.07.24 16:52, Chapman Flack wrote:
> On 07/15/24 10:46, Chapman Flack wrote:
>> Ah, I may have mistaken which functions the patch meant to apply to.
>> ...
>> Any choice to use similar argument names in the regexp_* functions would
>> be a matter of consistency with the analogous ISO functions, not anything
>> mandated.
> 
> Or, looking back, I might have realized these were the non-ISO regexp_*
> functions, but seen there was bikeshedding happening over the best name
> to use for the occurrence argument, and merely suggested ISO's choice
> OCCURRENCE for the analogous ISO functions, as a possible bikeshed
> accelerator.

These functions were copied from Oracle, so one argument was to use the 
names from Oracle as-is.

Re: add function argument names to regex* functions.

From

Tom Lane

Date:

25 July 2024, 18:53:47

I wrote:
> I haven't heard any further bikeshedding on the argument names,
> so I'll move forward with committing this soon.

Pushed, after a little further fooling with the documentation.

            regards, tom lane

Re: add function argument names to regex* functions.

From

jian he

Date:

26 July 2024, 13:45:58

On Fri, Jul 19, 2024 at 5:48 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> The small issue is that table 9.10 offers this syntax diagram
> for regexp_replace:
>
> regexp_replace ( string text, pattern text, replacement text [, start integer ] [, flags text ] ) → text
>
> This implies that it's valid to write
>
>         regexp_replace (string, pattern, replacement, start, flags)
>
> but it is not: we have no function matching that signature.  I'm not
> in a hurry to add one, either, for fear of ambiguity against the other
> regexp_replace signature.  I think this needs to be broken into two
> syntax diagrams:
>
> regexp_replace ( string text, pattern text, replacement text [, start integer ] ) → text
> regexp_replace ( string text, pattern text, replacement text [, flags text ] ) → text
>

this problem is still there, after commit
580f8727ca93b7b9a2ce49746b9cdbcb0a2b4a7e.

<<
It has the syntax regexp_replace(string, pattern, replacement [, start
[, N ]] [, flags ]). (Notice that N cannot be specified unless start
is, but flags can be given in any case.)
<<
doc, the above part still needs change?

see my posts:
https://postgr.es/m/CACJufxE5p4KhGyBUwCZCxhxdU%2BzJBXy2deX4u85SL%2Bkew4F7Cw%40mail.gmail.com

Re: add function argument names to regex* functions.

From

Tom Lane

Date:

26 July 2024, 14:17:31

jian he <jian.universality@gmail.com> writes:
> On Fri, Jul 19, 2024 at 5:48 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> but it is not: we have no function matching that signature.  I'm not
>> in a hurry to add one, either, for fear of ambiguity against the other
>> regexp_replace signature.  I think this needs to be broken into two
>> syntax diagrams:

> this problem is still there, after commit
> 580f8727ca93b7b9a2ce49746b9cdbcb0a2b4a7e.

No, I believe I fixed it: the table now offers

regexp_replace ( string text, pattern text, replacement text [, flags text ] ) → text

regexp_replace ( string text, pattern text, replacement text, start integer [, N integer [, flags text ] ] ) → text

That's different from either of the solutions discussed in this
thread, but simpler.

> <<
> It has the syntax regexp_replace(string, pattern, replacement [, start
> [, N ]] [, flags ]). (Notice that N cannot be specified unless start
> is, but flags can be given in any case.)
> <<
> doc, the above part still needs change?

AFAICS, that one is correct, so I left it alone.  (I didn't try to
merge the table's two entries into one like that, though.)

            regards, tom lane

Re: add function argument names to regex* functions.

From

jian he

Date:

26 July 2024, 14:30:55

On Fri, Jul 26, 2024 at 10:17 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> > <<
> > It has the syntax regexp_replace(string, pattern, replacement [, start
> > [, N ]] [, flags ]). (Notice that N cannot be specified unless start
> > is, but flags can be given in any case.)
> > <<
> > doc, the above part still needs change?
>
> AFAICS, that one is correct, so I left it alone.  (I didn't try to
> merge the table's two entries into one like that, though.)
>

functions-string.html output is correct.

but in functions-matching.html

regexp_replace(string, pattern, replacement [, start [, N ]] [, flags ]).

can represent

regexp_replace(string, pattern, replacement , start,  flags ) ?

but we don't have "regexp_replace(string, pattern, replacement ,
start,  flags )"

Re: add function argument names to regex* functions.

From

Tom Lane

Date:

26 July 2024, 14:39:58

jian he <jian.universality@gmail.com> writes:
> On Fri, Jul 26, 2024 at 10:17 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> AFAICS, that one is correct, so I left it alone.  (I didn't try to
>> merge the table's two entries into one like that, though.)

> regexp_replace(string, pattern, replacement [, start [, N ]] [, flags ]).

> can represent

> regexp_replace(string, pattern, replacement , start,  flags ) ?

Hmm, yeah, you're right.  I didn't want to write two separate
synopses there, but maybe there's no choice.

            regards, tom lane

Re: add function argument names to regex* functions.

From

jian he

Date:

27 July 2024, 00:56:00

On Fri, Jul 26, 2024 at 10:40 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> jian he <jian.universality@gmail.com> writes:
> > On Fri, Jul 26, 2024 at 10:17 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> AFAICS, that one is correct, so I left it alone.  (I didn't try to
> >> merge the table's two entries into one like that, though.)
>
> > regexp_replace(string, pattern, replacement [, start [, N ]] [, flags ]).
>
> > can represent
>
> > regexp_replace(string, pattern, replacement , start,  flags ) ?
>
> Hmm, yeah, you're right.  I didn't want to write two separate
> synopses there, but maybe there's no choice.
>

we can get rid of:
     (Notice that <replaceable>N</replaceable> cannot be specified
     unless <replaceable>start</replaceable> is,
     but <replaceable>flags</replaceable> can be given in any case.)

Now the output is
It has the syntax regexp_replace(string, pattern, replacement [, flags
]) and regexp_replace(string, pattern, replacement, start [, N [,
flags ]]).


I also decorated "[]" with "<optional>".

Attachment

v1-0001-refactor-regexp_replace-syntax.patch

Re: add function argument names to regex* functions.

From

Tom Lane

Date:

27 July 2024, 19:40:09

jian he <jian.universality@gmail.com> writes:
> On Fri, Jul 26, 2024 at 10:40 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Hmm, yeah, you're right.  I didn't want to write two separate
>> synopses there, but maybe there's no choice.

> Now the output is
> It has the syntax regexp_replace(string, pattern, replacement [, flags
> ]) and regexp_replace(string, pattern, replacement, start [, N [,
> flags ]]).

Pushed, thanks.

            regards, tom lane