Thread: add function argument names to regex* functions.
Hi.
similar to [1], add function argument names to the following functions:
regexp_like, regexp_match,regexp_matches,regexp_replace,
regexp_substr,regexp_split_to_array,regexp_split_to_table,regexp_count
so I call these function in a different notation[2], like:
SELECT regexp_like(string=>'a'||CHR(10)||'d', pattern=>'a.d', flags:='n');
select regexp_match(string=>'abc',n pattern=>'(B)(c)', flags=>'i');
select regexp_matches(string=>'Programmer', pattern=>'(\w)(.*?\1)',
flags=>'ig');
SELECT regexp_replace(source=>'A PostgreSQL function',
pattern=>'a|e|i|o|u', replacement=>'X', start=>1, n=>4, flags=>'i');
SELECT regexp_substr(string=>'1234567890',
pattern=>'(123)(4(56)(78))', start=>1, n=>1, flags=>'i', subexpr=>4);
SELECT regexp_split_to_array(string=>'thE QUick bROWn FOx jUMPs ovEr
The lazy dOG', pattern=>'e', flags=>'i');
SELECT foo, length(foo)
FROM regexp_split_to_table(string=>'thE QUick bROWn FOx jUMPs ovEr The
lazy dOG', pattern=>'e',flags=>'i') AS foo;
SELECT regexp_count(string=>'ABCABCABCABC', pattern=>'Abc', start=>1,
flags=>'i');
In [3], except the above mentioned function, there is a "substring" function.
I want to refactor substring function argument names. it looks like:
Schema | Name | Result data type | Argument data
types | Type
------------+-----------+------------------+--------------------------------------------+------
pg_catalog | substring | bit | bits bit, "from" integer
| func
pg_catalog | substring | bit | bits bit, "from" integer,
"for" integer | func
pg_catalog | substring | bytea | bytes bytea, "from"
integer | func
pg_catalog | substring | bytea | bytes bytea, "from"
integer, "for" integer | func
pg_catalog | substring | text | string text, "from"
integer | func
pg_catalog | substring | text | string text, "from"
integer, "for" integer | func
pg_catalog | substring | text | string text, pattern text
| func
pg_catalog | substring | text | text, text, text
| func
(8 rows)
As you can see, the substring function argument names need an explicit
double quote,
which doesn't look good, so I gave up.
[1]https://www.postgresql.org/message-id/flat/877cw3jl8y.fsf@wibble.ilmari.org
[2]https://www.postgresql.org/docs/current/sql-syntax-calling-funcs.html#SQL-SYNTAX-CALLING-FUNCS
[3] https://www.postgresql.org/docs/current/functions-matching.html
similar to [1], add function argument names to the following functions:
regexp_like, regexp_match,regexp_matches,regexp_replace,
regexp_substr,regexp_split_to_array,regexp_split_to_table,regexp_count
so I call these function in a different notation[2], like:
SELECT regexp_like(string=>'a'||CHR(10)||'d', pattern=>'a.d', flags:='n');
select regexp_match(string=>'abc',n pattern=>'(B)(c)', flags=>'i');
select regexp_matches(string=>'Programmer', pattern=>'(\w)(.*?\1)',
flags=>'ig');
SELECT regexp_replace(source=>'A PostgreSQL function',
pattern=>'a|e|i|o|u', replacement=>'X', start=>1, n=>4, flags=>'i');
SELECT regexp_substr(string=>'1234567890',
pattern=>'(123)(4(56)(78))', start=>1, n=>1, flags=>'i', subexpr=>4);
SELECT regexp_split_to_array(string=>'thE QUick bROWn FOx jUMPs ovEr
The lazy dOG', pattern=>'e', flags=>'i');
SELECT foo, length(foo)
FROM regexp_split_to_table(string=>'thE QUick bROWn FOx jUMPs ovEr The
lazy dOG', pattern=>'e',flags=>'i') AS foo;
SELECT regexp_count(string=>'ABCABCABCABC', pattern=>'Abc', start=>1,
flags=>'i');
In [3], except the above mentioned function, there is a "substring" function.
I want to refactor substring function argument names. it looks like:
Schema | Name | Result data type | Argument data
types | Type
------------+-----------+------------------+--------------------------------------------+------
pg_catalog | substring | bit | bits bit, "from" integer
| func
pg_catalog | substring | bit | bits bit, "from" integer,
"for" integer | func
pg_catalog | substring | bytea | bytes bytea, "from"
integer | func
pg_catalog | substring | bytea | bytes bytea, "from"
integer, "for" integer | func
pg_catalog | substring | text | string text, "from"
integer | func
pg_catalog | substring | text | string text, "from"
integer, "for" integer | func
pg_catalog | substring | text | string text, pattern text
| func
pg_catalog | substring | text | text, text, text
| func
(8 rows)
As you can see, the substring function argument names need an explicit
double quote,
which doesn't look good, so I gave up.
[1]https://www.postgresql.org/message-id/flat/877cw3jl8y.fsf@wibble.ilmari.org
[2]https://www.postgresql.org/docs/current/sql-syntax-calling-funcs.html#SQL-SYNTAX-CALLING-FUNCS
[3] https://www.postgresql.org/docs/current/functions-matching.html
Attachment
On 27.12.23 17:53, jian he wrote: > similar to [1], add function argument names to the following functions: > regexp_like, regexp_match,regexp_matches,regexp_replace, > regexp_substr,regexp_split_to_array,regexp_split_to_table,regexp_count Note that these functions are a quasi-standard that is shared with other SQL implementations. It might be worth looking around if there are already other implementations of this idea.
On Thu, Dec 28, 2023 at 6:25 AM Peter Eisentraut <peter@eisentraut.org> wrote: > > On 27.12.23 17:53, jian he wrote: > > similar to [1], add function argument names to the following functions: > > regexp_like, regexp_match,regexp_matches,regexp_replace, > > regexp_substr,regexp_split_to_array,regexp_split_to_table,regexp_count > > Note that these functions are a quasi-standard that is shared with other > SQL implementations. It might be worth looking around if there are > already other implementations of this idea. > turns out people do like calling functions via explicitly mentioning function argument names, example: [0] There are no provisions for the argument names. I looked around the oracle implementation in [1], and the oracle regex related function argumentation name in [2] I use the doc [3] syntax explanation and add the related function names. Current, regex.* function syntax seems fine. but only parameter `N` seems a little bit weird. If we change the function's argument name, we also need to change function syntax explanation in the doc; vise versa. QUOTE: The regexp_instr function returns the starting or ending position of the N'th match of a POSIX regular expression pattern to a string, or zero if there is no such match. It has the syntax regexp_instr(string, pattern [, start [, N [, endoption [, flags [, subexpr ]]]]]). pattern is searched for in string, normally from the beginning of the string, but if the start parameter is provided then beginning from that character index. If N is specified then the N'th match of the pattern is located, otherwise the first match is located. END OF QUOTE. maybe we can change `N` to occurrence. but `occurrence` is kind of verbose. [0] https://stackoverflow.com/questions/33387348/oracle-named-parameters-in-regular-functions [1] https://docs.oracle.com/en/database/oracle/oracle-database/23/sqlrf/REGEXP_SUBSTR.html#GUID-2903904D-455F-4839-A8B2-1731EF4BD099 [2] https://dbfiddle.uk/h_SBDEKi [3] https://www.postgresql.org/docs/current/functions-matching.html#FUNCTIONS-POSIX-REGEXP
On Wed Dec 27, 2023 at 10:28 PM EST, jian he wrote: > On Thu, Dec 28, 2023 at 6:25 AM Peter Eisentraut <peter@eisentraut.org> wrote: > > > > On 27.12.23 17:53, jian he wrote: > > > similar to [1], add function argument names to the following functions: > > > regexp_like, regexp_match,regexp_matches,regexp_replace, > > > regexp_substr,regexp_split_to_array,regexp_split_to_table,regexp_count > > > > Note that these functions are a quasi-standard that is shared with other > > SQL implementations. It might be worth looking around if there are > > already other implementations of this idea. > > > > turns out people do like calling functions via explicitly mentioning > function argument names, example: [0] > There are no provisions for the argument names. > > I looked around the oracle implementation in [1], and the oracle regex > related function argumentation name in [2] > I use the doc [3] syntax explanation and add the related function names. > > Current, regex.* function syntax seems fine. but only parameter `N` > seems a little bit weird. > If we change the function's argument name, we also need to change > function syntax explanation in the doc; vise versa. > > QUOTE: > The regexp_instr function returns the starting or ending position of > the N'th match of a POSIX regular expression pattern to a string, or > zero if there is no such match. It has the syntax regexp_instr(string, > pattern [, start [, N [, endoption [, flags [, subexpr ]]]]]). pattern > is searched for in string, normally from the beginning of the string, > but if the start parameter is provided then beginning from that > character index. If N is specified then the N'th match of the pattern > is located, otherwise the first match is located. > END OF QUOTE. > > maybe we can change `N` to occurrence. but `occurrence` is kind of verbose. > > [0] https://stackoverflow.com/questions/33387348/oracle-named-parameters-in-regular-functions > [1] https://docs.oracle.com/en/database/oracle/oracle-database/23/sqlrf/REGEXP_SUBSTR.html#GUID-2903904D-455F-4839-A8B2-1731EF4BD099 > [2] https://dbfiddle.uk/h_SBDEKi > [3] https://www.postgresql.org/docs/current/functions-matching.html#FUNCTIONS-POSIX-REGEXP I've been trying to use named arguments more diligently so expanding support for built-in functions is welcome. The patch applies cleanly and works as advertised. I agree that the parameter name `n` is not ideal. For example, in `regexp_replace` it's easy to misinterpret it as "make up to n replacements". This has not been a problem when `n` only lives in the documentation which explains exactly what it does, but that context is not readily available in code expressing `n => 3`. Another possibility is `index`, which is relatively short and not a reserved keyword ^1. `position` is not as precise but would avoid the conceptual overloading of ordinary indices. 1. https://www.postgresql.org/docs/current/sql-keywords-appendix.html
On 28.12.23 04:28, jian he wrote: > I looked around the oracle implementation in [1], and the oracle regex > related function argumentation name in [2] > I use the doc [3] syntax explanation and add the related function names. > > Current, regex.* function syntax seems fine. but only parameter `N` > seems a little bit weird. > If we change the function's argument name, we also need to change > function syntax explanation in the doc; vise versa. So, it looks like Oracle already has defined parameter names for these, so we should make ours match.
> > Another possibility is `index`, which is relatively short and not a > > reserved keyword ^1. `position` is not as precise but would avoid the > > conceptual overloading of ordinary indices. > > I'm not a fan of "index" since that leaves the question of > whether it's 0 or 1 based. "Position" is a bit better, but I think > Jian's suggestion of "occurance" is best. We do have precedent for one-based `index` in Postgres: array types are 1-indexed by default! "Occurrence" removes that ambiguity but it's long and easy to misspell (I looked it up after typing it just now and it _still_ feels off). How's "instance"?
On Thu, Jan 4, 2024 at 7:26 AM Jim Nasby <jim.nasby@gmail.com> wrote: > > On 1/3/24 5:05 PM, Dian Fay wrote: > > Another possibility is `index`, which is relatively short and not a > reserved keyword ^1. `position` is not as precise but would avoid the > conceptual overloading of ordinary indices. > > I'm not a fan of "index" since that leaves the question of > whether it's 0 or 1 based. "Position" is a bit better, but I think > Jian's suggestion of "occurance" is best. > > We do have precedent for one-based `index` in Postgres: array types are > 1-indexed by default! "Occurrence" removes that ambiguity but it's long > and easy to misspell (I looked it up after typing it just now and it > _still_ feels off). > > How's "instance"? > > Presumably someone referencing arguments by name would have just looked up the names via \df or whatever, so presumablymisspelling wouldn't be a big issue. But I think "instance" is OK as well. > > -- > Jim Nasby, Data Architect, Austin TX regexp_instr: It has the syntax regexp_instr(string, pattern [, start [, N [, endoption [, flags [, subexpr ]]]]]) oracle: REGEXP_INSTR (source_char, pattern, [, position [, occurrence [, return_opt [, match_param [, subexpr ]]]]] ) "string" and "source_char" are almost the same descriptive, so maybe there is no need to change. "start" is better than "position", imho. "return_opt" is better than "endoption", (maybe we need change, for now I didn't) "flags" cannot be changed to "match_param", given it quite everywhere in functions-matching.html. similarly for function regexp_replace, oracle using "repplace_string", we use "replacement"(mentioned in the doc). so I don't think we need to change to "repplace_string". Based on how people google[0], I think `occurrence` is ok, even though it's verbose. to change from `N` to `occurrence`, we also need to change the doc, that is why this patch is more larger. [0]: https://www.google.com/search?q=regex+nth+match&oq=regex+nth+match&gs_lcrp=EgZjaHJvbWUyBggAEEUYOTIGCAEQRRg8MgYIAhBFGDzSAQc2MThqMGo5qAIAsAIA&sourceid=chrome&ie=UTF-8
Attachment
On Thu Jan 4, 2024 at 2:03 AM EST, jian he wrote: > On Thu, Jan 4, 2024 at 7:26 AM Jim Nasby <jim.nasby@gmail.com> wrote: > > > > On 1/3/24 5:05 PM, Dian Fay wrote: > > > > Another possibility is `index`, which is relatively short and not a > > reserved keyword ^1. `position` is not as precise but would avoid the > > conceptual overloading of ordinary indices. > > > > I'm not a fan of "index" since that leaves the question of > > whether it's 0 or 1 based. "Position" is a bit better, but I think > > Jian's suggestion of "occurance" is best. > > > > We do have precedent for one-based `index` in Postgres: array types are > > 1-indexed by default! "Occurrence" removes that ambiguity but it's long > > and easy to misspell (I looked it up after typing it just now and it > > _still_ feels off). > > > > How's "instance"? > > > > Presumably someone referencing arguments by name would have just looked up the names via \df or whatever, so presumablymisspelling wouldn't be a big issue. But I think "instance" is OK as well. > > > > -- > > Jim Nasby, Data Architect, Austin TX > > regexp_instr: It has the syntax regexp_instr(string, pattern [, start > [, N [, endoption [, flags [, subexpr ]]]]]) > oracle: > REGEXP_INSTR (source_char, pattern, [, position [, occurrence [, > return_opt [, match_param [, subexpr ]]]]] ) > > "string" and "source_char" are almost the same descriptive, so maybe > there is no need to change. > "start" is better than "position", imho. > "return_opt" is better than "endoption", (maybe we need change, for > now I didn't) > "flags" cannot be changed to "match_param", given it quite everywhere > in functions-matching.html. > > similarly for function regexp_replace, oracle using "repplace_string", > we use "replacement"(mentioned in the doc). > so I don't think we need to change to "repplace_string". > > Based on how people google[0], I think `occurrence` is ok, even though > it's verbose. > to change from `N` to `occurrence`, we also need to change the doc, > that is why this patch is more larger. > > > [0]: https://www.google.com/search?q=regex+nth+match&oq=regex+nth+match&gs_lcrp=EgZjaHJvbWUyBggAEEUYOTIGCAEQRRg8MgYIAhBFGDzSAQc2MThqMGo5qAIAsAIA&sourceid=chrome&ie=UTF-8 The `regexp_replace` summary in table 9.10 is mismatched and still specifies the first parameter name as `string` instead of `source`. Since all the other functions use `string`, should `regexp_replace` do the same or is this a case where an established "standard" diverges? I noticed the original documentation for some of these functions is rather disorganized; summaries explain `occurrence` without explaining the prior `start` parameter, and detailed documentation in 9.7 is usually a single paragraph per function running pell-mell through ifs and buts without section headings, so entries in table 9.10 have to reference the entire section 9.7.3 instead of their specific functions. It's out of scope here, but should I bring this up on pgsql-docs?
On Mon, Jan 8, 2024 at 8:44 AM Dian Fay <di@nmfay.com> wrote: > > On Thu Jan 4, 2024 at 2:03 AM EST, jian he wrote: > > On Thu, Jan 4, 2024 at 7:26 AM Jim Nasby <jim.nasby@gmail.com> wrote: > > > > > > On 1/3/24 5:05 PM, Dian Fay wrote: > > > > > > Another possibility is `index`, which is relatively short and not a > > > reserved keyword ^1. `position` is not as precise but would avoid the > > > conceptual overloading of ordinary indices. > > > > > > I'm not a fan of "index" since that leaves the question of > > > whether it's 0 or 1 based. "Position" is a bit better, but I think > > > Jian's suggestion of "occurance" is best. > > > > > > We do have precedent for one-based `index` in Postgres: array types are > > > 1-indexed by default! "Occurrence" removes that ambiguity but it's long > > > and easy to misspell (I looked it up after typing it just now and it > > > _still_ feels off). > > > > > > How's "instance"? > > > > > > Presumably someone referencing arguments by name would have just looked up the names via \df or whatever, so presumablymisspelling wouldn't be a big issue. But I think "instance" is OK as well. > > > > > > -- > > > Jim Nasby, Data Architect, Austin TX > > > > regexp_instr: It has the syntax regexp_instr(string, pattern [, start > > [, N [, endoption [, flags [, subexpr ]]]]]) > > oracle: > > REGEXP_INSTR (source_char, pattern, [, position [, occurrence [, > > return_opt [, match_param [, subexpr ]]]]] ) > > > > "string" and "source_char" are almost the same descriptive, so maybe > > there is no need to change. > > "start" is better than "position", imho. > > "return_opt" is better than "endoption", (maybe we need change, for > > now I didn't) > > "flags" cannot be changed to "match_param", given it quite everywhere > > in functions-matching.html. > > > > similarly for function regexp_replace, oracle using "repplace_string", > > we use "replacement"(mentioned in the doc). > > so I don't think we need to change to "repplace_string". > > > > Based on how people google[0], I think `occurrence` is ok, even though > > it's verbose. > > to change from `N` to `occurrence`, we also need to change the doc, > > that is why this patch is more larger. > > > > > > [0]: https://www.google.com/search?q=regex+nth+match&oq=regex+nth+match&gs_lcrp=EgZjaHJvbWUyBggAEEUYOTIGCAEQRRg8MgYIAhBFGDzSAQc2MThqMGo5qAIAsAIA&sourceid=chrome&ie=UTF-8 > > The `regexp_replace` summary in table 9.10 is mismatched and still > specifies the first parameter name as `string` instead of `source`. > Since all the other functions use `string`, should `regexp_replace` do > the same or is this a case where an established "standard" diverges? > got it. Thanks for pointing it out. in functions-matching.html if I change <replaceable>source</replaceable> to <replaceable>string</replaceable> then there are no markup "string" and markup "string", it's kind of slightly confusing. So does the following refactored description of regexp_replace make sense: The <replaceable>string</replaceable> is returned unchanged if there is no match to the <replaceable>pattern</replaceable>. If there is a match, the <replaceable>string</replaceable> is returned with the <replaceable>replacement</replaceable> string substituted for the matching substring. The <replaceable>replacement</replaceable> string can contain <literal>\</literal><replaceable>n</replaceable>, where <replaceable>n</replaceable> is 1 through 9, to indicate that the source substring matching the <replaceable>n</replaceable>'th parenthesized subexpression of the pattern should be inserted, and it can contain <literal>\&</literal> to indicate that the substring matching the entire pattern should be inserted. Write <literal>\\</literal> if you need to put a literal backslash in the replacement text. > I noticed the original documentation for some of these functions is > rather disorganized; summaries explain `occurrence` without explaining > the prior `start` parameter, and detailed documentation in 9.7 is > usually a single paragraph per function running pell-mell through ifs > and buts without section headings, so entries in table 9.10 have to > reference the entire section 9.7.3 instead of their specific functions. > It's out of scope here, but should I bring this up on pgsql-docs? I got it. in Table 9.10. Other String Functions and Operators, if we can reference the specific function would be great. As for now, in the browser, you need to use Ctrl+F to find the detailed explanation in 9.7.3. you can just bring your suggested or patch to pgsql-hackers@postgresql.org.
On Mon Jan 8, 2024 at 9:26 AM EST, jian he wrote: > On Mon, Jan 8, 2024 at 8:44 AM Dian Fay <di@nmfay.com> wrote: > > The `regexp_replace` summary in table 9.10 is mismatched and still > > specifies the first parameter name as `string` instead of `source`. > > Since all the other functions use `string`, should `regexp_replace` do > > the same or is this a case where an established "standard" diverges? > > got it. Thanks for pointing it out. > > in functions-matching.html > if I change <replaceable>source</replaceable> to > <replaceable>string</replaceable> then > there are no markup "string" and markup "string", it's kind of > slightly confusing. > > So does the following refactored description of regexp_replace make sense: > > The <replaceable>string</replaceable> is returned unchanged if > there is no match to the <replaceable>pattern</replaceable>. If there is a > match, the <replaceable>string</replaceable> is returned with the > <replaceable>replacement</replaceable> string substituted for the matching > substring. The <replaceable>replacement</replaceable> string can contain > <literal>\</literal><replaceable>n</replaceable>, where > <replaceable>n</replaceable> is 1 > through 9, to indicate that the source substring matching the > <replaceable>n</replaceable>'th parenthesized subexpression of > the pattern should be > inserted, and it can contain <literal>\&</literal> to indicate that the > substring matching the entire pattern should be inserted. Write > <literal>\\</literal> if you need to put a literal backslash in > the replacement > text. That change makes sense to me! I'll see about the section refactoring after this lands.
On Tue, Jan 9, 2024 at 8:52 AM Dian Fay <di@nmfay.com> wrote: > > On Mon Jan 8, 2024 at 9:26 AM EST, jian he wrote: > > On Mon, Jan 8, 2024 at 8:44 AM Dian Fay <di@nmfay.com> wrote: > > > The `regexp_replace` summary in table 9.10 is mismatched and still > > > specifies the first parameter name as `string` instead of `source`. > > > Since all the other functions use `string`, should `regexp_replace` do > > > the same or is this a case where an established "standard" diverges? > > > > got it. Thanks for pointing it out. > > > > in functions-matching.html > > if I change <replaceable>source</replaceable> to > > <replaceable>string</replaceable> then > > there are no markup "string" and markup "string", it's kind of > > slightly confusing. > > > > So does the following refactored description of regexp_replace make sense: > > > > The <replaceable>string</replaceable> is returned unchanged if > > there is no match to the <replaceable>pattern</replaceable>. If there is a > > match, the <replaceable>string</replaceable> is returned with the > > <replaceable>replacement</replaceable> string substituted for the matching > > substring. The <replaceable>replacement</replaceable> string can contain > > <literal>\</literal><replaceable>n</replaceable>, where > > <replaceable>n</replaceable> is 1 > > through 9, to indicate that the source substring matching the > > <replaceable>n</replaceable>'th parenthesized subexpression of > > the pattern should be > > inserted, and it can contain <literal>\&</literal> to indicate that the > > substring matching the entire pattern should be inserted. Write > > <literal>\\</literal> if you need to put a literal backslash in > > the replacement > > text. > > That change makes sense to me! I'll see about the section refactoring > after this lands. I put the changes into the new patch.
Attachment
On Wed Jan 10, 2024 at 9:18 AM EST, jian he wrote: > On Tue, Jan 9, 2024 at 8:52 AM Dian Fay <di@nmfay.com> wrote: > > > > On Mon Jan 8, 2024 at 9:26 AM EST, jian he wrote: > > > On Mon, Jan 8, 2024 at 8:44 AM Dian Fay <di@nmfay.com> wrote: > > > > The `regexp_replace` summary in table 9.10 is mismatched and still > > > > specifies the first parameter name as `string` instead of `source`. > > > > Since all the other functions use `string`, should `regexp_replace` do > > > > the same or is this a case where an established "standard" diverges? > > > > > > got it. Thanks for pointing it out. > > > > > > in functions-matching.html > > > if I change <replaceable>source</replaceable> to > > > <replaceable>string</replaceable> then > > > there are no markup "string" and markup "string", it's kind of > > > slightly confusing. > > > > > > So does the following refactored description of regexp_replace make sense: > > > > > > The <replaceable>string</replaceable> is returned unchanged if > > > there is no match to the <replaceable>pattern</replaceable>. If there is a > > > match, the <replaceable>string</replaceable> is returned with the > > > <replaceable>replacement</replaceable> string substituted for the matching > > > substring. The <replaceable>replacement</replaceable> string can contain > > > <literal>\</literal><replaceable>n</replaceable>, where > > > <replaceable>n</replaceable> is 1 > > > through 9, to indicate that the source substring matching the > > > <replaceable>n</replaceable>'th parenthesized subexpression of > > > the pattern should be > > > inserted, and it can contain <literal>\&</literal> to indicate that the > > > substring matching the entire pattern should be inserted. Write > > > <literal>\\</literal> if you need to put a literal backslash in > > > the replacement > > > text. > > > > That change makes sense to me! I'll see about the section refactoring > > after this lands. > > I put the changes into the new patch. Sorry, I missed one minor issue with v2. The replacement on lines 6027-6028 of func.sgml (originally "`n` rows if there are `n` matches") is not needed and could be more confusing since the `n` represents a number, not an argument to `regexp_matches`. I've built v3 and gone over everything else one more time and it looks good.
On 10.01.24 15:18, jian he wrote: > I put the changes into the new patch. Reading back through the discussion, I wasn't quite able to interpret the resolution regarding Oracle compatibility. From the patch, it looks like you chose not to adopt the parameter names from Oracle. Was that your intention?
On Thu, Jan 18, 2024 at 4:17 PM Peter Eisentraut <peter@eisentraut.org> wrote: > > On 10.01.24 15:18, jian he wrote: > > I put the changes into the new patch. > > Reading back through the discussion, I wasn't quite able to interpret > the resolution regarding Oracle compatibility. From the patch, it looks > like you chose not to adopt the parameter names from Oracle. Was that > your intention? > per committee message: https://git.postgresql.org/cgit/postgresql.git/commit/?id=6424337073589476303b10f6d7cc74f501b8d9d7 Even if the names are all the same, our function is still not the same as oracle. There is a documentation bug. In [0], Table 9.25. Regular Expression Functions Equivalencies regexp_replace function definition: regexp_replace(string, pattern, replacement) In one of the <tip> section below, regexp_replace explains as <<<<< The regexp_replace function provides substitution of new text for substrings that match POSIX regular expression patterns. It has the syntax regexp_replace(source, pattern, replacement [, start [, N ]] [, flags ]). (Notice that N cannot be specified unless start is, but flags can be given in any case.) <<<<< So I changed the first argument of regexp_replace to "string". So accordingly, the doc needs to change also, which I did. another regex* function argument changes: from "N" to "occurences", example: + If <replaceable>occurrence</replaceable> is specified + then the <replaceable>occurrence</replaceable>'th match of the pattern + is located, but [2] says Speaking of the "occurrence'th occurrence" is just silly, not to mention long and easy to misspell." summary: adding function-named notation is my intention. To make regex.* functions named-notation works, we need to add proargnames to src/include/catalog/pg_proc.dat. add proargnames also require changing the doc. naming proargnames is a matter of taste now, So I only change 'N' to 'occurrence'. [0] https://www.postgresql.org/docs/current/functions-matching.html [1] https://www.postgresql.org/message-id/flat/fc160ee0-c843-b024-29bb-97b5da61971f%40darold.net
On Sat, Jan 20, 2024 at 10:55 AM jian he <jian.universality@gmail.com> wrote: > > > another regex* function argument changes: from "N" to "occurences", example: > + If <replaceable>occurrence</replaceable> is specified > + then the <replaceable>occurrence</replaceable>'th match of the pattern > + is located, > > but [2] says > Speaking of the "occurrence'th > occurrence" is just silly, not to mention long and easy to misspell." > sorry. [2], The reference link is https://www.postgresql.org/message-id/1567465.1627860115%40sss.pgh.pa.us my previous post will link to the whole thread.
jian he <jian.universality@gmail.com> writes: > On Thu, Jan 18, 2024 at 4:17 PM Peter Eisentraut <peter@eisentraut.org> wrote: >> Reading back through the discussion, I wasn't quite able to interpret >> the resolution regarding Oracle compatibility. From the patch, it looks >> like you chose not to adopt the parameter names from Oracle. Was that >> your intention? > per committee message: > https://git.postgresql.org/cgit/postgresql.git/commit/?id=6424337073589476303b10f6d7cc74f501b8d9d7 > Even if the names are all the same, our function is still not the same > as oracle. The fact that there's minor discrepancies in the regex languages doesn't seem to me to have a lot of bearing on whether we should follow Oracle's choices of parameter names. However, if we do follow Oracle, it seems like we should do that consistently, which this patch doesn't. For instance, per [1] Oracle calls the arguments of regex_substr source_char, pattern, position, occurrence, match_param, subexpr while we have string, pattern, start, N, flags, subexpr The patch proposes to replace "N" with "occurrence" but not touch the other discrepancies, which seems to me to be a pretty poor choice. "occurrence" is very long and difficult to spell correctly, and if you're not following Oracle slavishly, exactly what is the argument in its favor? I quite agree that Oracle's other choices aren't improvements over ours, but neither is that one. On the whole my inclination would be to stick to the names we have in the documentation. There might be an argument for changing "N" to something lower-case so you don't have to quote it; but if we do, I'd go for, say, "count". regards, tom lane [1] https://docs.oracle.com/en/database/oracle/oracle-database/23/sqlrf/REGEXP_SUBSTR.html#GUID-2903904D-455F-4839-A8B2-1731EF4BD099
On Wed, Apr 3, 2024 at 4:45 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > jian he <jian.universality@gmail.com> writes: > > On Thu, Jan 18, 2024 at 4:17 PM Peter Eisentraut <peter@eisentraut.org> wrote: > >> Reading back through the discussion, I wasn't quite able to interpret > >> the resolution regarding Oracle compatibility. From the patch, it looks > >> like you chose not to adopt the parameter names from Oracle. Was that > >> your intention? > > > per committee message: > > https://git.postgresql.org/cgit/postgresql.git/commit/?id=6424337073589476303b10f6d7cc74f501b8d9d7 > > Even if the names are all the same, our function is still not the same > > as oracle. > > The fact that there's minor discrepancies in the regex languages > doesn't seem to me to have a lot of bearing on whether we should > follow Oracle's choices of parameter names. > > However, if we do follow Oracle, it seems like we should do that > consistently, which this patch doesn't. For instance, per [1] > Oracle calls the arguments of regex_substr > > source_char, > pattern, > position, > occurrence, > match_param, > subexpr > > while we have > > string, > pattern, > start, > N, > flags, > subexpr > > The patch proposes to replace "N" with "occurrence" but not touch > the other discrepancies, which seems to me to be a pretty poor > choice. "occurrence" is very long and difficult to spell correctly, > and if you're not following Oracle slavishly, exactly what is the > argument in its favor? I quite agree that Oracle's other choices > aren't improvements over ours, but neither is that one. > > On the whole my inclination would be to stick to the names we have > in the documentation. There might be an argument for changing "N" > to something lower-case so you don't have to quote it; but if we do, > I'd go for, say, "count". > we have --------------------------------------------------------------- The replacement string can contain \n, where n is 1 through 9, to indicate that the source substring matching the n'th parenthesized subexpression of the pattern should be inserted, and it can contain \& to indicate that the substring matching the entire pattern should be inserted. ---------------------------------------------------------------------------- in the regexp_replace explanation section. changing "N" to lower-case would be misleading for regexp_replace? so I choose "count". By the way, I think the above is so hard to comprehend. I can only find related test in src/test/regress/sql/strings.sql are: SELECT regexp_replace('1112223333', E'(\\d{3})(\\d{3})(\\d{4})', E'(\\1) \\2-\\3'); SELECT regexp_replace('foobarrbazz', E'(.)\\1', E'X\\&Y', 'g'); SELECT regexp_replace('foobarrbazz', E'(.)\\1', E'X\\\\Y', 'g'); but these tests seem not friendly. maybe we should have some simple examples to demonstrate the above paragraph.
Attachment
On Thu, Apr 4, 2024 at 9:55 AM jian he <jian.universality@gmail.com> wrote: > in the regexp_replace explanation section. > changing "N" to lower-case would be misleading for regexp_replace? > so I choose "count". I don't see why that would be confusing for regexp_replace specifically, but I think N => count is a reasonable change to make. However, I don't think this quite works: + then the <replaceable>count</replaceable>'th match of the pattern An English speaker is more likely to understand what is meant by "N'th" than what is meant by "count'th". Even if they can guess, it's kinda strange-looking. I think it needs to be rephrased somehow, but I'm not sure exactly how. > By the way, I think the above is so hard to comprehend. > I can only find related test in src/test/regress/sql/strings.sql are: > SELECT regexp_replace('1112223333', E'(\\d{3})(\\d{3})(\\d{4})', > E'(\\1) \\2-\\3'); > SELECT regexp_replace('foobarrbazz', E'(.)\\1', E'X\\&Y', 'g'); > SELECT regexp_replace('foobarrbazz', E'(.)\\1', E'X\\\\Y', 'g'); > > but these tests seem not friendly. > maybe we should have some simple examples to demonstrate the above paragraph. Examples in the regression tests aren't meant as tests, not examples for users to copy. If we want examples, those belong in the documentation. However, I see that regexp_replace already has some examples at https://www.postgresql.org/docs/current/functions-matching.html#FUNCTIONS-POSIX-REGEXP so I'm not sure exactly what you think should be added. -- Robert Haas EDB: http://www.enterprisedb.com
On Wed, May 15, 2024 at 2:46 PM Robert Haas <robertmhaas@gmail.com> wrote: > Examples in the regression tests aren't meant as tests, not examples > for users to copy. If we want examples, those belong in the > documentation. However, I see that regexp_replace already has some > examples at https://www.postgresql.org/docs/current/functions-matching.html#FUNCTIONS-POSIX-REGEXP > so I'm not sure exactly what you think should be added. Woops. I should have said: Examples in the regression tests *are* meant as tests... -- Robert Haas EDB: http://www.enterprisedb.com
On Wed, May 15, 2024 at 11:46 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Apr 4, 2024 at 9:55 AM jian he <jian.universality@gmail.com> wrote:
> in the regexp_replace explanation section.
> changing "N" to lower-case would be misleading for regexp_replace?
> so I choose "count".
I don't see why that would be confusing for regexp_replace
specifically, but I think N => count is a reasonable change to make.
However, I don't think this quite works:
+ then the <replaceable>count</replaceable>'th match of the pattern
An English speaker is more likely to understand what is meant by
"N'th" than what is meant by "count'th". Even if they can guess, it's
kinda strange-looking. I think it needs to be rephrased somehow, but
I'm not sure exactly how.
I think this confusion goes to show that replacing N with count doesn't work.
"replace_at" comes to mind as a better name.
By default, only the first match of the pattern is replaced. If replace_at is specified and greater than zero, then the first "replace_at - 1" matches are skipped before making a single replacement (i.e., the g flag is ignored when replace_at is specified.)
David J.
On Wed, May 15, 2024 at 3:01 PM David G. Johnston <david.g.johnston@gmail.com> wrote: > I think this confusion goes to show that replacing N with count doesn't work. > > "replace_at" comes to mind as a better name. I do not agree with that at all. It shows that a literal search-and-replace changing N to count does not work, but it does not show that count is a bad name for the concept, and I don't think it is. I believe that if I were reading the documentation, count would be clearer to me than N, N would probably still be clear enough, and replace_at wouldn't be clear at all. I'd expect replace_at to be a character position or something, not an occurrence count. -- Robert Haas EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes: > On Thu, Apr 4, 2024 at 9:55 AM jian he <jian.universality@gmail.com> wrote: >> changing "N" to lower-case would be misleading for regexp_replace? >> so I choose "count". > I don't see why that would be confusing for regexp_replace > specifically, but I think N => count is a reasonable change to make. > However, I don't think this quite works: > + then the <replaceable>count</replaceable>'th match of the pattern I think the origin of the problem here is not wanting to use "N" as the actual name of the parameter, because then users would have to double-quote it to write "regexp_replace(..., "N" => 42, ...)". However ... is that really so awful? It's still fewer keystrokes than "count". It's certainly a potential gotcha for users who've not internalized when they need double quotes, but I think we could largely address that problem just by making sure to provide a documentation example that shows use of "N". > An English speaker is more likely to understand what is meant by > "N'th" than what is meant by "count'th". +1 ... none of the proposals make that bit read more clearly than it does now. regards, tom lane
On 05/15/24 15:07, Robert Haas wrote: > is. I believe that if I were reading the documentation, count would be > clearer to me than N, N would probably still be clear enough, and > replace_at wouldn't be clear at all. I'd expect replace_at to be a > character position or something, not an occurrence count. You've said the magic word. In the analogous (but XQuery-based) ISO standard regex functions, the argument that does that is identified with the keyword OCCURRENCE. What would be wrong with that, for consistency's sake? Regards, -Chap
On Wed, May 15, 2024 at 12:07 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, May 15, 2024 at 3:01 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:
> I think this confusion goes to show that replacing N with count doesn't work.
>
> "replace_at" comes to mind as a better name.
I do not agree with that at all. It shows that a literal
search-and-replace changing N to count does not work, but it does not
show that count is a bad name for the concept, and I don't think it
is. I believe that if I were reading the documentation, count would be
clearer to me than N, N would probably still be clear enough, and
replace_at wouldn't be clear at all. I'd expect replace_at to be a
character position or something, not an occurrence count.
The function replaces matches, not random characters. And if you are reading the documentation I find it implausible that the wording I suggested would cause one to think in terms of characters instead of matches.
If I choose not to read the documentation "count" seems like it behaves as a qualified "g". I don't want all matches replaced, I want the first "count" matches only replaced.
"occurrence" probably is the best choice but I agree the spelling issues are a big negative.
count - how many things there are. This isn't a count. I'd rather stick with N, at least it actually has the desired meaning as a pointer to an item in a list.
N - The label provides zero context as to what the number you place there is going to be used for. Labels ideally do more work than this especially if someone takes the time to spell them out. Otherwise why use "pattern" instead of "p".
David J.
On Wed, May 15, 2024 at 3:23 PM Chapman Flack <jcflack@acm.org> wrote: > What would be wrong with that, for consistency's sake? It was proposed and rejected upthread, but that's not to say that I necessarily endorse the reasons given. -- Robert Haas EDB: http://www.enterprisedb.com
On Wed, May 15, 2024 at 3:25 PM David G. Johnston <david.g.johnston@gmail.com> wrote: > The function replaces matches, not random characters. And if you are reading the documentation I find it implausible thatthe wording I suggested would cause one to think in terms of characters instead of matches. I mean I just told you what my reaction to it was. If you find that reaction "implausible" then I guess you think I was lying when I said that? > N - The label provides zero context as to what the number you place there is going to be used for. Labels ideally do morework than this especially if someone takes the time to spell them out. Otherwise why use "pattern" instead of "p". I feel like you're attacking a straw man here. I never said that N was my first choice; in fact, I said the opposite. But I do think that if the documentation says, as it does, that the function is regexp_replace(source, pattern, replacement, start, N, flags), a reader who has some idea what a function called regexp_replace might do will probably be able to guess what N is. It's probably also true that if we changed "pattern" to "p" they would still be able to guess that too, because there's nothing other than a pattern that you'd expect to pass to a regexp-replacement function that starts with p, but it would still be worse than what we have now. -- Robert Haas EDB: http://www.enterprisedb.com
On Wed, May 15, 2024 at 12:52 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, May 15, 2024 at 3:25 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:
> The function replaces matches, not random characters. And if you are reading the documentation I find it implausible that the wording I suggested would cause one to think in terms of characters instead of matches.
I mean I just told you what my reaction to it was. If you find that
reaction "implausible" then I guess you think I was lying when I said
that?
You just broke my brain when you say that you read:
By default, only the first match of the pattern is replaced. If replace_at is specified and greater than zero, then the first "replace_at - 1" matches are skipped before making a single replacement (i.e., the g flag is ignored when replace_at is specified.)
And then say:
I'd expect replace_at to be a character position or something, not an occurrence count.
I guess it isn't a claim you are lying, rather I simply don't follow your mental model of all this and in my mental model behind the proposal I don't believe the typical reader will become confused on that point. I guess that means I don't find you to be the typical reader, at least so far as this specific topic goes. But hey, maybe I'm the one in the minority. In either case we disagree and that was my main point.
David J.
On Wed, May 15, 2024 at 12:07 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, May 15, 2024 at 3:01 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:
> I think this confusion goes to show that replacing N with count doesn't work.
>
> "replace_at" comes to mind as a better name.
I'd expect replace_at to be a
character position or something, not an occurrence count.
I'll amend the name to: "replace_match"
I do now see that since the immediately preceding parameter, "start", deals with characters instead of matches that making it clear this parameter deals in matches in the name work. The singular 'match' has all the same benefits as 'at' plus this point of clarity.
David J.
On Wed, May 15, 2024 at 4:13 PM David G. Johnston <david.g.johnston@gmail.com> wrote: > > You just broke my brain when you say that you read: > > By default, only the first match of the pattern is replaced. If replace_at is specified and greater than zero, then thefirst "replace_at - 1" matches are skipped before making a single replacement (i.e., the g flag is ignored when replace_atis specified.) > > And then say: > > I'd expect replace_at to be a character position or something, not an occurrence count. Ah. What I meant was: if I just saw the parameter name, and not the documentation, I believe that I would not correctly understand what it did. I would have had to read the docs. Whereas I'm pretty sure at some point years ago, I looked up these functions and I saw "N", and I did understand what that did without needing it explained. If I had seen "count" or "occurrence" I think I would have understood that without further explanation, too. So my point was: to me, N is more self-documenting than replace_at, and less self-documenting than count or occurrence. If your mileage varies on that point, so be it! -- Robert Haas EDB: http://www.enterprisedb.com
On Wed, May 15, 2024 at 1:19 PM Robert Haas <robertmhaas@gmail.com> wrote:
So my point was: to me, N is more self-documenting than replace_at,
and less self-documenting than count or occurrence.
If your mileage varies on that point, so be it!
Maybe just "match" instead of "replace_match".
Reading this it strikes me that any of these parameter names can and probably should be read as having "replace" in front of them:
replace N
replace count
replace occurrence
replace match
Saying replace becomes redundant:
replace replace at
replace replace match
David J.
On 05/15/24 15:31, Robert Haas wrote: > On Wed, May 15, 2024 at 3:23 PM Chapman Flack <jcflack@acm.org> wrote: >> What would be wrong with [occurrence], for consistency's sake? > > It was proposed and rejected upthread, but that's not to say that I > necessarily endorse the reasons given. Apologies for not having read far enough up the thread before replying. Having done so now, I guess I'd just offer one small point: the upthread discussion did mention that 'occurrence' was used by Oracle, and asked "if you're not following Oracle slavishly, exactly what is the argument in its favor?". Nothing else upthread seems to have mentioned that OCCURRENCE is the exact keyword used in ISO SQL for the analogous argument in analogous functions. Maybe that won't have any effect on the outcome either, but it does seem worth getting into the thread. Regards, -Chap
On Wed, May 15, 2024 at 4:25 PM David G. Johnston <david.g.johnston@gmail.com> wrote: > Maybe just "match" instead of "replace_match". Well, this is just turning into a bikeshedding exercise at this point. We can generate names for this parameter all day long, but a bunch of names none of which gets more than one vote is not really helping anything. -- Robert Haas EDB: http://www.enterprisedb.com
On Thu, May 16, 2024 at 3:10 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Robert Haas <robertmhaas@gmail.com> writes: > > On Thu, Apr 4, 2024 at 9:55 AM jian he <jian.universality@gmail.com> wrote: > >> changing "N" to lower-case would be misleading for regexp_replace? > >> so I choose "count". > > > I don't see why that would be confusing for regexp_replace > > specifically, but I think N => count is a reasonable change to make. > > However, I don't think this quite works: > > + then the <replaceable>count</replaceable>'th match of the pattern > > I think the origin of the problem here is not wanting to use "N" > as the actual name of the parameter, because then users would have > to double-quote it to write "regexp_replace(..., "N" => 42, ...)". > > However ... is that really so awful? It's still fewer keystrokes > than "count". It's certainly a potential gotcha for users who've > not internalized when they need double quotes, but I think we > could largely address that problem just by making sure to provide > a documentation example that shows use of "N". done it this way. patch attached. last example from regexp_replace('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 3, 'i') A PostgrXSQL function change to regexp_replace(string=>'A PostgreSQL function', pattern=>'a|e|i|o|u', replacement=>'X',start=>1, "N"=>3, flags=>'i'); A PostgrXSQL function but I am not 100% sure <lineannotation>A PostgrXSQL function</lineannotation> is in the right position. also address Chapman Flack point: correct me if i am wrong, but i don't think the ISO standard mandates function argument names. So we can choose the best function argument name for our purpose?
Attachment
On 07/15/24 08:02, jian he wrote: > also address Chapman Flack point: > correct me if i am wrong, but i don't think the ISO standard mandates > function argument names. > So we can choose the best function argument name for our purpose? Ah, I may have mistaken which functions the patch meant to apply to. These being the non-ISO regexp_* functions using POSIX expressions, the ISO standard indeed says nothing about them. In the ISO standard *_regex "functions", there are not really "function argument names" mandated, because, like so many things in ISO SQL, they have their own special syntax instead of being generic function calls: TRANSLATE_REGEX('a|e|i|o|u' FLAG 'i' IN 'A PostgreSQL function' WITH 'X' FROM 1 OCCURRENCE 3); Any choice to use similar argument names in the regexp_* functions would be a matter of consistency with the analogous ISO functions, not anything mandated. Regards, -Chap
On 07/15/24 10:46, Chapman Flack wrote: > Ah, I may have mistaken which functions the patch meant to apply to. > ... > Any choice to use similar argument names in the regexp_* functions would > be a matter of consistency with the analogous ISO functions, not anything > mandated. Or, looking back, I might have realized these were the non-ISO regexp_* functions, but seen there was bikeshedding happening over the best name to use for the occurrence argument, and merely suggested ISO's choice OCCURRENCE for the analogous ISO functions, as a possible bikeshed accelerator. Regards, -Chap
jian he <jian.universality@gmail.com> writes: > [ v5-0001-add-regex-functions-argument-names-to-pg_proc.patch ] I'm not sure whether we've bikeshedded this to death yet, but personally I'm content with the naming choices here (which basically are those already shown in table 9.10). However, while looking at the patch I noticed a couple of issues, one small, the other a bit bigger. The small issue is that table 9.10 offers this syntax diagram for regexp_replace: regexp_replace ( string text, pattern text, replacement text [, start integer ] [, flags text ] ) → text This implies that it's valid to write regexp_replace (string, pattern, replacement, start, flags) but it is not: we have no function matching that signature. I'm not in a hurry to add one, either, for fear of ambiguity against the other regexp_replace signature. I think this needs to be broken into two syntax diagrams: regexp_replace ( string text, pattern text, replacement text [, start integer ] ) → text regexp_replace ( string text, pattern text, replacement text [, flags text ] ) → text The larger issue is that contrib/citext offers versions of some of these functions that are meant to be drop-in replacements using citext input. Hence, we need to add the same parameter names to those functions, or they'll fail to replace some calls. regards, tom lane
On Fri, Jul 19, 2024 at 5:48 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > jian he <jian.universality@gmail.com> writes: > > [ v5-0001-add-regex-functions-argument-names-to-pg_proc.patch ] > > I'm not sure whether we've bikeshedded this to death yet, but > personally I'm content with the naming choices here (which basically > are those already shown in table 9.10). However, while looking > at the patch I noticed a couple of issues, one small, the other > a bit bigger. > > The small issue is that table 9.10 offers this syntax diagram > for regexp_replace: > > regexp_replace ( string text, pattern text, replacement text [, start integer ] [, flags text ] ) → text > > This implies that it's valid to write > > regexp_replace (string, pattern, replacement, start, flags) > > but it is not: we have no function matching that signature. I'm not > in a hurry to add one, either, for fear of ambiguity against the other > regexp_replace signature. I think this needs to be broken into two > syntax diagrams: > > regexp_replace ( string text, pattern text, replacement text [, start integer ] ) → text > regexp_replace ( string text, pattern text, replacement text [, flags text ] ) → text We can list them separately. regexp_replace(string, pattern, replacement [, start]) regexp_replace(string, pattern, replacement [, flags]) regexp_replace(string, pattern, replacement , start , N [, flags ]). if both optional is not there then they are the same, list 2 potential identical functions separately seems wrong? so i choose 2 bracket with a vertical bar: regexp_replace(string, pattern, replacement [[, start] | [, flags]]). maybe less readable. > The larger issue is that contrib/citext offers versions of some of > these functions that are meant to be drop-in replacements using > citext input. Hence, we need to add the same parameter names to > those functions, or they'll fail to replace some calls. > I first wanted to use alterfunction solve this, then found out it cannot, later I found out CREATE OR REPLACE FUNCTION saved us. citext module, these functions: regexp_match() regexp_matches() regexp_replace() regexp_split_to_array() regexp_split_to_table() were created in contrib/citext/citext--1.4.sql, we can add the CREATE OR REPLACE FUNCTION to 1.4.sql. but to avoid unintended consequences I just add these to the newly created file citext--1.6--1.7.sql, to make a version bump.
Attachment
jian he <jian.universality@gmail.com> writes: > On Fri, Jul 19, 2024 at 5:48 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> The larger issue is that contrib/citext offers versions of some of >> these functions that are meant to be drop-in replacements using >> citext input. Hence, we need to add the same parameter names to >> those functions, or they'll fail to replace some calls. > citext module, these functions: > regexp_match() > regexp_matches() > regexp_replace() > regexp_split_to_array() > regexp_split_to_table() > were created in contrib/citext/citext--1.4.sql, we can add the CREATE > OR REPLACE FUNCTION to 1.4.sql. > but to avoid unintended consequences I just add these to the newly > created file citext--1.6--1.7.sql, > to make a version bump. Yes. You *have to* do it like that, the shortcut is not an option, because without an extension update script there is no way to upgrade an existing installation to the new definition. Basically, once we ship a given release of an extension, that script is frozen in amber. I haven't heard any further bikeshedding on the argument names, so I'll move forward with committing this soon. regards, tom lane
On 15.07.24 16:52, Chapman Flack wrote: > On 07/15/24 10:46, Chapman Flack wrote: >> Ah, I may have mistaken which functions the patch meant to apply to. >> ... >> Any choice to use similar argument names in the regexp_* functions would >> be a matter of consistency with the analogous ISO functions, not anything >> mandated. > > Or, looking back, I might have realized these were the non-ISO regexp_* > functions, but seen there was bikeshedding happening over the best name > to use for the occurrence argument, and merely suggested ISO's choice > OCCURRENCE for the analogous ISO functions, as a possible bikeshed > accelerator. These functions were copied from Oracle, so one argument was to use the names from Oracle as-is.
I wrote: > I haven't heard any further bikeshedding on the argument names, > so I'll move forward with committing this soon. Pushed, after a little further fooling with the documentation. regards, tom lane
On Fri, Jul 19, 2024 at 5:48 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > The small issue is that table 9.10 offers this syntax diagram > for regexp_replace: > > regexp_replace ( string text, pattern text, replacement text [, start integer ] [, flags text ] ) → text > > This implies that it's valid to write > > regexp_replace (string, pattern, replacement, start, flags) > > but it is not: we have no function matching that signature. I'm not > in a hurry to add one, either, for fear of ambiguity against the other > regexp_replace signature. I think this needs to be broken into two > syntax diagrams: > > regexp_replace ( string text, pattern text, replacement text [, start integer ] ) → text > regexp_replace ( string text, pattern text, replacement text [, flags text ] ) → text > this problem is still there, after commit 580f8727ca93b7b9a2ce49746b9cdbcb0a2b4a7e. << It has the syntax regexp_replace(string, pattern, replacement [, start [, N ]] [, flags ]). (Notice that N cannot be specified unless start is, but flags can be given in any case.) << doc, the above part still needs change? see my posts: https://postgr.es/m/CACJufxE5p4KhGyBUwCZCxhxdU%2BzJBXy2deX4u85SL%2Bkew4F7Cw%40mail.gmail.com
jian he <jian.universality@gmail.com> writes: > On Fri, Jul 19, 2024 at 5:48 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> but it is not: we have no function matching that signature. I'm not >> in a hurry to add one, either, for fear of ambiguity against the other >> regexp_replace signature. I think this needs to be broken into two >> syntax diagrams: > this problem is still there, after commit > 580f8727ca93b7b9a2ce49746b9cdbcb0a2b4a7e. No, I believe I fixed it: the table now offers regexp_replace ( string text, pattern text, replacement text [, flags text ] ) → text regexp_replace ( string text, pattern text, replacement text, start integer [, N integer [, flags text ] ] ) → text That's different from either of the solutions discussed in this thread, but simpler. > << > It has the syntax regexp_replace(string, pattern, replacement [, start > [, N ]] [, flags ]). (Notice that N cannot be specified unless start > is, but flags can be given in any case.) > << > doc, the above part still needs change? AFAICS, that one is correct, so I left it alone. (I didn't try to merge the table's two entries into one like that, though.) regards, tom lane
On Fri, Jul 26, 2024 at 10:17 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > << > > It has the syntax regexp_replace(string, pattern, replacement [, start > > [, N ]] [, flags ]). (Notice that N cannot be specified unless start > > is, but flags can be given in any case.) > > << > > doc, the above part still needs change? > > AFAICS, that one is correct, so I left it alone. (I didn't try to > merge the table's two entries into one like that, though.) > functions-string.html output is correct. but in functions-matching.html regexp_replace(string, pattern, replacement [, start [, N ]] [, flags ]). can represent regexp_replace(string, pattern, replacement , start, flags ) ? but we don't have "regexp_replace(string, pattern, replacement , start, flags )"
jian he <jian.universality@gmail.com> writes: > On Fri, Jul 26, 2024 at 10:17 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> AFAICS, that one is correct, so I left it alone. (I didn't try to >> merge the table's two entries into one like that, though.) > regexp_replace(string, pattern, replacement [, start [, N ]] [, flags ]). > can represent > regexp_replace(string, pattern, replacement , start, flags ) ? Hmm, yeah, you're right. I didn't want to write two separate synopses there, but maybe there's no choice. regards, tom lane
On Fri, Jul 26, 2024 at 10:40 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > jian he <jian.universality@gmail.com> writes: > > On Fri, Jul 26, 2024 at 10:17 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> AFAICS, that one is correct, so I left it alone. (I didn't try to > >> merge the table's two entries into one like that, though.) > > > regexp_replace(string, pattern, replacement [, start [, N ]] [, flags ]). > > > can represent > > > regexp_replace(string, pattern, replacement , start, flags ) ? > > Hmm, yeah, you're right. I didn't want to write two separate > synopses there, but maybe there's no choice. > we can get rid of: (Notice that <replaceable>N</replaceable> cannot be specified unless <replaceable>start</replaceable> is, but <replaceable>flags</replaceable> can be given in any case.) Now the output is It has the syntax regexp_replace(string, pattern, replacement [, flags ]) and regexp_replace(string, pattern, replacement, start [, N [, flags ]]). I also decorated "[]" with "<optional>".
Attachment
jian he <jian.universality@gmail.com> writes: > On Fri, Jul 26, 2024 at 10:40 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Hmm, yeah, you're right. I didn't want to write two separate >> synopses there, but maybe there's no choice. > Now the output is > It has the syntax regexp_replace(string, pattern, replacement [, flags > ]) and regexp_replace(string, pattern, replacement, start [, N [, > flags ]]). Pushed, thanks. regards, tom lane