Thread: change regexp_substr first argument make tests more easier to understand.

change regexp_substr first argument make tests more easier to understand.

From
jian he
Date:
hi.
https://git.postgresql.org/cgit/postgresql.git/tree/src/test/regress/expected/strings.out#n928

SELECT regexp_substr('abcabcabc', 'a.c');
SELECT regexp_substr('abcabcabc', 'a.c', 2);
SELECT regexp_substr('abcabcabc', 'a.c', 1, 3);
SELECT regexp_substr('abcabcabc', 'a.c', 1, 4) IS NULL AS t;
SELECT regexp_substr('abcabcabc', 'A.C', 1, 2, 'i');

they all return 'abc', there are 3 'abc ' in  string 'abcabcabc'
except IS NULL query.
maybe we can change regexp_substr first argument from "abcabcabc" to
"abcaXcaYc".
so the result would be more easier to understand.



On Thu, Dec 28, 2023 at 12:13 AM jian he <jian.universality@gmail.com> wrote:
>
> hi.
> https://git.postgresql.org/cgit/postgresql.git/tree/src/test/regress/expected/strings.out#n928
>
> SELECT regexp_substr('abcabcabc', 'a.c');
> SELECT regexp_substr('abcabcabc', 'a.c', 2);
> SELECT regexp_substr('abcabcabc', 'a.c', 1, 3);
> SELECT regexp_substr('abcabcabc', 'a.c', 1, 4) IS NULL AS t;
> SELECT regexp_substr('abcabcabc', 'A.C', 1, 2, 'i');
>
> they all return 'abc', there are 3 'abc ' in  string 'abcabcabc'
> except IS NULL query.
> maybe we can change regexp_substr first argument from "abcabcabc" to
> "abcaXcaYc".
> so the result would be more easier to understand.

anyway here is the minor patch to  change string from "abcabcabc" to
"abcaXcaYc".

Attachment
Hi,

If I understand correctly, the problem is that it's not clear which of
the 'abc' substrings is matched/returned by the function, right?

I wonder if this is a problem only for understanding the test, or if it
makes the tests a bit weaker. I mean, what if the function returns the
wrong substring? How would we know?

Also, if we tweak this, shouldn't we tweak also the regext_instr() calls
a bit earlier in the test script?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



On Fri, Jul 19, 2024 at 5:49 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> Hi,
>
> If I understand correctly, the problem is that it's not clear which of
> the 'abc' substrings is matched/returned by the function, right?
>
> I wonder if this is a problem only for understanding the test, or if it
> makes the tests a bit weaker. I mean, what if the function returns the
> wrong substring? How would we know?
>

this is for understanding the test.
personally, sometimes, I feel the documentation is too dry, hard to follow.
so i can based on regress tests better understand the documentation.
that was my intention for the changes.


we have more sophisticated regex test at
https://git.postgresql.org/cgit/postgresql.git/tree/src/test/modules/test_regex

> Also, if we tweak this, shouldn't we tweak also the regext_instr() calls
> a bit earlier in the test script?
>
sure.
please check attached.

Attachment

Re: change regexp_substr first argument make tests more easier to understand.

From
Ilia Evdokimov
Date:
Hi everybody

Current tests with regexp_instr() and regexp_substr()  with string 
'abcabcabc' are really unreadable and you would spend time to understand 
that happens in these tests and if they are really correct. I'd better 
change them into "abcdefghi" just like in query

     SELECT regexp_substr('abcdefghi', 'd.q') IS NULL AS t;

Regards

Ilia Evdokimov,
Tantor Labs LLC.






Ilia Evdokimov <ilya.evdokimov@tantorlabs.com> writes:
> Current tests with regexp_instr() and regexp_substr()  with string 
> 'abcabcabc' are really unreadable and you would spend time to understand 
> that happens in these tests and if they are really correct. I'd better 
> change them into "abcdefghi" just like in query

>      SELECT regexp_substr('abcdefghi', 'd.q') IS NULL AS t;

On looking more closely at these test cases, I think the point of them
is exactly to show the behavior of the functions with multiple copies
of the target substring.  Thus, what Jian is proposing breaks the
tests: it's no longer perfectly clear whether the result is because
the function did what we expect, or because the pattern failed to
match anywhere else.  (Sure, "a.c" *should* match "aXc", but if it
didn't, you wouldn't discover that from this test.)  What Ilia
proposes would break them worse.

I think we should just reject this patch, or at least reject the
parts of it that change existing test cases.  I have no opinion
about whether the new test cases add anything useful.

            regards, tom lane