Thread: change regexp_substr first argument make tests more easier to understand.
hi. https://git.postgresql.org/cgit/postgresql.git/tree/src/test/regress/expected/strings.out#n928 SELECT regexp_substr('abcabcabc', 'a.c'); SELECT regexp_substr('abcabcabc', 'a.c', 2); SELECT regexp_substr('abcabcabc', 'a.c', 1, 3); SELECT regexp_substr('abcabcabc', 'a.c', 1, 4) IS NULL AS t; SELECT regexp_substr('abcabcabc', 'A.C', 1, 2, 'i'); they all return 'abc', there are 3 'abc ' in string 'abcabcabc' except IS NULL query. maybe we can change regexp_substr first argument from "abcabcabc" to "abcaXcaYc". so the result would be more easier to understand.
On Thu, Dec 28, 2023 at 12:13 AM jian he <jian.universality@gmail.com> wrote: > > hi. > https://git.postgresql.org/cgit/postgresql.git/tree/src/test/regress/expected/strings.out#n928 > > SELECT regexp_substr('abcabcabc', 'a.c'); > SELECT regexp_substr('abcabcabc', 'a.c', 2); > SELECT regexp_substr('abcabcabc', 'a.c', 1, 3); > SELECT regexp_substr('abcabcabc', 'a.c', 1, 4) IS NULL AS t; > SELECT regexp_substr('abcabcabc', 'A.C', 1, 2, 'i'); > > they all return 'abc', there are 3 'abc ' in string 'abcabcabc' > except IS NULL query. > maybe we can change regexp_substr first argument from "abcabcabc" to > "abcaXcaYc". > so the result would be more easier to understand. anyway here is the minor patch to change string from "abcabcabc" to "abcaXcaYc".
Attachment
Re: change regexp_substr first argument make tests more easier to understand.
From
Tomas Vondra
Date:
Hi, If I understand correctly, the problem is that it's not clear which of the 'abc' substrings is matched/returned by the function, right? I wonder if this is a problem only for understanding the test, or if it makes the tests a bit weaker. I mean, what if the function returns the wrong substring? How would we know? Also, if we tweak this, shouldn't we tweak also the regext_instr() calls a bit earlier in the test script? regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Jul 19, 2024 at 5:49 AM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > > Hi, > > If I understand correctly, the problem is that it's not clear which of > the 'abc' substrings is matched/returned by the function, right? > > I wonder if this is a problem only for understanding the test, or if it > makes the tests a bit weaker. I mean, what if the function returns the > wrong substring? How would we know? > this is for understanding the test. personally, sometimes, I feel the documentation is too dry, hard to follow. so i can based on regress tests better understand the documentation. that was my intention for the changes. we have more sophisticated regex test at https://git.postgresql.org/cgit/postgresql.git/tree/src/test/modules/test_regex > Also, if we tweak this, shouldn't we tweak also the regext_instr() calls > a bit earlier in the test script? > sure. please check attached.
Attachment
Re: change regexp_substr first argument make tests more easier to understand.
From
Ilia Evdokimov
Date:
Hi everybody Current tests with regexp_instr() and regexp_substr() with string 'abcabcabc' are really unreadable and you would spend time to understand that happens in these tests and if they are really correct. I'd better change them into "abcdefghi" just like in query SELECT regexp_substr('abcdefghi', 'd.q') IS NULL AS t; Regards Ilia Evdokimov, Tantor Labs LLC.
Ilia Evdokimov <ilya.evdokimov@tantorlabs.com> writes: > Current tests with regexp_instr() and regexp_substr() with string > 'abcabcabc' are really unreadable and you would spend time to understand > that happens in these tests and if they are really correct. I'd better > change them into "abcdefghi" just like in query > SELECT regexp_substr('abcdefghi', 'd.q') IS NULL AS t; On looking more closely at these test cases, I think the point of them is exactly to show the behavior of the functions with multiple copies of the target substring. Thus, what Jian is proposing breaks the tests: it's no longer perfectly clear whether the result is because the function did what we expect, or because the pattern failed to match anywhere else. (Sure, "a.c" *should* match "aXc", but if it didn't, you wouldn't discover that from this test.) What Ilia proposes would break them worse. I think we should just reject this patch, or at least reject the parts of it that change existing test cases. I have no opinion about whether the new test cases add anything useful. regards, tom lane