Minor regexp bug - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Minor regexp bug |
Date | |
Msg-id | 26464.1446863568@sss.pgh.pa.us Whole thread Raw |
Responses |
Re: Minor regexp bug
Re: Minor regexp bug |
List | pgsql-hackers |
Happened across this while investigating something else ... The regexp documentation says: Lookahead and lookbehind constraints cannot contain <firstterm>back references</> (see <xref linkend="posix-escape-sequences">), and all parentheses within them are considered non-capturing. This is true if you try a simple case, eg regression=# select 'xyz' ~ 'x(\w)(?=\1)'; ERROR: invalid regular expression: invalid backreference number (That's not the greatest choice of error code, perhaps, but the point is it rejects the backref.) However, stick an extra set of parentheses in there, and the regexp parser forgets all about the restriction: regression=# select 'xyz' ~ 'x(\w)(?=(\1))'; ?column? ---------- t (1 row) Since the execution machinery has no hope of executing such a backref properly, you silently get a wrong answer. (The actual behavior in a case like this is as though the backref had been replaced by a copy of the bit of regexp pattern it'd referred to, ie this pattern works like 'x(\w)(?=\w)', without any enforcement that the two \w's are matching identical text.) The fix is a one-liner, as per the attached patch: when recursing to parse the innards of a parenthesized subexpression, we have to pass down the current "type" not necessarily PLAIN, so that any type-related restrictions still apply inside the subexpression. What I'm wondering about is whether to back-patch this. It's possible that people have written patterns like this and not realized that they aren't doing quite what's expected. Getting a failure instead might not be desirable in a minor release. On the other hand, wrong answers are wrong answers. Thoughts? regards, tom lane diff --git a/src/backend/regex/regcomp.c b/src/backend/regex/regcomp.c index aa759c2..a165b3b 100644 *** a/src/backend/regex/regcomp.c --- b/src/backend/regex/regcomp.c *************** parseqatom(struct vars * v, *** 951,957 **** EMPTYARC(lp, s); EMPTYARC(s2, rp); NOERR(); ! atom = parse(v, ')', PLAIN, s, s2); assert(SEE(')') || ISERR()); NEXT(); NOERR(); --- 951,957 ---- EMPTYARC(lp, s); EMPTYARC(s2, rp); NOERR(); ! atom = parse(v, ')', type, s, s2); assert(SEE(')') || ISERR()); NEXT(); NOERR(); diff --git a/src/test/regress/expected/regex.out b/src/test/regress/expected/regex.out index f0e2fc9..07fb023 100644 *** a/src/test/regress/expected/regex.out --- b/src/test/regress/expected/regex.out *************** select 'a' ~ '()+\1'; *** 490,492 **** --- 490,497 ---- t (1 row) + -- Error conditions + select 'xyz' ~ 'x(\w)(?=\1)'; -- no backrefs in LACONs + ERROR: invalid regular expression: invalid backreference number + select 'xyz' ~ 'x(\w)(?=(\1))'; + ERROR: invalid regular expression: invalid backreference number diff --git a/src/test/regress/sql/regex.sql b/src/test/regress/sql/regex.sql index d3030af..c45bdc9 100644 *** a/src/test/regress/sql/regex.sql --- b/src/test/regress/sql/regex.sql *************** select 'a' ~ '$()|^\1'; *** 117,119 **** --- 117,123 ---- select 'a' ~ '.. ()|\1'; select 'a' ~ '()*\1'; select 'a' ~ '()+\1'; + + -- Error conditions + select 'xyz' ~ 'x(\w)(?=\1)'; -- no backrefs in LACONs + select 'xyz' ~ 'x(\w)(?=(\1))';
pgsql-hackers by date: