Version 17.6 changed how similar works compared to version 17.5 - Mailing list pgsql-bugs

From Stephan Springl
Subject Version 17.6 changed how similar works compared to version 17.5
Msg-id 41a37137-f8bb-8fc5-2948-31b528f166dc@bfw-online.de
List pgsql-bugs
Hello,

Version 17.6 changed how SIMILAR TO works compared to version 17.5.

With a file f created as follows:
cat >f <<END
drop table t;
create table t (p varchar (1));
insert into t values ('_');
select * from t;
select * from t where p similar to '[\_]%';
END

psql -f f

gives:

DROP TABLE
CREATE TABLE
INSERT 0 1
  p 
---
  _
(1 row)

  p 
---
(0 rows)

The SIMILAR TO expression does not find the row.  With version 17.5, the row
was found, as intended.

Reverting commit e3ffc3e91d04579240fb54a96f9059b246488dce
"Fix conversion of SIMILAR TO regexes for character classes"
brings back the previous behavior.  That commit does not take into account
that the first character in a character class may be escaped.  In that case
it fails to recognize the closing ']' of the character class.  The SIMILAR TO
pattern "[\_]%" gets translated to the regular expression "^(?:[\_]%)$",
whereas version 17.5 generates "^(?:[\_].*)$".
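
The failure mode described above can be sketched with a toy model of the
character class tracking.  This is not the code from regexp.c: the function
name toy_similar_to_regex and the flag advance_on_escape are hypothetical,
the escape character is fixed to '\', and only '%', '_' and bracket classes
are handled.  It assumes the problem is that an escaped character inside a
class leaves charclass_start at its initial value, so the next ']' is taken
literally and the class never closes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Toy model of the SIMILAR TO -> regex translation (NOT the PostgreSQL code).
 * Assumptions: the escape character is always '\'; only '%', '_' and bracket
 * classes are translated; everything else is copied through unchanged.
 *
 * advance_on_escape == false: an escaped character inside a class does not
 * move charclass_start forward (models the behavior reported for 17.6).
 * advance_on_escape == true: it does, so a following ']' closes the class.
 *
 * Returns 0 on success, -1 if the output buffer is too small.
 */
int
toy_similar_to_regex(const char *pat, bool advance_on_escape,
                     char *out, size_t outlen)
{
    char   *r = out;
    bool    afterescape = false;
    int     charclass_depth = 0;
    int     charclass_start = 0;

    /* at most 2 output bytes per input byte, plus "^(?:", ")$" and NUL */
    if (outlen < 2 * strlen(pat) + 7)
        return -1;

    memcpy(r, "^(?:", 4);
    r += 4;

    for (const char *p = pat; *p != '\0'; p++)
    {
        char    pchar = *p;

        if (afterescape)
        {
            /* pass the escaped character through as a regex escape */
            *r++ = '\\';
            *r++ = pchar;
            afterescape = false;
            if (charclass_depth > 0 && advance_on_escape)
                charclass_start = 3;    /* definitely past the start */
        }
        else if (pchar == '\\')
            afterescape = true;         /* escape character: emit nothing yet */
        else if (charclass_depth > 0)
        {
            *r++ = pchar;
            /* a ']' right at the start of a class is literal */
            if (pchar == ']' && charclass_start > 2)
                charclass_depth--;
            else if (pchar == '[')
                charclass_depth++;
            if (pchar == '^')
                charclass_start++;
            else
                charclass_start = 3;
        }
        else if (pchar == '[')
        {
            *r++ = pchar;               /* start of a character class */
            charclass_depth++;
            charclass_start = 1;
        }
        else if (pchar == '%')
        {
            *r++ = '.';
            *r++ = '*';
        }
        else if (pchar == '_')
            *r++ = '.';
        else
            *r++ = pchar;
    }

    memcpy(r, ")$", 3);                 /* includes the terminating NUL */
    return 0;
}
```

Called with the pattern [\_]% (written "[\\_]%" as a C string literal), the
sketch produces ^(?:[\_]%)$ when advance_on_escape is false and ^(?:[\_].*)$
when it is true, which are the two regular expressions quoted above.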

I suggest the fix below.  Unfortunately, I am not sure what an escape inside
a character class of a SIMILAR TO pattern should mean, or whether the escape
character there should always be '\' (as the patch does it) or the escape
value given to the SIMILAR TO expression.  Branches REL_18_STABLE and master
are affected as well.

Thank you for your great work on PostgreSQL.
Regards,
Stephan

diff --git a/src/backend/utils/adt/regexp.c b/src/backend/utils/adt/regexp.c
index 37ca136acf1..114fb43fd91 100644
--- a/src/backend/utils/adt/regexp.c
+++ b/src/backend/utils/adt/regexp.c
@@ -905,9 +905,41 @@ similar_escape_internal(text *pat_text, text *esc_text)
          }

          /* fast path */
-        if (afterescape)
+        if (charclass_depth > 0)
          {
-            if (pchar == '"' && charclass_depth < 1)    /* escape-double-quote? */
+            if (afterescape)
+            {
+                *r++ = '\\';
+                afterescape = false;
+            }
+            *r++ = pchar;
+
+            /*
+             * Ignore a closing bracket at the start of a character class.
+             * Such a bracket is taken literally rather than closing the
+             * class.  "charclass_start" is 1 right at the beginning of a
+             * class and 2 after an initial caret.
+             */
+            if (pchar == ']' && charclass_start > 2)
+                charclass_depth--;
+            else if (pchar == '[')
+                charclass_depth++;
+
+            /*
+             * If there is a caret right after the opening bracket, it negates
+             * the character class, but a following closing bracket should
+             * still be treated as a normal character.  That holds only for
+             * the first caret, so only the values 1 and 2 mean that closing
+             * brackets should be taken literally.
+             */
+            if (pchar == '^')
+                charclass_start++;
+            else
+                charclass_start = 3;    /* definitely past the start */
+        }
+        else if (afterescape)
+        {
+            if (pchar == '"')    /* escape-double-quote? */
              {
                  /* emit appropriate part separator, per notes above */
                  if (nquotes == 0)
@@ -956,35 +988,6 @@ similar_escape_internal(text *pat_text, text *esc_text)
              /* SQL escape character; do not send to output */
              afterescape = true;
          }
-        else if (charclass_depth > 0)
-        {
-            if (pchar == '\\')
-                *r++ = '\\';
-            *r++ = pchar;
-
-            /*
-             * Ignore a closing bracket at the start of a character class.
-             * Such a bracket is taken literally rather than closing the
-             * class.  "charclass_start" is 1 right at the beginning of a
-             * class and 2 after an initial caret.
-             */
-            if (pchar == ']' && charclass_start > 2)
-                charclass_depth--;
-            else if (pchar == '[')
-                charclass_depth++;
-
-            /*
-             * If there is a caret right after the opening bracket, it negates
-             * the character class, but a following closing bracket should
-             * still be treated as a normal character.  That holds only for
-             * the first caret, so only the values 1 and 2 mean that closing
-             * brackets should be taken literally.
-             */
-            if (pchar == '^')
-                charclass_start++;
-            else
-                charclass_start = 3;    /* definitely past the start */
-        }
          else if (pchar == '[')
          {
              /* start of a character class */



