Thread: BUG #15069: group by after regexp_replace
The following bug has been logged on the website: Bug reference: 15069 Logged by: Ilhwan Ko Email address: koglep@gmail.com PostgreSQL version: 9.6.7 Operating system: macOS 10.12 Description: create table test_t (a varchar(10), b varchar (20), c int); insert into test_t (a,b,c) values ('a', 'a', 1); insert into test_t (a,b,c) values (' a', 'a', 2); insert into test_t (a,b,c) values (' a', 'a', 4); insert into test_t (a,b,c) values (' a', 'a', 8); insert into test_t (a,b,c) values ('b', 'b', 1); insert into test_t (a,b,c) values (' b', 'b', 2); insert into test_t (a,b,c) values (' b', 'b', 4); insert into test_t (a,b,c) values (' b', 'b', 8); select keyword, sum(t.c) as cc from ( select upper(regexp_replace(a, '\\s+', '')) as keyword, c from test_t ) t group by keyword; select upper(regexp_replace(a, '\\s+', '')) as keyword, sum(c) from test_t group by upper(regexp_replace(a, '\\s+', '')); select keyword, sum(t.c) as cc from ( select upper(regexp_replace(b, '\\s+', '')) as keyword, c from test_t ) t group by keyword; select upper(regexp_replace(b, '\\s+', '')) as keyword, sum(c) from test_t group by upper(regexp_replace(b, '\\s+', '')); I expected to get the same results regarding to above four queries. However, they were different.
The following bug has been logged on the website:
Bug reference: 15069
Logged by: Ilhwan Ko
Email address: koglep@gmail.com
PostgreSQL version: 9.6.7
Operating system: macOS 10.12
Description:
select upper(regexp_replace(b, '\\s+', '')) as keyword, sum(c)
from test_t
group by upper(regexp_replace(b, '\\s+', ''));
I expected to get the same results regarding to above four queries.
However, they were different.
Not a bug - you mis-understand string literal syntax and escaping. What you are asking to replace is "a backslash followed by one or more "s"es.
If you want to leave the literal as-is you need to write:
E'\\s+'
The way I prefer is to keep the "E" omitted and write:
'\s+'
Without the "E" the backslash is not an escape character in a PostgreSQL literal and so the backslash in the regex doesn't need to be protected. By protecting it you are actually protecting the backslash in front of the "s" thus causing it to become two separate "symbols", "\" and "s" - and the + then applies to the literal "s".
David J.
>>>>> "PG" == PG Bug reporting form <noreply@postgresql.org> writes: PG> select upper(regexp_replace(a, '\\s+', '')) as keyword This regexp doesn't do what you seem to be expecting. With standard_conforming_strings enabled, which is the default, \ is not a special character to the string literal parser, so the regexp engine is seeing a regexp of \\s+ which means "match the literal character \ followed by one or more lowercase "s" characters". Accordingly the regexp never matches on your input. To remove whitespace use '\s+' or E'\\s+' or '[[:space:]]+' -- Andrew (irc:RhodiumToad)