Re: BUG #15892: URGENT: Using an ICU collation in a primary key column breaks ILIKE query - Mailing list pgsql-bugs
From | Tom Lane |
---|---|
Subject | Re: BUG #15892: URGENT: Using an ICU collation in a primary key column breaks ILIKE query |
Date | |
Msg-id | 8498.1565559169@sss.pgh.pa.us Whole thread Raw |
In response to | Re: BUG #15892: URGENT: Using an ICU collation in a primary keycolumn breaks ILIKE query (Arthur Zakirov <a.zakirov@postgrespro.ru>) |
Responses |
Re: BUG #15892: URGENT: Using an ICU collation in a primary keycolumn breaks ILIKE query
|
List | pgsql-bugs |
Arthur Zakirov <a.zakirov@postgrespro.ru> writes: > On 04.07.2019 19:50, Daniel Verite wrote: >> Yes. It is because of the index that the code checks if the ILIKE >> can be evaluated with an index lookup. Otherwise it doesn't. >> If you feel like recompiling with a temporary fix against v11.4 to do >> your own tests, please try the attached patch. > Also this patch works. I thinks we need to use some ICU function to > check whether a char is a letter. This topic seems to have slipped through the cracks, which is too bad because we surely could have gotten it fixed in time for last week's releases. Oh well. I think Daniel's patch leaves more on the table than it needs to, and I don't like Arthur's patch because it's assuming more than it should about whether it's okay to pass a "char" to u_isalpha(). I propose that we do things as attached, instead. (I also added a regression test case.) regards, tom lane diff --git a/src/backend/utils/adt/like_support.c b/src/backend/utils/adt/like_support.c index 26e0634..c8fec78 100644 --- a/src/backend/utils/adt/like_support.c +++ b/src/backend/utils/adt/like_support.c @@ -1437,8 +1437,9 @@ regex_selectivity(const char *patt, int pattlen, bool case_insensitive, * Check whether char is a letter (and, hence, subject to case-folding) * * In multibyte character sets or with ICU, we can't use isalpha, and it does - * not seem worth trying to convert to wchar_t to use iswalpha. Instead, just - * assume any multibyte char is potentially case-varying. + * not seem worth trying to convert to wchar_t to use iswalpha or u_isalpha. + * Instead, just assume any non-ASCII char is potentially case-varying, and + * hard-wire knowledge of which ASCII chars are letters. */ static int pattern_char_isalpha(char c, bool is_multibyte, @@ -1449,7 +1450,8 @@ pattern_char_isalpha(char c, bool is_multibyte, else if (is_multibyte && IS_HIGHBIT_SET(c)) return true; else if (locale && locale->provider == COLLPROVIDER_ICU) - return IS_HIGHBIT_SET(c) ? true : false; + return IS_HIGHBIT_SET(c) || + (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'); #ifdef HAVE_LOCALE_T else if (locale && locale->provider == COLLPROVIDER_LIBC) return isalpha_l((unsigned char) c, locale->info.lt); diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out index 51262e0..50d1e6e 100644 --- a/src/test/regress/expected/collate.icu.utf8.out +++ b/src/test/regress/expected/collate.icu.utf8.out @@ -983,6 +983,38 @@ SELECT relname, pg_get_indexdef(oid) FROM pg_class WHERE relname LIKE 'collate_t collate_test1_idx4 | CREATE INDEX collate_test1_idx4 ON collate_tests.collate_test1 USING btree (((b || 'foo'::text)) COLLATE"POSIX") (4 rows) +set enable_seqscan = off; +explain (costs off) +select * from collate_test1 where b ilike 'abc'; + QUERY PLAN +------------------------------- + Seq Scan on collate_test1 + Filter: (b ~~* 'abc'::text) +(2 rows) + +select * from collate_test1 where b ilike 'abc'; + a | b +---+----- + 1 | abc + 4 | ABC +(2 rows) + +explain (costs off) +select * from collate_test1 where b ilike 'ABC'; + QUERY PLAN +------------------------------- + Seq Scan on collate_test1 + Filter: (b ~~* 'ABC'::text) +(2 rows) + +select * from collate_test1 where b ilike 'ABC'; + a | b +---+----- + 1 | abc + 4 | ABC +(2 rows) + +reset enable_seqscan; -- schema manipulation commands CREATE ROLE regress_test_role; CREATE SCHEMA test_schema; diff --git a/src/test/regress/sql/collate.icu.utf8.sql b/src/test/regress/sql/collate.icu.utf8.sql index 46999fb..5ed6360 100644 --- a/src/test/regress/sql/collate.icu.utf8.sql +++ b/src/test/regress/sql/collate.icu.utf8.sql @@ -341,6 +341,15 @@ CREATE INDEX collate_test1_idx6 ON collate_test1 ((a COLLATE "C")); -- fail SELECT relname, pg_get_indexdef(oid) FROM pg_class WHERE relname LIKE 'collate_test%_idx%' ORDER BY 1; +set enable_seqscan = off; +explain (costs off) +select * from collate_test1 where b ilike 'abc'; +select * from collate_test1 where b ilike 'abc'; +explain (costs off) +select * from collate_test1 where b ilike 'ABC'; +select * from collate_test1 where b ilike 'ABC'; +reset enable_seqscan; + -- schema manipulation commands
pgsql-bugs by date: