Thread: BUG #2592: ILIKE does not care about locales
The following bug has been logged online: Bug reference: 2592 Logged by: Robert Siemer Email address: Robert.Siemer-postgresql.org@backsla.sh PostgreSQL version: 8.1.4 Operating system: Linux Description: ILIKE does not care about locales Details: Hi! As I don't want to risk getting things mixed up here in this very report, lets assume: s and t are strings with one 'international' character, one having the lower case the other upper lower(s) LIKE lower(t) yields True, as it should s ILIKE t yields False --> I expect True I tried this with LC_COLLATE=C and the rest LC_...=es_ES.utf8 dennisb from irc reported LC_everything=sv_SE.UTF-8 with version 8.1.0 having the same problems. Some "non-normative" examples for s and t: http://rafb.net/paste/results/bMRfez77.html and ä Ã, ñ Ã, ö Ã
On Sun, Aug 27, 2006 at 12:58:00PM +0000, Robert Siemer wrote: > Bug reference: 2592 > Logged by: Robert Siemer > Email address: Robert.Siemer-postgresql.org@backsla.sh > PostgreSQL version: 8.1.4 > Operating system: Linux > Description: ILIKE does not care about locales > Details: > > Hi! > > As I don't want to risk getting things mixed up here in this very report, > lets assume: > s and t are strings with one 'international' character, one having the lower > case the other upper > > lower(s) LIKE lower(t) yields True, as it should > > s ILIKE t yields False --> I expect True > > > I tried this with LC_COLLATE=C and the rest LC_...=es_ES.utf8 > > dennisb from irc reported LC_everything=sv_SE.UTF-8 with version 8.1.0 > having the same problems. > > Some "non-normative" examples for s and t: > http://rafb.net/paste/results/bMRfez77.html > and ä Ä, ñ Ñ, ö Ö I can confirm this with de_DE.utf8. We currently initialize all our PostgreSQL database clusters like this: unset LANG export LC_ALL=POSIX initdb --encoding="UNICODE" --lc-collate="de_DE.utf8" --lc-ctype="de_DE@euro" Only this way, ILIKE and "ORDER BY" work as expected. I don't know about upper() and lower() though; I only tested ILIKE and ORDER BY. Tino. PS: Is there a place to search bugs? I couldn't find one (apart from the mailing list) last time I was troubleshooting some problem.
Tino Schwarze <tino.schwarze@tisc.de> writes: > On Sun, Aug 27, 2006 at 12:58:00PM +0000, Robert Siemer wrote: >> I tried this with LC_COLLATE=C and the rest LC_...=es_ES.utf8 >> >> dennisb from irc reported LC_everything=sv_SE.UTF-8 with version 8.1.0 >> having the same problems. > I can confirm this with de_DE.utf8. ilike currently doesn't work for multibyte encodings (eg utf8). This bug has been known for a long while, eg, http://archives.postgresql.org/pgsql-bugs/2005-10/msg00002.php but no one's stepped up to fix it. regards, tom lane
On Tue, Aug 29, 2006 at 09:27:59AM -0400, Tom Lane wrote: > Tino Schwarze <tino.schwarze@tisc.de> writes: > > On Sun, Aug 27, 2006 at 12:58:00PM +0000, Robert Siemer wrote: > >> I tried this with LC_COLLATE=C and the rest LC_...=es_ES.utf8 > >> > >> dennisb from irc reported LC_everything=sv_SE.UTF-8 with version 8.1.0 > >> having the same problems. > > > I can confirm this with de_DE.utf8. > > ilike currently doesn't work for multibyte encodings (eg utf8). This > bug has been known for a long while, eg, > > http://archives.postgresql.org/pgsql-bugs/2005-10/msg00002.php > > but no one's stepped up to fix it. The "use lower() for both strings" solution sounds reasonably simple to me, but I'm not familiar with PgSQLs sources... The funny thing is: It currently works, if you set the locale to de_DE (non-UTF8)! pg_controldata output: LC_COLLATE: de_DE.utf8 LC_CTYPE: de_DE@euro Output from test database: test=# select * from test where test ilike '%ä%'; test --------- äbcd Äbbcd bläbbcd BLÄbbcd (4 rows) test=# select * from test where test ilike '%Ä%'; test --------- äbcd Äbbcd bläbbcd BLÄbbcd (4 rows) -> same result, both upper and lower ä match. test=# select * from test where test ilike '%ö%'; test ------ ö Ö (2 rows) test=# select * from test where test ilike '%Ö%'; test ------ ö Ö (2 rows) test=# select * from test order by test desc; test --------- Ü ü Ö ö MÜLLER müller BLÄbbcd bläbbcd afbcd aebcd äbcd abcd Äbbcd aabcd (14 rows) Everything is fine with this weird locale setting. (All other locales are set to C)... Bye, Tino.