Re: Case-Insensitve Text Comparison - Mailing list pgsql-hackers
From | Oleg Bartunov |
---|---|
Subject | Re: Case-Insensitve Text Comparison |
Date | |
Msg-id | Pine.LNX.4.64.0806020919181.21547@sn.sai.msu.ru Whole thread Raw |
In response to | Case-Insensitve Text Comparison ("David E. Wheeler" <david@kineticode.com>) |
Responses |
Re: Case-Insensitve Text Comparison
|
List | pgsql-hackers |
David, we wrote contrib module (mchar) for one customer, which ports its application from mssql to postgres. It does case-insensitive comparison for new data type 'mchar' and linked with ICU for system independent locale. Oleg On Sun, 1 Jun 2008, David E. Wheeler wrote: > Howdy, > > I'm sure I'm just showing off my ignorance here, but here goes > > I really need case-insensitive string comparison in my database. Ideally > there'd be a nice ITEXT data type (and friends, ichar, ivarchar, etc.). But > of course there isn't, and for years I've just used LOWER() on indexes and > queries to get the same result. > > Only it turns out that I'm of course not getting the same result. This > script: > > #!/usr/local/bin/perl -w > > use strict; > use warnings; > use utf8; > binmode STDOUT, ':utf8'; > use DBI; > > my $dbh = DBI->connect('dbi:Pg:dbname=try', 'postgres', '', { pg_enable_utf8 > => 1 }); > for my $char qw( A B C D ) { > print "$char: ", $dbh->selectrow_array('SELECT LOWER(?)', undef, $char ), > $/; > } > > Yields this output: > > : > : > : > : > : > : > : > : > : > : > : > : > : > A: a > B: b > C: c > D: d > > So it doesn't really work on anything other than ASCII, it looks like. So I > have two questions: > > 1. Does the use of the tolower() C function in the citext data type on > pgfoundry basically give me the same results as using lower() in my SQL has > for all these years? IOW, does it convert letters to lowercase in the same > way that the LOWER() SQL function does? If so, I think I might start to use > it for my case-insensitive columns and simplify my SQL a bit. > > http://pgfoundry.org/projects/citext/ > > 2. Isn't the ICU library distributed with PostgreSQL? And if so, could it not > be used to create proper case conversions in LOWER() and friends and, > ultimately, to create a case-insensitive text type in core? I'm seeing that > it has a constant named U_COMPARE_IGNORE_CASE that can be used with its > unorm_compare() function: > > http://www.icu-project.org/apiref/icu4c/unorm_8h.html#6cc4c8b0d5be7ce1ac4b600ace7817f5 > http://www.icu-project.org/apiref/icu4c/unorm_8h.html#991e0fe6f0d062dd6e8e924517f3f437 > > I don't really know C, but if that's stuff there, can't we take advantage of > it for proper case-insensitive comparisons (and conversions)? > > Thanks, > > David > > > Regards, Oleg _____________________________________________________________ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83
pgsql-hackers by date: