Re: Case-Insensitve Text Comparison - Mailing list pgsql-hackers
| From | Oleg Bartunov |
|---|---|
| Subject | Re: Case-Insensitve Text Comparison |
| Date | |
| Msg-id | Pine.LNX.4.64.0806020919181.21547@sn.sai.msu.ru Whole thread Raw |
| In response to | Case-Insensitve Text Comparison ("David E. Wheeler" <david@kineticode.com>) |
| Responses |
Re: Case-Insensitve Text Comparison
|
| List | pgsql-hackers |
David,
we wrote contrib module (mchar) for one customer, which ports its
application from mssql to postgres. It does case-insensitive comparison
for new data type 'mchar' and linked with ICU for system independent locale.
Oleg
On Sun, 1 Jun 2008, David E. Wheeler wrote:
> Howdy,
>
> I'm sure I'm just showing off my ignorance here, but here goes
>
> I really need case-insensitive string comparison in my database. Ideally
> there'd be a nice ITEXT data type (and friends, ichar, ivarchar, etc.). But
> of course there isn't, and for years I've just used LOWER() on indexes and
> queries to get the same result.
>
> Only it turns out that I'm of course not getting the same result. This
> script:
>
> #!/usr/local/bin/perl -w
>
> use strict;
> use warnings;
> use utf8;
> binmode STDOUT, ':utf8';
> use DBI;
>
> my $dbh = DBI->connect('dbi:Pg:dbname=try', 'postgres', '', { pg_enable_utf8
> => 1 });
> for my $char qw( A B C D ) {
> print "$char: ", $dbh->selectrow_array('SELECT LOWER(?)', undef, $char ),
> $/;
> }
>
> Yields this output:
>
> :
> :
> :
> :
> :
> :
> :
> :
> :
> :
> :
> :
> :
> A: a
> B: b
> C: c
> D: d
>
> So it doesn't really work on anything other than ASCII, it looks like. So I
> have two questions:
>
> 1. Does the use of the tolower() C function in the citext data type on
> pgfoundry basically give me the same results as using lower() in my SQL has
> for all these years? IOW, does it convert letters to lowercase in the same
> way that the LOWER() SQL function does? If so, I think I might start to use
> it for my case-insensitive columns and simplify my SQL a bit.
>
> http://pgfoundry.org/projects/citext/
>
> 2. Isn't the ICU library distributed with PostgreSQL? And if so, could it not
> be used to create proper case conversions in LOWER() and friends and,
> ultimately, to create a case-insensitive text type in core? I'm seeing that
> it has a constant named U_COMPARE_IGNORE_CASE that can be used with its
> unorm_compare() function:
>
> http://www.icu-project.org/apiref/icu4c/unorm_8h.html#6cc4c8b0d5be7ce1ac4b600ace7817f5
> http://www.icu-project.org/apiref/icu4c/unorm_8h.html#991e0fe6f0d062dd6e8e924517f3f437
>
> I don't really know C, but if that's stuff there, can't we take advantage of
> it for proper case-insensitive comparisons (and conversions)?
>
> Thanks,
>
> David
>
>
>
Regards, Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
pgsql-hackers by date: