Re: BUG #13440: unaccent does not remove all diacritics - Mailing list pgsql-bugs

From Peter Eisentraut
Subject Re: BUG #13440: unaccent does not remove all diacritics
Msg-id 5589642C.3000201@gmx.net
In response to Re: BUG #13440: unaccent does not remove all diacritics  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List pgsql-bugs
On 6/18/15 5:17 PM, Alvaro Herrera wrote:
> To me, conceptually what unaccent does is turn whatever junk you have
> into a very basic common alphabet (ascii); then it's very easy to do
> full text searches without having to worry about what accents the people
> did or did not use in their searches.  If we say "okay, but that funny
> char is not an accent so let's leave it alone" then the charter doesn't
> sound so useful to me.

I think unaccent is one of those contrib things that are useful but not
really fully thought out and therefore won't ever become an official
core feature.  It is what it is, and we can tweak it slightly, but
thinking too hard about what it "should" do won't lead anywhere.

If we wanted to do this "properly", we could do something like: perform
Unicode canonical decomposition (NFD), then strip out all combining
characters.  I don't know how useful that is in practice, though.  And
it won't "solve" issues such as German ß, which probably doesn't have a
one-size-fits-all solution.
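A minimal sketch of the "decompose, then strip combining characters" idea, written in Python with the standard-library `unicodedata` module. This is only an illustration of the approach described above. It is not how contrib/unaccent works, since that module is a dictionary driven by a rules file. The function name `strip_combining` is hypothetical.

```python
import unicodedata


def strip_combining(text: str) -> str:
    """Hypothetical helper: canonical decomposition, then drop combining marks."""
    # NFD (canonical decomposition) splits precomposed letters such as "é"
    # into a base letter followed by a combining mark (U+0301).
    decomposed = unicodedata.normalize("NFD", text)
    # Unicode combining characters have a general category starting with "M"
    # (Mn, Mc, Me); every other character is kept unchanged.
    return "".join(
        ch for ch in decomposed if not unicodedata.category(ch).startswith("M")
    )


if __name__ == "__main__":
    for word in ["Crème Brûlée", "naïve", "straße", "København"]:
        print(word, "->", strip_combining(word))
```

Letters such as "ß" and "ø" have no canonical decomposition, so they pass through unchanged ("straße" stays "straße"). This is the kind of case where, as noted above, a purely mechanical rule has no one-size-fits-all answer.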
