Re: Searching for "bare" letters - Mailing list pgsql-general
| From | Oleg Bartunov |
|---|---|
| Subject | Re: Searching for "bare" letters |
| Date | |
| Msg-id | Pine.LNX.4.64.1110021333280.26195@sn.sai.msu.ru |
| In response to | Re: Searching for "bare" letters (Uwe Schroeder <uwe@oss4u.com>) |
| Responses | Re: Searching for "bare" letters |
| List | pgsql-general |
I don't see the problem - you can have a dictionary that does all the work of
recognizing bare letters and outputs several versions. Have you seen the unaccent
dictionary?
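The idea can be illustrated outside the database with a small Python sketch. PostgreSQL's `unaccent` contrib module provides a text search dictionary (and an `unaccent()` function) that maps accented letters to bare ones using a rules file. The sketch below approximates that with Unicode canonical decomposition instead, which is an assumption rather than how `unaccent` works internally; `strip_accents`, `lexeme_variants` and `matches` are hypothetical names.

```python
from __future__ import annotations

import unicodedata


def strip_accents(text: str) -> str:
    """Drop combining marks after canonical (NFD) decomposition.

    An approximation of a rules-based unaccent dictionary: letters with no
    canonical decomposition (e.g. "ø", "ł") are left unchanged.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def lexeme_variants(token: str) -> set[str]:
    """Hypothetical dictionary step that emits several versions of a token:
    the lower-cased original plus its bare form."""
    lowered = token.lower()
    return {lowered, strip_accents(lowered)}


def matches(query: str, stored: str) -> bool:
    """Compare bare, lower-cased forms on both sides, in the spirit of
    unaccent(column) ILIKE unaccent(query)."""
    return strip_accents(query).lower() in strip_accents(stored).lower()
```

For example, `lexeme_variants("Muñoz")` yields both `"muñoz"` and `"munoz"`, so a user typing the bare letter still finds the stored accented name, and the original spelling is kept for display.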
Oleg
On Sun, 2 Oct 2011, Uwe Schroeder wrote:
>> Hi, everyone. Uwe wrote:
>>> What kind of "client" are the users using? I assume you will have some
>>> kind of user interface. For me this is a typical job for a user
>>> interface. The number of letters with "equivalents" in different
>>> languages is extremely limited, so a simple matching routine in the
>>> user interface should give you a way to issue the proper query.
>>
>> The user interface will be via a Web application. But we need to store
>> the data with the European characters, such as ñ, so that we can display
>> them appropriately. So much as I like your suggestion, we need to do
>> the opposite of what you're saying -- namely, take a bare letter, and
>> then search for letters with accents and such on them.
>>
>> I am beginning to think that storing two versions of each name, one bare
>> and the other not, might be the easiest way to go. But hey, I'm open
>> to more suggestions.
>>
>> Reuven
>
>
> That still doesn't prevent you from using a matching algorithm. Here is a simple
> example (as I understand the problem).
> You have texts stored in the db containing both an "n" and an "ñ". Now a client
> enters "n" on the website. What you want to do is look for both variations, so
> "n" translates into "n" or "ñ".
> There you have it. The routine that receives the request has a
> matching method that matches on "n" (or any of the few other characters with
> equivalents), and the routine then issues a query like xx LIKE '%n%' OR xx
> LIKE '%ñ%' (personally I would use ILIKE, since that eliminates the case
> problem).
>
> Since you're referring to a "name", I sure don't know the specifics of the
> problem or data layout, but from what I know I think you can tackle this with a
> rather primitive "match -> translate to" kind of algorithm.
>
> One thing I'd not do: store duplicate versions. There's always a way to deal
> with data the way it is. In my opinion, storing different versions of the same
> data just bloats a database when a smarter way of handling the original
> data would do.
>
> Uwe
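Uwe's "match -> translate to" routine could look roughly like the Python sketch below. It is an illustration under assumptions, not his code: the `EQUIVALENTS` table, the `people` table and the function names are hypothetical, and the `%s` placeholders assume a psycopg2-style driver. Each bare letter expands to its equivalents, and every resulting spelling becomes one `ILIKE` clause, OR'ed together.

```python
from __future__ import annotations

from itertools import product

# Hypothetical equivalence table: each bare letter maps to itself plus the
# accented letters it should also match. Extend it for the letters your data uses.
EQUIVALENTS = {
    "n": ["n", "ñ"],
    "e": ["e", "é", "è", "ê"],
}


def escape_like(text: str) -> str:
    """Escape LIKE wildcards so user input is matched literally
    (backslash is PostgreSQL's default LIKE escape character)."""
    return text.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def spellings(term: str) -> list[str]:
    """Every spelling of term produced by swapping letters for their equivalents."""
    options = [EQUIVALENTS.get(ch, [ch]) for ch in term.lower()]
    return ["".join(combo) for combo in product(*options)]


def build_query(column: str, term: str) -> tuple[str, list[str]]:
    """Build one ILIKE clause per spelling, OR'ed together.

    column must be a trusted identifier because it is interpolated into the
    SQL text; the patterns are returned separately as query parameters.
    The table name "people" is hypothetical.
    """
    patterns = [f"%{escape_like(s)}%" for s in spellings(term)]
    clauses = " OR ".join(f"{column} ILIKE %s" for _ in patterns)
    return f"SELECT * FROM people WHERE {clauses}", patterns
```

With a psycopg2-style cursor this might be used as `cursor.execute(*build_query("name", "munoz"))`, which searches for both `munoz` and `muñoz`. Note that the number of spellings grows multiplicatively with each expandable letter; a single PostgreSQL case-insensitive regular expression (`~*`) with character classes such as `mu[nñ]oz`, or applying `unaccent()` to both sides, avoids that blow-up.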
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83