Re: Searching for "bare" letters - Mailing list pgsql-general
From | Oleg Bartunov |
---|---|
Subject | Re: Searching for "bare" letters |
Date | |
Msg-id | Pine.LNX.4.64.1110021333280.26195@sn.sai.msu.ru |
In response to | Re: Searching for "bare" letters (Uwe Schroeder <uwe@oss4u.com>) |
Responses | Re: Searching for "bare" letters |
List | pgsql-general |
I don't see the problem - you can have a dictionary which does all the work
of recognizing bare letters and outputs several versions. Have you seen the
unaccent dictionary?

Oleg

On Sun, 2 Oct 2011, Uwe Schroeder wrote:

>> Hi, everyone. Uwe wrote:
>>> What kind of "client" are the users using? I assume you will have some
>>> kind of user interface. For me this is a typical job for a user
>>> interface. The number of letters with "equivalents" in different
>>> languages are extremely limited, so a simple matching routine in the
>>> user interface should give you a way to issue the proper query.
>>
>> The user interface will be via a Web application. But we need to store
>> the data with the European characters, such as ñ, so that we can display
>> them appropriately. So much as I like your suggestion, we need to do
>> the opposite of what you're saying -- namely, take a bare letter, and
>> then search for letters with accents and such on them.
>>
>> I am beginning to think that storing two versions of each name, one bare
>> and the other not, might be the easiest way to go. But hey, I'm open
>> to more suggestions.
>>
>> Reuven
>
>
> That still doesn't hinder you from using a matching algorithm. Here a simple
> example (to my understanding of the problem)
> You have texts stored in the db both containing a n and a ñ. Now a client
> enters "n" on the website. What you want to do is look for both variations, so
> "n" translates into "n" or "ñ".
> There you have it. In the routine that receives the request you have a
> matching method that matches on "n" (or any of the few other characters with
> equivalents) and the routine will issue a query with a "xx like "%n%" or xx
> like "%ñ%" (personally I would use ilike, since that eliminates the case
> problem).
>
> Since you're referring to a "name", I sure don't know the specifics of the
> problem or data layout, but by what I know I think you can tackle this with a
> rather primitive "match -> translate to" kind of algorithm.
>
> One thing I'd not do: store duplicate versions. There's always a way to deal
> with data the way it is. In my opinion storing different versions of the same
> data just bloats a database in favor of a smarter way to deal with the initial
> data.
>
> Uwe
>
>
>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
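
To make the two ideas in this thread concrete, here is a minimal Python sketch. It is not code from any of the posters. The `EQUIVALENTS` table and the function names `expand_to_regex` and `strip_accents` are hypothetical, chosen only for illustration. The first function follows Uwe's "match -> translate to" suggestion by expanding each bare letter into a character class. The second shows the opposite direction, normalizing accented text to bare letters, which is roughly the kind of mapping an unaccent-style dictionary performs.

```python
import re
import unicodedata

# Hypothetical equivalence table: bare letter -> itself plus accented variants.
# Only a handful of entries are listed; a real table would need more.
EQUIVALENTS = {
    "n": "nñ",
    "a": "aáàâä",
    "e": "eéèêë",
    "o": "oóòôö",
    "u": "uúùûü",
}


def expand_to_regex(term: str) -> str:
    """Translate a bare search term into a regex matching accented variants.

    Letters with known equivalents become a character class; every other
    character is escaped and matched literally.
    """
    parts = []
    for ch in term:
        variants = EQUIVALENTS.get(ch.lower())
        parts.append(f"[{variants}]" if variants else re.escape(ch))
    return "".join(parts)


def strip_accents(text: str) -> str:
    """Remove combining marks after Unicode decomposition (NFKD)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


# Bare input "nino" is expanded so that it also matches "Niño".
pattern = re.compile(expand_to_regex("nino"), re.IGNORECASE)
print(bool(pattern.search("El Niño")))  # True

# The reverse direction: reduce stored text to bare letters before comparing.
print(strip_accents("Niño"))  # Nino
```

The expansion approach leaves the stored data untouched but makes every query pattern larger. The normalization approach keeps queries simple but has to be applied to both the stored text and the search term. Which one fits depends on the data layout, which the thread does not describe.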