Re: [SQL] Internationalisation: SELECT str (ignoring Umlauts/Accents) - Mailing list pgsql-sql
From | Benedikt Eric Heinen |
---|---|
Subject | Re: [SQL] Internationalisation: SELECT str (ignoring Umlauts/Accents) |
Date | |
Msg-id | Pine.LNX.3.96.980617210549.30824C-100000@fenun.icemark.ch |
In response to | Re: [SQL] Internationalisation: SELECT str (ignoring Umlauts/Accents) (Patrice Hédé <patrice@idf.net>) |
Responses | Re: [SQL] Internationalisation: SELECT str (ignoring Umlauts/Accents) |
List | pgsql-sql |
> Do you mean you have a field with German *and* French *and* Italian *and*
> English words in it, and you want people, be they german-, french-,
> italian-, english-speaking, to be able to access this field, without
> putting accents and all ?

Right. Basically, I am building a web database with the addresses of a group of people all over Switzerland who are members of the same club. The problem is that for a Mr. "á Porta", I (who can't speak French or Italian) don't know what the right spelling with accents is. In much the same way, a native French speaker from the western part of Switzerland may not know whether, or which, umlaut has to be used in a German name...

> As I said earlier, you may have problems, since `ae' doesn't mean `ä' for
> most of these people (except the german-speaking ones), and they may put
> `a' instead. As the rules are different among the languages, it's
> difficult to have a single solution. However, you *need* a solution.
> Maybe I, or others ;) , may help though. Some questions : what is your
> interface language (if it's perl, it can be much easier :) ) ? Can it be a
> client-side solution, or do you absolutely need a server-side one (which
> would then have to be a C function, I think) ?

The program is a server-side C++ CGI (I can't program in Perl). I just thought that I am certainly not the first to have had this kind of problem...

> And then, what kind of conversions do you need ? For example, for French,
> I decided that all a, e, i, o, u, y to be equal, which meant :
>
> any of a,A,à,À,æ,Æ,å,Å,â,Â,á,Á,ä,Ä => a,A,à,À,æ,Æ,å,Å,â,Â,á,Á,ä,Ä
> etc.

Let's say that only the search string should ever be modified, so an "ä" in the search string should never match "ae" in a string in the database.
The modifications should be:

  part of search string   can match in database side string
  a                       a, a umlaut, a with acute/grave/circumflex accent
  ae                      ae, a umlaut
  c                       c, c cedilla
  e                       e, e with acute/grave/circumflex accent
  i                       i, i with acute/grave/circumflex accent
  o                       o, o umlaut, o with acute/grave/circumflex accent
  oe                      oe, o umlaut
  u                       u, u umlaut, u with acute/grave/circumflex accent
  ue                      ue, u umlaut

[all searches will be case insensitive]

> Obviously, in your case, it will be more complex, since `ae' *may* have a
> special meaning... (that's where it's getting difficult :( )...

I hope the above description is somewhat useful to you (unfortunately I am lacking the matching characters on my US keyboard - so I described which ones should be matched).

I guess, the ideal way would be to try and build a general pluggable module for postgresql, so that it can handle this somewhat transparently.

Benedikt

Windows 95: n. 32-bit extensions and a graphical shell for a 16-bit patch to an 8-bit operating system originally coded for a 4-bit microprocessor, written by a 2-bit company that can't stand for 1 bit of competition.
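
The matching table in the message above could be approximated on the client side by expanding only the search string into a regular expression, which leaves the stored data untouched. What follows is a minimal C++ sketch of that idea, not the method anyone in the thread settled on. The function names (`expand_search`, `alternatives`, `umlaut_of`, `ascii_lower`) are hypothetical, and the code assumes the source file, the search string and the database text are all UTF-8. Where a digraph row and a single-letter row both apply (for example `ue` and `u`), the sketch accepts either, which is one possible reading of the table.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Lower-case ASCII letters only; bytes of multi-byte UTF-8 characters pass through.
static char ascii_lower(char c) {
    return (c >= 'A' && c <= 'Z') ? static_cast<char>(c - 'A' + 'a') : c;
}

// Regex fragment for one byte of the search string: an alternation for the
// letters listed in the table, otherwise the byte itself (escaped if it is a
// regex metacharacter). Accented input is never expanded, so a typed "ä"
// stays a literal "ä" and cannot match "ae".
static std::string alternatives(char c) {
    static const std::map<char, std::string> table = {
        {'a', "(?:a|ä|á|à|â)"},
        {'c', "(?:c|ç)"},
        {'e', "(?:e|é|è|ê)"},
        {'i', "(?:i|í|ì|î)"},
        {'o', "(?:o|ö|ó|ò|ô)"},
        {'u', "(?:u|ü|ú|ù|û)"},
    };
    const auto it = table.find(c);
    if (it != table.end()) return it->second;

    static const std::string meta = ".^$*+?()[]{}|\\";
    std::string out;
    if (meta.find(c) != std::string::npos) out += '\\';
    out += c;
    return out;
}

// Umlaut that the digraphs "ae", "oe" and "ue" may stand for.
static std::string umlaut_of(char c) {
    switch (c) {
        case 'a': return "ä";
        case 'o': return "ö";
        default:  return "ü";
    }
}

// Expand a search term into a lower-case regex pattern. The pattern is meant
// to be used with a case-insensitive regex match.
std::string expand_search(const std::string& term) {
    std::string out;
    for (std::size_t i = 0; i < term.size(); ++i) {
        const char c = ascii_lower(term[i]);
        const bool digraph = (c == 'a' || c == 'o' || c == 'u') &&
                             i + 1 < term.size() &&
                             ascii_lower(term[i + 1]) == 'e';
        if (digraph) {
            // Either the umlaut, or the two letters with their own variants.
            out += "(?:" + umlaut_of(c) + "|" + alternatives(c) +
                   alternatives('e') + ")";
            ++i;  // the 'e' has been consumed as part of the digraph
        } else {
            out += alternatives(c);
        }
    }
    return out;
}
```

With this sketch, `expand_search("Mueller")` yields a pattern that accepts both "mueller" and "müller", while `expand_search("ä")` returns the literal "ä". The resulting string could then be passed as a parameter to a case-insensitive regular-expression match in the database; whether that match also folds case for accented letters depends on the database's locale and encoding, so that part would need checking on the actual installation.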