Re: Locale-dependent case conversion in {identifier} - Mailing list pgsql-hackers
From | Nicolai Tufar |
---|---|
Subject | Re: Locale-dependent case conversion in {identifier} |
Date | |
Msg-id | 3DE86F78.9000905@apb.com.tr |
In response to | 7.4 Wishlist ("Christopher Kings-Lynne" <chriskl@familyhealth.com.au>) |
Responses |
Re: Locale-dependent case conversion in {identifier}
Re: Locale-dependent case conversion in {identifier} |
List | pgsql-hackers |
By no means would I try to convince you that your reading of the SQL standard is wrong. What I am trying to say is that the Turkish alphabet is broken beyond repair. And since there is absolutely no way to change our alphabet, maybe we can code a workaround. So I do not claim that your code is wrong. It is behaving according to the specification. But unfortunately the folks behind SQL99 were probably not aware of the woes of the Turkish "I".

The very special case of the letter "I" in Turkish is not only PostgreSQL's problem. Many Java programs have failed miserably trying to open files with "I"s in pathnames.

So basically, there are two letters "I" in Turkish. One has a dot on top and the other does not. The one with the dot always has the dot, and the one without never has it. Simple. The problem is with the standard Latin "I": why does small "i" have a dot while capital "I" does not?

Standard conversion is:
Lower: "I" -> "ı" and "İ" -> "i".
Upper: "ı" -> "I" and "i" -> "İ".
(font may not be displayed correctly in your mail reader)

Historically, programs that operate in the Turkish locale have chosen to hardcode the capitalisation of "i" in system messages and identifier names like this:
Lower: "I" -> "i" and "İ" -> "i".
Upper: "ı" -> "I" and "i" -> "I".

With this, no matter what kind of "I" you used in names, it is always going to end up a valid ASCII character.

Would it be acceptable if I submit a patch that applies this special logic in src/backend/parser/scan.l if the locale is "tr_TR"?

Because for many folks, setting the locale to Turkish would render their database unusable. God forbid your SQL has a column name written in capitals that includes "I": it does not work. So I deeply believe that the PostgreSQL community has to provide a workaround for this problem.

So what should I do?

Best regards,
Nick

Tom Lane wrote:
> "Nicolai Tufar" <ntufar@apb.com.tr> writes:
>
>>So I have changed lower-case conversion code in scan.l to make it purely
>>ASCII-based.
>>as in keywords.c. Mini-patch is given below.
>
> Rather than offering a patch, you need to convince us why our reading of
> the SQL standard is wrong. ("Oracle does it that way" is not an
> argument that will carry a lot of weight.)
>
> SQL99 states that identifier case conversions are done on the basis of
> the Unicode upper/lower case equivalences, so it seems clear that they
> intend more than ASCII-only conversion for identifiers. Locale-based
> conversion might not be an exact implementation of the spec, but it's
> surely closer than ASCII-only.
>
> regards, tom lane
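For illustration only, the hardcoded mapping described in the message could be sketched in C roughly as follows. This is not the patch under discussion and not the actual scan.l code. The function names are hypothetical, and the sketch assumes identifiers arrive in the single-byte ISO-8859-9 encoding, where 0xDD is the dotted capital "İ" and 0xFD is the dotless small "ı". Other non-ASCII bytes are left untouched, which is a simplification.

```c
/*
 * Sketch of locale-independent case folding for identifiers.
 * Assumption: ISO-8859-9 (Latin-5) single-byte encoding.
 * All names below are hypothetical, chosen for this example.
 */
#define TR_CAPITAL_DOTTED_I 0xDD    /* "İ" in ISO-8859-9 */
#define TR_SMALL_DOTLESS_I  0xFD    /* "ı" in ISO-8859-9 */

/* Lower: "I" -> "i" and "İ" -> "i"; other ASCII letters fold as usual. */
static unsigned char tr_safe_tolower(unsigned char c)
{
    if (c >= 'A' && c <= 'Z')
        return (unsigned char) (c - 'A' + 'a');
    if (c == TR_CAPITAL_DOTTED_I)
        return 'i';
    return c;                       /* everything else is left as-is */
}

/* Upper: "ı" -> "I" and "i" -> "I"; other ASCII letters fold as usual. */
static unsigned char tr_safe_toupper(unsigned char c)
{
    if (c >= 'a' && c <= 'z')
        return (unsigned char) (c - 'a' + 'A');
    if (c == TR_SMALL_DOTLESS_I)
        return 'I';
    return c;                       /* everything else is left as-is */
}

/* Fold a NUL-terminated identifier to lower case in place. */
static void tr_safe_downcase_identifier(char *ident)
{
    for (unsigned char *p = (unsigned char *) ident; *p != '\0'; p++)
        *p = tr_safe_tolower(*p);
}
```

Because none of these functions consult the current locale, an identifier such as INSERT_TIME folds to insert_time even under tr_TR, where a locale-aware tolower() would turn the "I" into a dotless "ı".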