Re: Locale-dependent case conversion in {identifier} - Mailing list pgsql-hackers

From	Nicolai Tufar
Subject	Re: Locale-dependent case conversion in {identifier}
Date	November 30, 2002 02:56:45
Msg-id	3DE86F78.9000905@apb.com.tr
In response to	7.4 Wishlist ("Christopher Kings-Lynne" <chriskl@familyhealth.com.au>)
Responses	Re: Locale-dependent case conversion in {identifier} Re: Locale-dependent case conversion in {identifier}
List	pgsql-hackers

By no means would I try to convince you that your reading of
the SQL standard is wrong. What I am trying to say is
that the Turkish alphabet is broken beyond repair. And since
there is absolutely no way to change our alphabet, perhaps
we can code a workaround in the code.

So I do not claim that your code is wrong. It is
behaving according to the specification. But unfortunately
the folks at SQL99 were probably not aware of the woes
of the Turkish "I".

The very special case of the letter "I" in Turkish is not
only PostgreSQL's problem. Many Java programs have
failed miserably trying to open files with "I"s in
pathnames.

So basically, there are two letters "I" in Turkish.
One has a dot on top and the other does not.
The one with the dot always has it and the one
without never has it. Simple. The problem is
with the standard Latin "I". So why does small "i"
have a dot while capital "I" does not?

Standard conversion is
Lower: "I" -> "ı" and "İ" -> "i".
Upper: "ı" -> "I" and "i" -> "İ".
(font may not be displayed correctly in your mail reader)
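
To make the standard Turkish pairing concrete, here is a minimal sketch in C
that works on Unicode code points rather than locale-encoded bytes. The
function names tr_tolower and tr_toupper are made up for this illustration,
and only ASCII letters plus the two Turkish-specific letters (U+0130, capital
I with dot above, and U+0131, small dotless i) are handled.

```c
#include <stdint.h>

/* Unicode code points for the two Turkish-specific letters. */
#define CAP_I_WITH_DOT  0x0130u   /* capital I with dot above */
#define SMALL_DOTLESS_I 0x0131u   /* small dotless i */

/* Hypothetical helper: lower-case one code point with the Turkish
 * pairing for "I". Anything outside ASCII letters and the two
 * Turkish letters is returned unchanged. */
uint32_t tr_tolower(uint32_t c)
{
    if (c == 'I')
        return SMALL_DOTLESS_I;      /* dotless capital -> dotless small */
    if (c == CAP_I_WITH_DOT)
        return 'i';                  /* dotted capital -> dotted small */
    if (c >= 'A' && c <= 'Z')
        return c + ('a' - 'A');
    return c;
}

/* Hypothetical helper: upper-case one code point with the Turkish
 * pairing for "i". */
uint32_t tr_toupper(uint32_t c)
{
    if (c == 'i')
        return CAP_I_WITH_DOT;       /* dotted small -> dotted capital */
    if (c == SMALL_DOTLESS_I)
        return 'I';                  /* dotless small -> dotless capital */
    if (c >= 'a' && c <= 'z')
        return c - ('a' - 'A');
    return c;
}
```

Note that under this rule an ASCII "I" lower-cases to a non-ASCII character,
so an upper-case "ID" no longer folds to the same string as "id".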

Historically programs that operate in Turkish locale have
chosen to hardcode the capitalisation of "i" in system
messages and identifier names like this:

Lower: "I" -> "i" and "İ" -> "i".
Upper: "ı" -> "I" and "i" -> "I".
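
As a rough sketch of this hardcoded rule, the C code below folds identifiers
held as arrays of Unicode code points. The names safe_tolower, safe_toupper
and fold_identifier_lower are hypothetical and are not taken from scan.l or
any other PostgreSQL source; the code only covers the four mappings listed
above plus plain ASCII letters.

```c
#include <stddef.h>
#include <stdint.h>

#define CAP_I_WITH_DOT  0x0130u   /* capital I with dot above */
#define SMALL_DOTLESS_I 0x0131u   /* small dotless i */

/* Hypothetical helper: both capital I variants lower-case to ASCII 'i'. */
uint32_t safe_tolower(uint32_t c)
{
    if (c == 'I' || c == CAP_I_WITH_DOT)
        return 'i';
    if (c >= 'A' && c <= 'Z')
        return c + ('a' - 'A');
    return c;                        /* everything else is left alone */
}

/* Hypothetical helper: both small i variants upper-case to ASCII 'I'. */
uint32_t safe_toupper(uint32_t c)
{
    if (c == 'i' || c == SMALL_DOTLESS_I)
        return 'I';
    if (c >= 'a' && c <= 'z')
        return c - ('a' - 'A');
    return c;
}

/* Lower-case an identifier of n code points in place. */
void fold_identifier_lower(uint32_t *s, size_t n)
{
    for (size_t k = 0; k < n; k++)
        s[k] = safe_tolower(s[k]);
}
```

With this rule, an identifier typed as "ID" and one typed with a dotted
capital I both fold to the ASCII string "id".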

With this, no matter what kind of "I" you used in names,
it is always going to end up a valid ASCII character.

Would it be acceptable if I submit a patch that applies this
special logic in src/backend/parser/scan.l if the locale is "tr_TR"?

Because for many folks, setting the locale to Turkish would
render their database unusable. God forbid your SQL has
a column name written in capitals that includes an "I":
it will not work. So I deeply believe that the PostgreSQL community
has to provide a workaround for this problem.

So what should I do?

Best regards,
Nick

Tom Lane wrote:
> "Nicolai Tufar" <ntufar@apb.com.tr> writes:
> 
>>So I have changed lower-case conversion code in scan.l to make it purely
>>ASCII-based.
>>as in keywords.c. Mini-patch is given below.
> 
> 
> Rather than offering a patch, you need to convince us why our reading of
> the SQL standard is wrong.  ("Oracle does it that way" is not an
> argument that will carry a lot of weight.)
> 
> SQL99 states that identifier case conversions are done on the basis of
> the Unicode upper/lower case equivalences, so it seems clear that they
> intend more than ASCII-only conversion for identifiers.  Locale-based
> conversion might not be an exact implementation of the spec, but it's
> surely closer than ASCII-only.
> 
>             regards, tom lane
> 
