Re: Corruption of multibyte identifiers on UTF-8 locale - Mailing list pgsql-bugs

From	Tom Lane
Subject	Re: Corruption of multibyte identifiers on UTF-8 locale
Date	September 23, 2006 13:36:54
Msg-id	25540.1159029401@sss.pgh.pa.us
In response to	Corruption of multibyte identifiers on UTF-8 locale (Victor Snezhko <snezhko@indorsoft.ru>)
Responses	Re: Corruption of multibyte identifiers on UTF-8 locale
List	pgsql-bugs

Victor Snezhko <snezhko@indorsoft.ru> writes:
> correct utf-8 byte sequence is 0xd18231, so it looks like we call
> tolower() somewhere on parts of multibyte characters, and it does the
> same as isspace() - it interprets its argument as wide character, and
> converts it.

Indeed, and I am certainly wondering why we should not just say that
you've got a broken locale definition there.  There is absolutely no
doubt that the ctype.h functions are defined to work on char, not wchar.
They have no business mangling high-bit-set bytes in a multibyte
encoding.
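
As a rough illustration of the behavior one would expect, here is a minimal
sketch of a downcasing loop that touches only ASCII letters and passes every
high-bit-set byte through unchanged. The names ascii_downcase and
check_example are hypothetical, made up for this example; this is not the
PostgreSQL code, and it sidesteps tolower() entirely rather than depending on
the locale's ctype tables being sane.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: fold 'A'..'Z' to 'a'..'z' and leave every other
 * byte alone, including bytes with the high bit set, which in UTF-8 are
 * always part of a multibyte character.
 */
static void
ascii_downcase(char *s, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++)
    {
        unsigned char ch = (unsigned char) s[i];

        if (ch >= 'A' && ch <= 'Z')
            s[i] = (char) (ch + ('a' - 'A'));
    }
}

/*
 * Usage sketch: 0xd1 0x82 followed by '1' is the byte sequence from the
 * report; the trailing "ABC" is added here only to show the ASCII folding.
 */
static void
check_example(void)
{
    /* The literal is split so that "\x82" is not parsed as "\x821". */
    char ident[] = "\xd1\x82" "1ABC";

    ascii_downcase(ident, strlen(ident));
    assert(memcmp(ident, "\xd1\x82" "1abc", sizeof ident) == 0);
}
```

If tolower() is called instead, the argument should first be cast to
unsigned char, since passing a negative char value is undefined behavior.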

            regards, tom lane
