Re: Multibyte still broken - Mailing list pgsql-hackers

From Michael Robinson
Subject Re: Multibyte still broken
Date
Msg-id 200005111756.BAA10220@netrinsics.com
In response to Re: Multibyte still broken  (Tatsuo Ishii <t-ishii@sra.co.jp>)
List pgsql-hackers
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
>I am surprised to hear that you have so poor quality tools that
>produce illegal code sequences of Simplified Chinese. In Japan, as far
>as I know, we never have such a low quality tools which generate
>illegal Japanese characters just because they are not accepted in the
>market, even in the case of email attachments, or cut-and-paste or
>whatever.

The problem is not that the tools produce "illegal characters".  The problem
is that, as an EUC code, GB permits the coexistence of standard ASCII
characters with double-byte hanzi characters.  Furthermore, most Chinese
software is an operating-system "hack" on top of English-language software
based on a Latin-1 character set (the Chinese software market is underserved
compared to Japan, so we have to cope as best we can).

The result is that it is possible to, for example, insert a carriage return
or ASCII comma into the middle of a hanzi, which breaks the alignment for all 
the hanzi on the rest of the line.  It's also possible, in non-native Chinese
applications, to select one byte of a hanzi character in a cut or copy 
operation.
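
A minimal sketch of the misalignment, assuming Python's built-in "gb2312"
codec and the EUC-CN convention that both bytes of a hanzi fall in
0xA1..0xFE.  The helper names is_high and split_euc_cn are hypothetical and
exist only for this illustration; no real tool is claimed to scan text this
way.

```python
# Sketch only: split_euc_cn is a hypothetical helper, not code from any tool.

def is_high(b):
    # In EUC-CN (GB 2312), both bytes of a hanzi lie in 0xA1..0xFE.
    return 0xA1 <= b <= 0xFE

def split_euc_cn(data):
    """Split bytes into ('ascii' | 'hanzi' | 'stray', chunk) tokens."""
    tokens = []
    i = 0
    while i < len(data):
        if data[i] < 0x80:
            tokens.append(("ascii", data[i:i + 1]))
            i += 1
        elif is_high(data[i]) and i + 1 < len(data) and is_high(data[i + 1]):
            # Two high bytes in a row are paired up as one hanzi.
            tokens.append(("hanzi", data[i:i + 2]))
            i += 2
        else:
            # A high byte with no partner: half of a broken hanzi.
            tokens.append(("stray", data[i:i + 1]))
            i += 1
    return tokens

clean = "中文".encode("gb2312")           # two hanzi, four bytes
damaged = clean[:1] + b"," + clean[1:]    # comma lands inside the first hanzi

print(split_euc_cn(clean))
print(split_euc_cn(damaged))
```

In the damaged string the scanner pairs the second byte of the first hanzi
with the first byte of the second one, so every later pair on the line is
shifted by one byte.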

So the problem is that the tools do not uniformly respect the integrity of
a double-byte hanzi character, but rather treat it as two individual Latin-1
characters.
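
As a rough illustration of the Latin-1 view, assuming Python's built-in
"gb2312" and "latin-1" codecs: the same four bytes that encode two hanzi look
like four independent characters to a Latin-1 application, so a selection can
end in the middle of a hanzi.  The variable names are illustrative only.

```python
clean = "中文".encode("gb2312")         # two hanzi stored as four bytes

# A Latin-1 tool maps every byte to its own character.
as_latin1 = clean.decode("latin-1")

# The user selects the first three "characters" and copies them.
selection = as_latin1[:3]
copied = selection.encode("latin-1")    # one whole hanzi plus half of the next

print(len(as_latin1), len(copied))      # prints: 4 3
```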

The important point, though, is that all tools, whether native Chinese or
"hacked" English, accept the resulting invalid code sequences consistently,
robustly, and without complaint.
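
The difference between rejecting and tolerating such input can be sketched
with Python's built-in "gb2312" codec, used here only as a stand-in for a
strict consumer and a lenient one.  This is an assumption for illustration
and says nothing about how any particular Chinese tool or the PostgreSQL
multibyte code is implemented.

```python
# Second hanzi cut in half, as after a one-byte cut or copy.
damaged = "中文".encode("gb2312")[:3]

# Strict consumer: the whole string is rejected.
try:
    damaged.decode("gb2312")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# Lenient consumer: the intact hanzi survives and the broken half is
# replaced by U+FFFD instead of raising an error.
lenient = damaged.decode("gb2312", errors="replace")

print(strict_ok, repr(lenient))
```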

-Michael


