Re: again: Bug #943: Server-Encoding from EUC_TW - Mailing list pgsql-hackers
From | Enke, Michael |
---|---|
Subject | Re: again: Bug #943: Server-Encoding from EUC_TW |
Date | |
Msg-id | 3EF89848.E00F0EF@wincor-nixdorf.com |
In response to | again: Bug #943: Server-Encoding from EUC_TW to UTF-8 doesn't work ("Enke, Michael" <michael.enke@wincor-nixdorf.com>) |
Responses | Re: again: Bug #943: Server-Encoding from EUC_TW |
List | pgsql-hackers |
Tatsuo Ishii wrote:
>
> > > > I reported bug #943 (I found in 7.3.2) and you checked in some change against integer overflow.
> > > > Now I upgraded to 7.3.3 and I'm not happy with this.
> > > > The exact error as I described is fixed, but I found new errors in conversion UTF-8 <-> EUC_TW and BIG5:
> > > >
> > > > Copy to table (DB has UTF-8 encoding) from file:
> > > > for PGCLIENTENCODING=BIG5:
> > > > WARNING: copy: line 1, LocalToUtf: could not convert (0xf9d6) BIG5 to UTF-8. Ignored
> > > > WARNING: copy: line 2, LocalToUtf: could not convert (0xf9d7) BIG5 to UTF-8. Ignored
> > > > WARNING: copy: line 3, LocalToUtf: could not convert (0xf9d8) BIG5 to UTF-8. Ignored
> > > > WARNING: copy: line 4, LocalToUtf: could not convert (0xf9db) BIG5 to UTF-8. Ignored
> > >
> > > I see no problem here. The only standard conversion map I could found
> > > on-line form so far (see below URL) does not include entries 0xf9d6 or
> > > above.
> > >
> > > http://www.unicode.org/Public/UNIDATA/Unihan.txt
> >
> > I found in this file:
> > U+F9D7 in line 604519
> > U+F9D8 in line 219540
> > U+F9D6...U+F9DB in lines 730707...730766.
>
> No. U+F9D6 means *Unicode* code point, not BIG5 code point.

Ok. I have looked into my Linux box and found this in /usr/share/i18n/charmaps/BIG5.gz:

% Chinese charmap for BIG5 (CP950)
% version: 0.92
% Contact: Tung-Han Hsieh <thhsieh@linux.org.tw>
%          Yuan-Chung Cheng <platin@ms31.hinet.net>
% Distribution and use is free, even for comercial purpose.
%
% This charmap is converted from:
% ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP950.TXT
% ...

There "my" characters are in.

Don't you agree that it is strange that I can (for EUC_TW) copy "to" file without error
but I can not copy "from" file without error?

Michael

> > > > for EUC_TW
> > > > WARNING: copy: line 1, LocalToUtf: could not convert (0x8ea3c3b7) EUC_TW to UTF-8. Ignored
> > > > WARNING: copy: line 2, LocalToUtf: could not convert (0x8ea3cfd0) EUC_TW to UTF-8. Ignored
> > > > WARNING: copy: line 3, LocalToUtf: could not convert (0x8ea3c4ce) EUC_TW to UTF-8. Ignored
> > > > WARNING: copy: line 4, LocalToUtf: could not convert (0x8ea3bdfe) EUC_TW to UTF-8. Ignored
> > >
> > > Hum. These seem to be CNS 11643-1993, plane 3. Currently PostgreSQL
> > > supports only:
> > >
> > > CNS 11643-1993, plane 0
> > > CNS 11643-1993, plane 1
> > > CNS 11643-1993, plane 2
> > > CNS 11643-1993, plane 15
> > >
> > > Would you like to have support for rest of CNS 11643-1993 planes:
> > >
> > > CNS 11643-1993, plane 3
> > > CNS 11643-1993, plane 4
> > > CNS 11643-1993, plane 5
> > > CNS 11643-1993, plane 6
> > > CNS 11643-1993, plane 7
> > >
> > > support for upcoming 7.4?
> > >
> > > > Copy out to file from table (UTF-8 data):
> > > > to BIG5
> > > > WARNING: UtfToLocal: could not convert UTF-8 (0xe7a281). Ignored
> > > > WARNING: UtfToLocal: could not convert UTF-8 (0xe98ab9). Ignored
> > > > WARNING: UtfToLocal: could not convert UTF-8 (0xe8a38f). Ignored
> > > > WARNING: UtfToLocal: could not convert UTF-8 (0xe7b2a7). Ignored
> > > >
> > > > to EUC_TW is ok!
> > >
> > > BIG5 and EUC_TW have different code points. So this is not very strange.
> >
> > But it is very strange that I can (for EUC_TW) copy to file without error but I can not copy from file without error.
> >
> > Michael
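The byte values in the warnings above can be cross-checked outside the server. The following is only a rough sketch and not part of the original thread: it uses Python's built-in `cp950` codec, which is assumed here to follow the CP950.TXT mapping mentioned above, and the helper names `utf8_to_cp950` and `euc_tw_plane` are made up for illustration. It decodes the UTF-8 sequences from the `UtfToLocal` warnings and re-encodes them as CP950, and it splits the 4-byte EUC_TW sequences from the `LocalToUtf` warnings into plane, row and column.

```python
from typing import Tuple

# UTF-8 byte sequences taken from the "UtfToLocal: could not convert" warnings.
UTF8_WARNINGS = ["e7a281", "e98ab9", "e8a38f", "e7b2a7"]

# EUC_TW byte sequences taken from the "LocalToUtf: could not convert" warnings.
EUC_TW_WARNINGS = ["8ea3c3b7", "8ea3cfd0", "8ea3c4ce", "8ea3bdfe"]


def utf8_to_cp950(hex_bytes: str) -> Tuple[str, str]:
    """Return (Unicode code point, CP950 bytes as hex) for one UTF-8 sequence.

    Assumption: Python's "cp950" codec stands in for the CP950.TXT table;
    it is not PostgreSQL's own conversion map.
    """
    char = bytes.fromhex(hex_bytes).decode("utf-8")
    return "U+{:04X}".format(ord(char)), char.encode("cp950").hex()


def euc_tw_plane(hex_bytes: str) -> Tuple[int, int, int]:
    """Split a 4-byte EUC-TW sequence into (plane, row, column).

    Layout: 0x8E (SS2), then 0xA0 + plane, 0xA0 + row, 0xA0 + column.
    """
    ss2, plane, row, col = bytes.fromhex(hex_bytes)
    if ss2 != 0x8E:
        raise ValueError("not a 4-byte EUC-TW sequence (missing SS2 0x8E)")
    return plane - 0xA0, row - 0xA0, col - 0xA0


if __name__ == "__main__":
    for seq in UTF8_WARNINGS:
        code_point, cp950_hex = utf8_to_cp950(seq)
        print("UTF-8 0x{} -> {} -> CP950 0x{}".format(seq, code_point, cp950_hex))
    for seq in EUC_TW_WARNINGS:
        plane, row, col = euc_tw_plane(seq)
        print("EUC_TW 0x{} -> plane {}, row {}, column {}".format(seq, plane, row, col))
```

Under these assumptions, the UTF-8 sequences from the copy-out warnings land on the same 0xf9d6 and above range that the BIG5 copy-in warnings report, and every EUC_TW sequence carries 0xa3 as its plane byte, which agrees with the "plane 3" reading in the quoted reply.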