Re: [BUGS] Bug #943: Server-Encoding from EUC_TW to UTF-8 doesn't work - Mailing list pgsql-hackers
From | Enke, Michael |
---|---|
Subject | Re: [BUGS] Bug #943: Server-Encoding from EUC_TW to UTF-8 doesn't work |
Date | |
Msg-id | 3E9AA746.2E07B899@wincor-nixdorf.com |
In response to | Re: [BUGS] Bug #943: Server-Encoding from EUC_TW to UTF-8 doesn't (Tatsuo Ishii <t-ishii@sra.co.jp>) |
Responses |
Re: [BUGS] Bug #943: Server-Encoding from EUC_TW to
|
List | pgsql-hackers |
I tried also BIG5 encoded data (Trad. Chinese for Taiwan) and got warnings:

WARNING: copy: line 4586, LocalToUtf: could not convert (0xf9d7) BIG5 to UTF-8. Ignored
...

Is this also solved with this fix?

Michael

Tatsuo Ishii wrote:
>
> It turned out that it's a bug with encoding conversion engine of
> PostgreSQL. It just failed to find proper entry from an encoding
> conversion table because of an integer overflow problem. Since only
> maps for EUC_TW have such a huge code point values (for example
> 0x8eaee7aa), I believe the conversion failure merely occurs with the
> particular encoding. Included patches should solve the problem (it is
> against PostgreSQL 7.3.2).
>
> BTW, I'm surprised to find the bug since it has been there since 7.2
> days.
>
> I'm going to commit the fix to both current and 7.3-stable trees.
> --
> Tatsuo Ishii
>
> > Short Description
> > Server-Encoding from EUC_TW to UTF-8 doesn't work
> >
> > Long Description
> > System: SuSE Linux 8.1, kernel 2.4.19, glibc 2.2.5/glibc-locale 2.2.5
> > the same error on RedHat 7.3, kernel 2.4.20, glibc2.2.5
> > postgresql version 7.3.2
> > description: I loaded Chinese (TW) characters, encoded as UTF-8 into a
> > database which has UTF-8 encoding with "copy table from 'original'" with psql. Ok.
> > Then I exit from psql, exported PGCLIENTENCODING=EUC_TW
> > I started psql, make a "copy table to 'file.EUC_TW'". Ok.
> > If I convert this file to UTF-8 with iconv -f EUC-TW -t UTF-8 file.EUC_TW file.UTF-8
> > then file.UTF-8 looks exactly the same as the original.
> > That means, PostgreSQL converts from UTF-8 to EUC_TW correctly.
> > Now I load the exported file 'file.EUC_TW' back into DB:
> > "copy table2 from 'file.EUC_TW'", still I did not finish psql,
> > PGCLIENTENCODING is the same as for "copy to".
> > Now I get error telling me: "copy: line 1, LocalToUtf: could not convert (0xe5b5) EUC_TW to UTF-8" ...
> > and the characters are missing in table2
> >
> > Sample Code
> > UTF-8:
> > 00000000: e795 b6e6 97a5 0ae5 959f e58b 95e4 b8ad
> > 00000010: 2ce4 bd86 e69c 89e9 8caf e8aa a40a
> >
> > EUC_TW as exported from PostgreSQL and not imported:
> > 00000000: e5b5 c5ca 0ada f6d9 afc4 e32c c8fe c8b4
> > 00000010: f2e3 eba8 0a
>
> *** src/backend/utils/mb/conv.c.orig 2003-04-12 10:03:25.000000000 +0900
> --- src/backend/utils/mb/conv.c 2003-04-12 10:16:04.000000000 +0900
> ***************
> *** 313,319 ****
>
>       v1 = *(unsigned int *) p1;
>       v2 = ((pg_utf_to_local *) p2)->utf;
> !     return (v1 - v2);
>   }
>
>   /*
> --- 313,319 ----
>
>       v1 = *(unsigned int *) p1;
>       v2 = ((pg_utf_to_local *) p2)->utf;
> !     return (v1 > v2)?1:((v1 == v2)?0:-1);
>   }
>
>   /*
> ***************
> *** 328,334 ****
>
>       v1 = *(unsigned int *) p1;
>       v2 = ((pg_local_to_utf *) p2)->code;
> !     return (v1 - v2);
>   }
>
>   /*
> --- 328,334 ----
>
>       v1 = *(unsigned int *) p1;
>       v2 = ((pg_local_to_utf *) p2)->code;
> !     return (v1 > v2)?1:((v1 == v2)?0:-1);
>   }
>
>   /*
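
For readers following the patch, here is a minimal standalone sketch of why a subtraction-based comparator can mislead a binary search once the table holds values such as 0x8eaee7aa. It assumes a 32-bit `unsigned int`. The names `demo_entry`, `demo_table`, `demo_lookup` and both comparator names are hypothetical and are not taken from conv.c. Only the codes 0xe5b5 and 0x8eaee7aa come from this thread; the other codes and all `utf` payloads are made up for illustration.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical table entry for illustration; not the actual PostgreSQL struct. */
typedef struct
{
    unsigned int code;  /* local-encoding code, used as the sort key */
    unsigned int utf;   /* placeholder payload, NOT a real mapping */
} demo_entry;

/*
 * Sorted ascending by code, compared as unsigned values.
 * Only 0xe5b5 and 0x8eaee7aa appear in the thread; the rest is made up.
 */
const demo_entry demo_table[] = {
    {0x0000a1a1u, 1u},
    {0x0000e5b5u, 2u},
    {0x8ea1a1a1u, 3u},
    {0x8eaee7aau, 4u},
    {0x8eaee7abu, 5u},
};

#define DEMO_TABLE_LEN (sizeof(demo_table) / sizeof(demo_table[0]))

/*
 * Pre-patch style: the unsigned difference wraps modulo 2^32, so its sign
 * after conversion to int does not reliably reflect the ordering.
 * Example: 0xe5b5 - 0x8eaee7aa wraps to 0x7151fe0b, a positive int,
 * although 0xe5b5 is the smaller value.
 */
int
compare_by_subtraction(const void *p1, const void *p2)
{
    unsigned int v1 = *(const unsigned int *) p1;
    unsigned int v2 = ((const demo_entry *) p2)->code;

    return (int) (v1 - v2);
}

/* Post-patch style: derive the sign from explicit comparisons. */
int
compare_by_ordering(const void *p1, const void *p2)
{
    unsigned int v1 = *(const unsigned int *) p1;
    unsigned int v2 = ((const demo_entry *) p2)->code;

    return (v1 > v2) ? 1 : ((v1 == v2) ? 0 : -1);
}

/* Look up a code with the given comparator; NULL means no match was found. */
const demo_entry *
demo_lookup(unsigned int code, int (*cmp) (const void *, const void *))
{
    return bsearch(&code, demo_table, DEMO_TABLE_LEN,
                   sizeof(demo_table[0]), cmp);
}
```

Calling `demo_lookup(0xe5b5u, compare_by_ordering)` returns the matching entry. With `compare_by_subtraction`, a typical `bsearch` that probes the middle of the table first compares 0xe5b5 against a much larger code, is told the key is greater, and searches the wrong half. That would be consistent with the "could not convert (0xe5b5)" error in the original report, though the sketch does not reproduce the real conversion tables.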