Thread: Cyrillic to UNICODE conversion
Despite the advertised support for conversion between Unicode and other charsets, PostgreSQL-7.1 reports that "Conversion of UNICODE to KOI8 is not supported". The same happens for WIN, ALT and other charsets.

As I found out, these charsets were simply never added to the list of 8-bit charsets that should be converted. Maybe this is because their maps are stored in another directory on ftp.unicode.org (see VENDORS/MicroSoft for the cp1251 and cp866 maps, and somewhere else for KOI8-R.TXT). In any case, all those maps are included in the catdoc distribution.

The attached patch fixes this problem. It adds the script UCS_to_cyrillic.pl to the src/backend/utils/mb/Unicode directory. The mapping of PostgreSQL charset names to filenames (as they appear in the catdoc distribution, i.e. lowercased) is hardcoded into the script. It is an almost exact copy of the UCS_to_iso script, with only the file and constant names changed.

The generated maps are included in the patch, as they are included in the source tarball; the source text maps are omitted, because they are removed by make distclean.

The file src/backend/mb/conv.c is modified to include the new maps and to provide the appropriate conversion functions.

--
Victor Wagner                   vitus@ice.ru
Chief Technical Officer         Office:7-(095)-748-53-88
Communiware.Net                 Home: 7-(095)-135-46-61
http://www.communiware.net      http://www.ice.ru/~vitus
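For illustration, the kind of table generation described above can be sketched as follows. This is a minimal Python sketch, not the Perl script from the patch. It assumes the usual Unicode.org mapping file layout (a charset byte, a Unicode code point and a comment on each line), and it assumes that KOI8, WIN and ALT correspond to koi8-r.txt, cp1251.txt and cp866.txt. The names CHARSET_FILES and parse_mapping are hypothetical, and skipping the ASCII range is also an assumption made here.

```python
# Assumed mapping of PostgreSQL charset names to catdoc-style (lowercased)
# file names. The table hardcoded in UCS_to_cyrillic.pl may differ.
CHARSET_FILES = {"KOI8": "koi8-r.txt", "WIN": "cp1251.txt", "ALT": "cp866.txt"}


def parse_mapping(lines):
    """Build (charset_to_ucs, ucs_to_charset) dicts from mapping-file lines.

    Each data line is expected to look like: 0xC1<TAB>0x0430<TAB># NAME
    """
    charset_to_ucs = {}
    ucs_to_charset = {}
    for line in lines:
        # Drop the trailing comment and surrounding whitespace.
        data = line.split("#", 1)[0].strip()
        if not data:
            continue  # blank line or comment-only line
        fields = data.split()
        if len(fields) < 2:
            continue  # byte listed without a Unicode assignment
        code = int(fields[0], 16)
        ucs = int(fields[1], 16)
        if code < 0x80:
            continue  # assumption: the ASCII range needs no table entry
        charset_to_ucs[code] = ucs
        ucs_to_charset[ucs] = code
    return charset_to_ucs, ucs_to_charset


# A few lines in the style of KOI8-R.TXT, used in place of the real file.
sample = [
    "# sample data, not the complete map",
    "0x41\t0x0041\t# LATIN CAPITAL LETTER A",
    "0xC1\t0x0430\t# CYRILLIC SMALL LETTER A",
    "0xE1\t0x0410\t# CYRILLIC CAPITAL LETTER A",
]
to_ucs, from_ucs = parse_mapping(sample)
print(hex(to_ucs[0xC1]))    # 0x430
print(hex(from_ucs[0x410]))  # 0xe1
```

To process a real file, something like `parse_mapping(open(CHARSET_FILES["KOI8"]))` should work, provided the text map has been fetched first. The sketch keeps plain code points and does not try to reproduce the output format of the generated PostgreSQL map files.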
Thanks for the fixes. I have committed your patches and they should appear in 7.1.1.

BTW, I have not added cp1251.txt, cp866.txt and koi8-r.txt, since they come from Unicode.org and we are not permitted to redistribute them.
--
Tatsuo Ishii

From: Victor Wagner <vitus@ice.ru>
Subject: [PATCHES] Cyrillic to UNICODE conversion
Date: Thu, 26 Apr 2001 20:51:25 +0400 (MSD)
Message-ID: <Pine.LNX.4.30.0104262041500.9539-101000@party.ice.ru>

> Despite the advertised support for conversion between Unicode and other
> charsets, PostgreSQL-7.1 reports that "Conversion of UNICODE to KOI8 is
> not supported". The same happens for WIN, ALT and other charsets.
>
> As I found out, these charsets were simply never added to the list
> of 8-bit charsets that should be converted. Maybe this is because their
> maps are stored in another directory on ftp.unicode.org (see
> VENDORS/MicroSoft for the cp1251 and cp866 maps, and somewhere else for
> KOI8-R.TXT). In any case, all those maps are included in the catdoc
> distribution.
>
> The attached patch fixes this problem. It adds the script
> UCS_to_cyrillic.pl to the src/backend/utils/mb/Unicode directory. The
> mapping of PostgreSQL charset names to filenames (as they appear in the
> catdoc distribution, i.e. lowercased) is hardcoded into the script. It is
> an almost exact copy of the UCS_to_iso script, with only the file and
> constant names changed.
>
> The generated maps are included in the patch, as they are included in the
> source tarball; the source text maps are omitted, because they are
> removed by make distclean.
>
> The file src/backend/mb/conv.c is modified to include the new maps and
> to provide the appropriate conversion functions.
>
> --
> Victor Wagner                   vitus@ice.ru
> Chief Technical Officer         Office:7-(095)-748-53-88
> Communiware.Net                 Home: 7-(095)-135-46-61
> http://www.communiware.net      http://www.ice.ru/~vitus
> > BTW, I have not added cp1251.txt, cp866.txt and koi8-r.txt, since they
> > come from Unicode.org and we are not permitted to redistribute them.
>
> That is not true for koi8-r.txt. At least the one included in the catdoc
> distribution I made myself from RFC 1489, and only afterwards did it
> appear on unicode.org and on Chernov's KOI8 pages.

Oh, I didn't know that.

> But anyway, if anybody is able to get them from unicode.org, why bother.

Agreed.
--
Tatsuo Ishii
On Sun, 29 Apr 2001, Tatsuo Ishii wrote:

> From: Tatsuo Ishii <t-ishii@sra.co.jp>
> Subject: Re: [PATCHES] Cyrillic to UNICODE conversion
> X-Mailer: Mew version 1.94.2 on Emacs 20.7 / Mule 4.1 (葵)
>
> Thanks for the fixes. I have committed your patches and they should
> appear in 7.1.1.
>
> BTW, I have not added cp1251.txt, cp866.txt and koi8-r.txt, since they
> come from Unicode.org and we are not permitted to redistribute them.

That is not true for koi8-r.txt. At least the one included in the catdoc distribution I made myself from RFC 1489, and only afterwards did it appear on unicode.org and on Chernov's KOI8 pages.

But anyway, if anybody is able to get them from unicode.org, why bother.

--
Victor Wagner                   vitus@ice.ru
Chief Technical Officer         Office:7-(095)-748-53-88
Communiware.Net                 Home: 7-(095)-135-46-61
http://www.communiware.net      http://www.ice.ru/~vitus