Thread: Invalid EUC_TW character sequence found
Recently I installed version 7.2.1 on my Red Hat 6.1 server with the following configure options:

./configure --prefix=/usr/local/pgsql --enable-multibyte=EUC_TW --with-perl --with-python --with-tcl --enable-odbc

After the installation, I tried to restore some of my old databases from version 7.0.2, but the restore failed because invalid characters were found. I then tried to enter some Chinese (Big5) characters directly, and pgAdmin II reported the following errors:

2002-06-25 17:20:13 - SQL (AccessControl): UPDATE "site" SET "cname" = '» ´ä ¦r' WHERE "siteid" = '001' AND "name" = 'this is HK' AND "cname" = '» ´ä'
2002-06-25 17:20:13 - *******************************************************************
2002-06-25 17:20:13 - Error
2002-06-25 17:20:13 - *******************************************************************
2002-06-25 17:20:13 - Error in pgAdmin II:frmSQLOutput.cmdSave_Click: -2147467259 - ERROR: Invalid EUC_TW character sequence found (0xa672)

Is there any compatibility issue between the old version and the new one?

Best Regards
Gene Leung
> -2147467259 - ERROR: Invalid EUC_TW character sequence found (0xa672)

The error message says it all. You had invalid data (maybe raw Big5 data?) in your database.

(1) If you are sure you have raw Big5 data in the old database, convert it to EUC_TW, then load it.

(2) If you have mixed EUC_TW and Big5 data, then you have a serious problem. You will probably have to fix the dump data by hand.
--
Tatsuo Ishii
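For case (2), locating the bad spots by hand in a large dump is tedious. Below is a minimal sketch, assuming the dump is read as raw bytes, that flags every line containing a byte sequence which breaks the EUC_TW byte rules Tatsuo lists further down in this thread, reporting the line number, the byte offset, and a few bytes starting there. The names `EUC_TW_CHAR`, `first_bad_offset` and `scan_dump` are hypothetical, not part of PostgreSQL. Passing this check only means the bytes are structurally valid EUC_TW; Big5 text whose second bytes are all 0xA1 or above passes as well.

```python
import re
from typing import Iterable, Iterator, Tuple

# One EUC_TW character per the byte rules quoted in this thread:
# 0x8e + three high-bit bytes, 0x8f + two high-bit bytes,
# any other high-bit byte + one high-bit byte, or a single ASCII byte.
# 0x8e and 0x8f are excluded from the two-byte branch so a truncated
# 0x8e/0x8f sequence cannot be re-read as a two-byte character.
EUC_TW_CHAR = re.compile(
    rb"\x8e[\x80-\xff]{3}"
    rb"|\x8f[\x80-\xff]{2}"
    rb"|[\x80-\x8d\x90-\xff][\x80-\xff]"
    rb"|[\x00-\x7f]"
)


def first_bad_offset(line: bytes) -> int:
    """Return the offset of the first invalid EUC_TW sequence, or -1 if the line is valid."""
    pos = 0
    while pos < len(line):
        m = EUC_TW_CHAR.match(line, pos)
        if m is None:
            return pos
        pos = m.end()
    return -1


def scan_dump(lines: Iterable[bytes]) -> Iterator[Tuple[int, int, bytes]]:
    """Yield (line number, offset, up to 4 bytes from the offset) for each bad line."""
    for lineno, line in enumerate(lines, start=1):
        off = first_bad_offset(line)
        if off >= 0:
            yield lineno, off, line[off:off + 4]
```

To use it, open the dump in binary mode (`open(path, "rb")`) so Python does no decoding of its own, and iterate over `scan_dump(f)`; each reported line is a candidate for manual repair.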
> The second way to confirm version 7.2.1 can not accept chinese input is to
> create a new database with the following command:
>
> CREATE DATABASE "test" WITH ENCODING = 'EUC_TW';
>
> then create table site (name varchar(50)); and insert data directly with
> pgAdmin II, it gives error as follows:
>
> -2147467259 - ERROR: Invalid EUC_TW character sequence found (0xa672)

0xa672 cannot be a valid EUC_TW character. Check your application.
--
Tatsuo Ishii
> To me, the third insert is a character that display correctly in my application,
> I do not see any problem. And I do not know and can not tell how to check that
> 'xx' is not a correct ECU_TW character. Please give me some hint for checking,
> thanks!!

OK, here are some rules for verifying EUC_TW characters:

(1) If the first byte is 0x8e, then the 8th bit of each of the following three bytes must be set.
(2) Else, if the first byte is 0x8f, then the 8th bit of each of the following two bytes must be set.
(3) Else, if the 8th bit of the first byte is set, then the 8th bit of the following byte must be set.
(4) Else (the 8th bit of the first byte is not set), it must be an ASCII character.

0xa672 satisfies none of the above: its first byte, 0xa6, has the 8th bit set, so rule (3) applies, but its second byte, 0x72, does not have the 8th bit set.
--
Tatsuo Ishii
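The four rules translate almost line for line into code. A minimal sketch follows; the function name `check_euc_tw` is hypothetical, and it checks byte structure only, not whether an assigned character exists at that code point. It also assumes that the odd characters in the pgAdmin II log are the raw bytes displayed as Latin-1, so encoding the logged text back with `latin-1` recovers the bytes that were sent.

```python
def check_euc_tw(data: bytes) -> str:
    """Walk `data` using the four EUC_TW rules; return "ok" or describe the first bad sequence."""
    i = 0
    while i < len(data):
        first = data[i]
        if first == 0x8E:          # rule (1): three more bytes, 8th bit set
            need = 3
        elif first == 0x8F:        # rule (2): two more bytes, 8th bit set
            need = 2
        elif first & 0x80:         # rule (3): one more byte, 8th bit set
            need = 1
        else:                      # rule (4): 8th bit clear, so plain ASCII
            i += 1
            continue
        tail = data[i + 1:i + 1 + need]
        if len(tail) < need or any((b & 0x80) == 0 for b in tail):
            bad = data[i:i + 1 + need]
            return f"invalid EUC_TW sequence 0x{bad.hex()} at offset {i}"
        i += 1 + need
    return "ok"


# The three strings from the pgAdmin II log, re-encoded as Latin-1 to recover the raw bytes.
for logged in ["¬ü¥úµó", "¥«µó¥«", "¦r"]:
    print(repr(logged), "->", check_euc_tw(logged.encode("latin-1")))
```

The first two logged strings report "ok", and '¦r' reports the sequence 0xa672 at offset 0, the same bytes the server rejects.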
Hi Tatsuo,

Thanks for your quick response. I actually tried both ways (1. dump and restore; 2. create a new database in version 7.2.1), but neither worked.

The first way was to dump a database containing EUC_TW data from the 7.0.2 server:

        List of databases
   Database    |  Owner   | Encoding
---------------+----------+----------
 AccessControl | postgres | EUC_TW

The old database was created with the EUC_TW encoding, and it worked fine with Chinese characters under version 7.0.2. However, when I followed the upgrade instructions and restored it on my Red Hat 6.1 server, I got errors such as "Invalid EUC_TW character sequence found". I then searched the newsgroup for "Invalid EUC_TW character sequence found" and found that a man named Gordon Luk had the same problem. He is actually my friend. Originally I thought it might be a problem with Red Hat 7.3 and its pre-installed PostgreSQL, so I decided to build from the tar file and install it on Red Hat 6.1 instead.

The second way to confirm that version 7.2.1 cannot accept Chinese input was to create a new database with the following command:

CREATE DATABASE "test" WITH ENCODING = 'EUC_TW';

then create a table with

create table site (name varchar(50));

and insert data directly with pgAdmin II. It gives the following error:

2002-06-26 09:22:28 - SQL (test): INSERT INTO "site" ("name") VALUES ('»´ä¦r')
2002-06-26 09:22:28 - *******************************************************************
2002-06-26 09:22:28 - Error
2002-06-26 09:22:28 - *******************************************************************
2002-06-26 09:22:28 - Error in pgAdmin II:frmSQLOutput.cmdSave_Click: -2147467259 - ERROR: Invalid EUC_TW character sequence found (0xa672)
2002-06-26 09:22:28 - Windows Version: Windows 2000 v5.0 build 2195 Service Pack 2
2002-06-26 09:22:28 - pgSchema Version: 1.2.0
2002-06-26 09:22:28 - MDAC Version: 2.5
2002-06-26 09:22:28 - DBMS Version: 07.02.0001 PostgreSQL 7.2.1 on i686-pc-linux-gnu, compiled by GCC egcs-2.91.66
2002-06-26 09:22:28 - Connection String (Master Connection): Provider=MSDASQL.1;Extended Properties="DRIVER={PostgreSQL};DATABASE=template1;SERVER=sql;PORT=5432;UID=harry;PWD=********;ReadOnly=0;Protocol=6.4;FakeOidIndex=0;ShowOidColumn=0;RowVersioning=0;ShowSystemTables=0;ConnSettings=;Fetch=100;Socket=4096;UnknownSizes=0;MaxVarcharSize=254;MaxLongVarcharSize=65536;Debug=0;CommLog=0;Optimizer=1;Ksqo=1;UseDeclareFetch=0;TextAsLongVarchar=1;UnknownsAsLongVarchar=1;BoolsAsChar=1;Parse=0;CancelAsFreeStmt=0;ExtraSysTablePrefixes=dd_;LFConversion=1;UpdatableCursors=1;DisallowPremature=0;TrueIsMinus1=0"

If the coming version cannot support Chinese, it may be a big problem for a lot of people. As database users, we do not know much about these encoding issues, and we have to rely on you guys. You have already done a lot of good things for open source. Just keep on striving for the best. Thanks!

Best Regards
Gene Leung
It is not that no Chinese characters can be entered; only some of them fail:

2002-06-26 11:12:32 - SQL (test): INSERT INTO "site" ("name") VALUES ('¬ü¥úµó')
2002-06-26 11:12:47 - SQL (test): INSERT INTO "site" ("name") VALUES ('¥«µó¥«')
2002-06-26 11:14:42 - SQL (test): INSERT INTO "site" ("name") VALUES ('¦r')
2002-06-26 11:14:42 - *******************************************************************
2002-06-26 11:14:42 - Error
2002-06-26 11:14:42 - *******************************************************************
2002-06-26 11:14:42 - Error in pgAdmin II:frmSQLOutput.cmdSave_Click: -2147467259 - ERROR: Invalid EUC_TW character sequence found (0xa672)
2002-06-26 11:14:42 - Windows Version: Windows 2000 v5.0 build 2195 Service Pack 2
2002-06-26 11:14:42 - pgSchema Version: 1.2.0
2002-06-26 11:14:42 - MDAC Version: 2.5
2002-06-26 11:14:42 - DBMS Version: 07.02.0001 PostgreSQL 7.2.1 on i686-pc-linux-gnu, compiled by GCC egcs-2.91.66
2002-06-26 11:14:42 - Connection String (Master Connection): Provider=MSDASQL.1;Extended Properties="DRIVER={PostgreSQL};DATABASE=template1;SERVER=sql;PORT=5432;UID=harry;PWD=********;ReadOnly=0;Protocol=6.4;FakeOidIndex=0;ShowOidColumn=0;RowVersioning=0;ShowSystemTables=0;ConnSettings=;Fetch=100;Socket=4096;UnknownSizes=0;MaxVarcharSize=254;MaxLongVarcharSize=65536;Debug=0;CommLog=0;Optimizer=1;Ksqo=1;UseDeclareFetch=0;TextAsLongVarchar=1;UnknownsAsLongVarchar=1;BoolsAsChar=1;Parse=0;CancelAsFreeStmt=0;ExtraSysTablePrefixes=dd_;LFConversion=1;UpdatableCursors=1;DisallowPremature=0;TrueIsMinus1=0"

As shown above, I inserted three rows into the table using pgAdmin II. The first two went in without any problem; only the last one failed. I also tried the same inserts with psql on the server side and got the same result.

To me, the third insert is a character that displays correctly in my application, so I do not see any problem. I do not know how to check that '¦r' is not a valid EUC_TW character. Please give me some hints for checking. Thanks!

Best Regards
Gene Leung
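This also explains why only some characters fail. Big5 is a double-byte encoding whose second (trail) byte lies either in 0x40 to 0x7E or in 0xA1 to 0xFE. When the trail byte is 0xA1 or above, the pair happens to satisfy EUC_TW rule (3), so the server's byte check accepts it, although the bytes are then stored as EUC_TW rather than as the Big5 character that was typed. When the trail byte is 0x40 to 0x7E (like 0x72 in 0xa672), the check fails. A minimal sketch, assuming the logged strings are Big5 bytes shown through a Latin-1 display; the helper name `big5_low_trail_pairs` is hypothetical:

```python
from typing import List


def big5_low_trail_pairs(raw: bytes) -> List[str]:
    """Return, as hex, each Big5 double-byte character whose trail byte is below 0x80.

    Such a pair breaks EUC_TW rule (3). Pairs with a trail byte of 0xA1 or above
    pass the EUC_TW byte check even though they are Big5, not EUC_TW.
    """
    raw.decode("big5")  # raises UnicodeDecodeError if the bytes are not valid Big5
    pairs = []
    i = 0
    while i < len(raw):
        if raw[i] & 0x80:                   # Big5 lead byte: the character is two bytes long
            if raw[i + 1] < 0x80:           # trail byte in 0x40-0x7E
                pairs.append(raw[i:i + 2].hex())
            i += 2
        else:                               # single-byte ASCII
            i += 1
    return pairs


for logged in ["¬ü¥úµó", "¥«µó¥«", "¦r"]:
    raw = logged.encode("latin-1")          # undo the Latin-1 rendering in the log
    print(repr(logged), big5_low_trail_pairs(raw))
```

All three logged strings decode cleanly as Big5; the first two report no low trail bytes, and '¦r' reports a672. That pattern suggests the rows that were accepted most likely hold raw Big5 bytes labelled as EUC_TW, which matches case (1) in Tatsuo's first reply.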