Re: Understanding Encoding - Mailing list pgsql-sql
From | Sebastien FLAESCH |
---|---|
Subject | Re: Understanding Encoding |
Date | |
Msg-id | 52298A02.4060607@4js.com |
In response to | Re: Understanding Encoding (Tatsuo Ishii <ishii@postgresql.org>) |
Responses | Re: Understanding Encoding |
List | pgsql-sql |
Hi,

Tip: To identify what encoding you enter in the psql command interpreter:

1) Open a file with vim
2) Type in your SQL or copy/paste it
3) Save the file and quit vim
4) $ file <filename>

This should give you the encoding of that text file.

For example:

sf@orca:~$ echo $LC_ALL
en_US.UTF-8
sf@orca:~$ cat /tmp/xx
abcdefé
sf@orca:~$ file /tmp/xx
/tmp/xx: UTF-8 Unicode text

Seb

On 09/06/2013 09:03 AM, Tatsuo Ishii wrote:
>> Hello All,
>>
>> I am not able to understand how the encoding is handled. I would be happy
>> if someone can tell what is happening in the following scenario:
>>
>> 1. I have created a database with EUC_KR encoding and created a table and
>> inserted some Korean value into it.
>>
>> =# CREATE DATABASE korean WITH ENCODING 'EUC_KR' LC_COLLATE='ko_KR.euckr'
>> LC_CTYPE='ko_KR.euckr' TEMPLATE=template0;
>>
>> =# \c korean
>>
>> korean=# SHOW client_encoding;
>>  client_encoding
>> -----------------
>>  UTF8
>> (1 row)
>>
>> korean=# CREATE TABLE tbl (doc text);
>>
>> korean=# INSERT INTO tbl VALUES ('그레스');
>>
>>
>> 2. If I insert non-Korean values it throws an error:
>>
>> korean=# INSERT INTO tbl VALUES ('データベース');
>> ERROR: character with byte sequence 0xe3 0x83 0xbc in encoding "UTF8" has
>> no equivalent in encoding "EUC_KR"
>
> The error message says it all. PostgreSQL accepted 'データベース'
> encoded in UTF-8 then tried to convert it to EUC_KR but failed, because
> EUC_KR does not accept languages other than Korean (and ASCII). What
> else did you expect?
>
>> korean=# SELECT * FROM tbl;
>>   doc
>> --------
>>  그레스
>> (1 row)
>>
>>
>> 3. I change the client encoding to EUC_KR and try inserting the same Korean
>> characters and it throws an error:
>>
>> korean=# SET client_encoding = 'EUC_KR';
>> SET
>> korean=# INSERT INTO tbl VALUES ('그레스');
>> ERROR: invalid byte sequence for encoding "EUC_KR": 0xa0 0x88
>
> 0xa0 is definitely not part of EUC_KR. That's why PostgreSQL throws an
> error. I guess you are using UHC (Unified Hangul Code), rather than
> EUC_KR. They are different encodings. You should do either:
>
> 1) Make sure that your terminal encoding is EUC_KR.
>
> 2) set client_encoding = 'uhc';
>
>> Even the SELECT statement displays something different. I am not able to
>> understand why.
>>
>> korean=# SELECT * FROM tbl;
>>   doc
>> --------
>>  ����
>> (1 row)
>
> This is for the same reason as above.
>
>> Can someone please help me?
>>
>> Thank you,
>>
>> Beena Emerson
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese: http://www.sraoss.co.jp
>
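
The byte-level behaviour discussed in the quoted exchange can be reproduced outside PostgreSQL. The following is a minimal sketch, assuming Python 3 and using its built-in `euc_kr` codec as a stand-in for the server's conversion (PostgreSQL's own conversion tables may differ in detail). The helper names `hexdump`, `can_encode` and `can_decode` are hypothetical and only serve the illustration; the two sample strings are the ones from the thread.

```python
# Sample strings taken from the thread.
KOREAN = "그레스"
JAPANESE = "データベース"


def hexdump(data: bytes) -> str:
    """Render bytes as space-separated lowercase hex."""
    return " ".join(f"{b:02x}" for b in data)


def can_encode(text: str, encoding: str) -> bool:
    """True if every character of `text` has a mapping in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def can_decode(data: bytes, encoding: str) -> bool:
    """True if `data` is a valid byte sequence in `encoding`."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# The same three syllables, as a UTF-8 terminal and an EUC-KR terminal
# would send them.
utf8_bytes = KOREAN.encode("utf-8")
euckr_bytes = KOREAN.encode("euc_kr")

print("UTF-8 bytes :", hexdump(utf8_bytes))   # ea b7 b8 eb a0 88 ec 8a a4
print("EUC-KR bytes:", hexdump(euckr_bytes))

# Step 2 of the report: the Japanese string cannot be converted to EUC-KR.
print("Japanese text fits EUC-KR:", can_encode(JAPANESE, "euc_kr"))

# Step 3 of the report: UTF-8 bytes labelled as EUC-KR are rejected,
# while genuine EUC-KR bytes are accepted.
print("UTF-8 bytes valid as EUC-KR :", can_decode(utf8_bytes, "euc_kr"))
print("EUC-KR bytes valid as EUC-KR:", can_decode(euckr_bytes, "euc_kr"))
```

The bytes 0xa0 0x88 from the error message in step 3 appear in the UTF-8 form of the second syllable (eb a0 88). That is consistent with a terminal that kept sending UTF-8 after client_encoding was set to EUC_KR, although this is only one possible reading of the report. Either way, the remedy is the same as in the reply: the terminal encoding and client_encoding have to agree.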