Thread: BUG #4257: about unicode extend
The following bug has been logged online:

Bug reference: 4257
Logged by: arli weng
Email address: program@163.com
PostgreSQL version: 8.3
Operating system: gentoo linux
Description: about unicode extend
Details:

the command (chinese by utf-8):
INSERT INTO "title" VALUES(46307243,46307898,'酋鼠𪕨');

in sqlite text type, no problem..
in postgres report error:

invalid byte sequence for encoding "UNICODE": 0xf0

the 𪕨 char is unicode extend b, by utf-8 format, the hex code is "f0 aa 95 a8",
because unicode extend b, must start by 0xf0
but postgres cannot support it?

server/database/client encoding has unicode already.

help me pls, because i love postgres.. and sorry my english
"arli weng" <program@163.com> writes:
> the command (chinese by utf-8):
> INSERT INTO "title" VALUES(46307243,46307898,'酋鼠𪕨');
> in postgres report error:
> invalid byte sequence for encoding "UNICODE": 0xf0

I don't believe this is actually an 8.3 server. In 8.1 or later that encoding would be referred to as "UTF8"; also, 8.1 and later would show all bytes of the complained-of character, not just the first one.

8.0 and before only support 16-bit Unicode code points (ie, 3-byte utf8 sequences). We have support for 4-byte sequences in 8.1 and later. Also, there were some fixes in this area in Jan 2007, so whichever branch you use, make sure you get a minor release that's newer than that.

regards, tom lane
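
[Editor's illustration] The 3-byte versus 4-byte distinction described above can be checked outside the database. The following is a minimal Python sketch, not anything taken from PostgreSQL itself; the helper names utf8_profile and fits_in_16_bits are made up for this note. It encodes the three characters from the INSERT statement and reports which of them fall outside the 16-bit code point range.

```python
# The three characters from the INSERT statement, written as escapes so the
# example does not depend on the encoding of this file.
text = "\u914b\u9f20\U0002a568"


def utf8_profile(s):
    """Return (code point, UTF-8 bytes as hex, byte length) for each character."""
    rows = []
    for ch in s:
        encoded = ch.encode("utf-8")
        rows.append((ord(ch), encoded.hex(" "), len(encoded)))
    return rows


def fits_in_16_bits(s):
    """True if every code point is at most U+FFFF (needs at most 3 UTF-8 bytes)."""
    return all(ord(ch) <= 0xFFFF for ch in s)


for code_point, hex_bytes, length in utf8_profile(text):
    print(f"U+{code_point:04X}  {hex_bytes}  ({length} bytes)")

print("fits in 16 bits:", fits_in_16_bits(text))
```

The last character is U+2A568, whose encoding is the "f0 aa 95 a8" sequence quoted in the report. Its code point is above U+FFFF, so it needs four bytes, which is exactly the case a server limited to 3-byte sequences rejects.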
On Sat, Jun 21, 2008 at 01:25:15PM +0000, arli weng wrote:
> PostgreSQL version: 8.3

What does "SELECT version()" return? I'm wondering if the server isn't 8.3 but rather an earlier version (see below).

> the command (chinese by utf-8):
> INSERT INTO "title" VALUES(46307243,46307898,'酋鼠𪕨');
>
> in sqlite text type, no problem..
> in postgres report error:
>
> invalid byte sequence for encoding "UNICODE": 0xf0

Your INSERT statement works for me in 8.3.3, 8.2.9, and 8.1.13. According to the release notes version 8.1 changed UNICODE to UTF8 and added support for 4-byte characters, so the fact that the error says "UNICODE" and your database doesn't appear to support 4-byte characters makes me wonder if you're running 8.0 or earlier.

-- 
Michael Fuhr
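
[Editor's illustration] The version check suggested above can be automated on the client side. This is a rough Python sketch under stated assumptions: the helper names parse_pg_version and supports_4byte_utf8 are hypothetical, the sample strings are invented stand-ins for what "SELECT version()" might return, and the 8.1 threshold is simply the one given in this thread.

```python
import re


def parse_pg_version(version_string):
    """Extract (major, minor) from text such as 'PostgreSQL 8.0.15 on ...'."""
    match = re.search(r"PostgreSQL (\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"unrecognized version string: {version_string!r}")
    return int(match.group(1)), int(match.group(2))


def supports_4byte_utf8(version_string):
    """Per this thread, 4-byte UTF-8 sequences are accepted from 8.1 onward."""
    return parse_pg_version(version_string) >= (8, 1)


# Invented sample strings; real output includes platform and compiler details.
samples = [
    "PostgreSQL 8.0.15 on i686-pc-linux-gnu",
    "PostgreSQL 8.3.3 on i686-pc-linux-gnu",
]
for sample in samples:
    print(sample, "->", supports_4byte_utf8(sample))
```

A string that does not contain a "major.minor" pair after the word PostgreSQL raises ValueError instead of guessing.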
very sorry, is i wrong.. the version is 8.0.15.
i just copyed from wrong of server-terminal window.. -_-!

thank you for help.

arli

Michael Fuhr wrote:
> On Sat, Jun 21, 2008 at 01:25:15PM +0000, arli weng wrote:
>
>> PostgreSQL version: 8.3
>>
>
> What does "SELECT version()" return? I'm wondering if the server
> isn't 8.3 but rather an earlier version (see below).
>
>> the command (chinese by utf-8):
>> INSERT INTO "title" VALUES(46307243,46307898,'酋鼠𪕨');
>>
>> in sqlite text type, no problem..
>> in postgres report error:
>>
>> invalid byte sequence for encoding "UNICODE": 0xf0
>>
>
> Your INSERT statement works for me in 8.3.3, 8.2.9, and 8.1.13.
> According to the release notes version 8.1 changed UNICODE to UTF8
> and added support for 4-byte characters, so the fact that the error
> says "UNICODE" and your database doesn't appear to support 4-byte
> characters makes me wonder if you're running 8.0 or earlier.
>