Thread: UTF-8, upper() and Chinese characters yielding blank result
While I could see various multibyte issues in the archives and in the TODO list, I couldn't spot this exact issue.

I am working with a database that uses UNICODE encoding. I have a varchar column (col_x) that includes a mix of Chinese and regular ASCII characters.

On PostgreSQL 7.4.13 (on RHEL4), "select col_x, upper(col_x) from my_table" performs the desired upper() conversion, i.e. the ASCII characters are converted to upper case and the Chinese characters are left as is.

The problem appears on PostgreSQL 8.0.7 (on WinXP), where the upper() result is apparently blank (this is via pgAdmin III). Worse still, via JDBC I am getting:

    java.sql.SQLException: Invalid character data was found. This is most likely caused by stored data containing characters that are invalid for the character set the database was created in. The most common example of this is storing 8bit data in a SQL_ASCII database.

Is this a bug or a change of behaviour between versions? Is there some way I can get the 7.4.13 behaviour in 8.0.7?

TIA,
Scott
Scott Eade wrote:
> The problem appears on PostgreSQL 8.0.7 (on WinXP)

PostgreSQL 8.0 on Windows does not support UTF-8.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/
On Thu, Jul 27, 2006 at 07:22:17PM +0200, Peter Eisentraut wrote:
> Scott Eade wrote:
> > The problem appears on PostgreSQL 8.0.7 (on WinXP)
>
> PostgreSQL 8.0 on Windows does not support UTF-8.

In addition, PostgreSQL is totally reliant on the OS for upper/lower/collation support, so there is no way you can expect to get similar results across different OSes, or even different versions of the same OS.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
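
For illustration only: the behaviour described for 7.4.13 (ASCII letters upper-cased, Chinese characters left as is) can be reproduced on the client side without relying on the server's OS locale at all. The following is a minimal Python sketch, not anything PostgreSQL itself provides; the function name ascii_upper and the sample string are hypothetical, and it assumes the column value has already been fetched and decoded correctly as Unicode text.

```python
import string

# Translation table mapping only ASCII a-z to A-Z.
# Every other code point (e.g. Chinese characters) is left untouched.
ASCII_UPPER = str.maketrans(string.ascii_lowercase, string.ascii_uppercase)


def ascii_upper(text: str) -> str:
    """Upper-case ASCII letters only, independent of any OS locale.

    Hypothetical helper for illustration; it does not consult the
    platform's locale or collation settings.
    """
    return text.translate(ASCII_UPPER)


if __name__ == "__main__":
    # Sample value mixing ASCII and Chinese characters (made up for this sketch).
    print(ascii_upper("abc\u4e2d\u6587def"))
```

Because the mapping is a fixed table, the result should not vary between operating systems, at the cost of ignoring non-ASCII letters that do have upper-case forms.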