Charset encoding and accents - Mailing list pgsql-hackers
| From | Davide Romanini |
|---|---|
| Subject | Charset encoding and accents |
| Date | |
| Msg-id | 3E9533A5.8070805@libero.it |
| Responses | Re: Charset encoding and accents |
| List | pgsql-hackers |
Hi,

I've posted this problem twice on the pgsql-jdbc list, but no one has been able to help me solve it. I think this is a really serious problem in the JDBC driver. I've tried different solutions with no result.

Let me explain the problem. I have a working PostgreSQL database. There's an application, written in M$ Access, that uses the database through the ODBC driver with no problems. I want to access the same data from a Swing application through the JDBC driver.

On the server side the encoding is set to SQL_ASCII. That isn't a problem in itself, because all strings containing accented characters are retrieved correctly by ODBC and by the psql client. But if I retrieve strings containing accents (like àòù) through JDBC, I get into trouble because my accents get mangled. For example, the string 'La città di Forlì' is retrieved and displayed as 'La citt?di Forl?'!

I've worked on the problem a bit in the driver's source code. I noticed that when I call rs.getString(), the driver invokes (at some point) the method org.postgresql.core.Encoding.decode(byte[] encodedString, int offset, int length). This method calls decodeUTF8 when the current encoding equals "UTF-8". If the encoding is different, it simply returns new String(encodedString, offset, length, encoding). My database is SQL_ASCII, so the driver should return a new String and not call decodeUTF8. But when I step through the source in a debugger, the encoding ALWAYS equals UTF-8!

I've also tried setting a parameter in my connection string:

jdbc:postgresql://localhost/prova?charSet=SQL_ASCII

(I've tried many different encodings here.) The encoding is always UTF-8. Then I thought: 'if the driver wants strings to be UNICODE, I'll set the server variable CLIENT_ENCODING to UNICODE'. No result! It doesn't change! The only way to get my strings displayed correctly is to comment out all the decodeUTF8 calls and make the method return new String(data).
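The branch described above can be sketched in a few lines to show why a wrong encoding guess produces exactly this kind of damage. This is a simplified, hypothetical stand-in, not the driver's actual code: the class name EncodingSketch is invented, the encoding is passed as a parameter rather than read from the Encoding object, and the JDK's UTF-8 decoder stands in for the driver's own decodeUTF8. It also assumes the SQL_ASCII database holds the text as ISO-8859-1 bytes (SQL_ASCII itself does not record what the high-bit bytes mean).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the decode dispatch described in the post;
// not the real org.postgresql.core.Encoding implementation.
class EncodingSketch {

    // Mirrors decode(byte[] encodedString, int offset, int length), but takes
    // the encoding as an explicit parameter.
    static String decode(byte[] encoded, int offset, int length, Charset encoding) {
        if (StandardCharsets.UTF_8.equals(encoding)) {
            // The driver calls its own decodeUTF8 here; the JDK decoder is a
            // substitute that replaces malformed byte sequences with U+FFFD.
            return new String(encoded, offset, length, StandardCharsets.UTF_8);
        }
        // Any other encoding: let the JDK decode the bytes directly.
        return new String(encoded, offset, length, encoding);
    }

    public static void main(String[] args) {
        // Assumption: the stored bytes are ISO-8859-1, so 'à' is 0xE0 and 'ì' is 0xEC.
        String original = "La citt\u00e0 di Forl\u00ec";
        byte[] stored = original.getBytes(StandardCharsets.ISO_8859_1);

        // 0xE0 and 0xEC are not valid UTF-8 on their own, so both accents are lost.
        String asUtf8 = decode(stored, 0, stored.length, StandardCharsets.UTF_8);
        // Decoding with the charset the bytes were written in round-trips cleanly.
        String asLatin1 = decode(stored, 0, stored.length, StandardCharsets.ISO_8859_1);

        System.out.println(asUtf8.equals(original));   // false
        System.out.println(asLatin1.equals(original)); // true
    }
}
```

If the driver always believes the encoding is UTF-8, every string takes the first branch, which fits the symptom reported here; the replacement characters it produces may be displayed as '?'.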
So I think that if the encoding were correctly recognized as different from UTF-8, the decode method would return a new String, which is the correct behaviour in my case.

Please don't tell me to change my database to UNICODE. I cannot do that, and I do not WANT to do that. Why does the ODBC driver work fine while the JDBC driver works only with UNICODE databases?? It's a bug and should be fixed. If I were skilled enough I would fix the bug myself, but I don't know much about the JDBC standard. I hope you can answer me with a solution. Really, with this bug the driver is simply unusable for serious work. The problem is not solved in the latest stable release (version 7.3 build 109) or in the latest development release (version 7.4 build 204) of the driver.

Regards,
Romaz
--
Davide Romanini