Thread: Ora2Pg and export of Multibyte UTF8 characters
Hello, I have gotten Ora2Pg to work fairly well for me, but I am having an issue where multibyte characters are being substituted with replacement characters on export from Oracle.

I tried both output to a file and direct import into a PG database, and in both cases I see a bunch of question marks instead of my multibyte characters.

I have pulled text directly from each database using the exact same DBD::Oracle and DBD::Pg modules, and the Oracle set displays the multibyte characters correctly, so I believe I have eliminated those modules as a potential problem.

The Oracle characters from the original table look fine when I select from the table. An example (which may or may not come through correctly for everyone) is as follows:
<p>16を基数とした数。16進数では、0~9までの桁数字を通常どおり<p>

The Postgres characters from the destination table look like:
<p>16????????16?????0?9????????????????????????<p>

Any hints as to what I need to do to get them corrected? When I output to a flat file, the question marks are already in the flat file, so something seems to be getting corrupted on export.

Thanks,
Chaun
Are your database and client encodings both set to UTF-8? That's about all I can think of.
Ah, I've actually run into this problem before. It seems I had forgotten to explicitly set the NLS_LANG parameter in the Ora2Pg script. Once I set it to AMERICAN_AMERICA.UTF8, it worked just dandy.

--- Andy Shellam <andy@andycc.net> wrote:
> Is your database and client encoding set to UTF-8?
>
> That's about all I can think of.
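For readers who hit the same symptom, here is a rough Python sketch of the idea behind the NLS_LANG fix reported in this message. It is only an analogy, not Ora2Pg code: the helper names are made up for illustration, and the "ascii" codec merely stands in for a client character set that cannot represent Japanese text. The NLS_LANG value is the one from the thread; the right value depends on the Oracle setup.

```python
import os

# Value reported in the thread; treat it as an assumption for your own setup.
NLS_LANG_VALUE = "AMERICAN_AMERICA.UTF8"


def env_with_nls_lang(base=None, nls_lang=NLS_LANG_VALUE):
    """Return a copy of an environment mapping with NLS_LANG set explicitly.

    Hypothetical helper for illustration; it is not part of Ora2Pg.
    """
    env = dict(os.environ if base is None else base)
    env["NLS_LANG"] = nls_lang
    return env


def lossy_client_view(text, client_charset="ascii"):
    """Imitate converting text to a charset that lacks its characters.

    Python substitutes one '?' per unencodable character, which resembles
    (but is not) the substitution seen in the exported data.
    """
    return text.encode(client_charset, errors="replace").decode(client_charset)


# Multibyte characters degrade to question marks under the narrow charset.
print(lossy_client_view("16を基数"))

# Environment that could be handed to the export process, e.g. via the
# env= argument of subprocess.run().
print(env_with_nls_lang(base={})["NLS_LANG"])
```

The environment variable has to be in place before the Oracle client library is loaded, which is why the sketch builds the environment up front rather than changing it after a connection exists.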
I had my database set to SQL_ASCII and switched to UTF8, but now I notice that I must add a slash for periods/dots ( \. vs . ) to insert into varchar. Is this normal? Thanks, J
PostgreSQL Admin wrote:
> I had my database set to SQL_ASCII and switched to UTF8, but now I
> notice that I must add a slash for periods/dots ( \. vs . ) to insert
> into varchar. Is this normal?

No.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/
I had the same problem when upgrading a v7.4 database to v8 (SQL_ASCII to UTF8): we had to replace single backslashes with double backslashes in the v7 database to get the data to display correctly, then dump and restore into v8 / UTF8. Have we done something wrong?
On 7/22/06, Peter Eisentraut <peter_e@gmx.net> wrote:
PostgreSQL Admin wrote:
> I had my database set to SQL_ASCII and switched to UTF8, but now I
> notice that I must add a slash for periods/dots ( \. vs . ) to insert
> into varchar. Is this normal?
No.
--
Peter Eisentraut
http://developer.postgresql.org/~petere/
Well, the server can convert from the server encoding to the client encoding and back. For this to work, you need a server encoding that can be converted to the client encoding; UNICODE (UTF8) is a good starting point here. The server also needs to know what the client's encoding is. You can set it from the client by sending "set client_encoding to <whatever it is>" to the server after the connection is established. There are other ways, which can solve other problems (see the character set support chapter in the manual); for my application this way worked best, and it makes it simple to work with different client encodings.

Best regards
Ivo

On Tuesday, 25 July 2006 at 12:27, adey wrote:
> I had the same problem when upgrading v7.4 database to v8, SQL-ASCII to
> UTF8 - we had to replace single backslashes with double backslashes in the
> v7 database to get the data to display correctly, then dump and restore in
> v8 / UTF8. Have we done something wrong please?
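As a rough illustration of the client_encoding approach Ivo describes, the Python sketch below builds the SET statement a client could send right after connecting, and imitates the conversion from UTF8 storage to a client encoding. Both function names are hypothetical, no database driver is involved, and Python's codecs only approximate the server's own conversion routines.

```python
import re


def client_encoding_statement(encoding):
    """Build the statement a client would send after connecting.

    Hypothetical helper; the name check is a simple guard against
    pasting arbitrary text into the statement.
    """
    if not re.fullmatch(r"[A-Za-z0-9_-]+", encoding):
        raise ValueError(f"suspicious encoding name: {encoding!r}")
    return f"SET client_encoding TO '{encoding}'"


def convert_for_client(stored_utf8, python_codec):
    """Imitate a UTF8-to-client conversion using Python codecs.

    Raises UnicodeEncodeError when the client encoding cannot represent
    a stored character, i.e. when no conversion is possible.
    """
    return stored_utf8.decode("utf-8").encode(python_codec)


print(client_encoding_statement("LATIN1"))

# A Latin-1 client can receive this text after conversion.
print(convert_for_client("Zürich".encode("utf-8"), "latin-1"))

# Japanese text has no Latin-1 representation, so the conversion fails.
try:
    convert_for_client("基数".encode("utf-8"), "latin-1")
except UnicodeEncodeError as exc:
    print("cannot convert:", exc.reason)
```

The sketch also shows why UTF8 is a convenient server encoding: it can hold any character, so whether a conversion succeeds depends only on what the client encoding can represent.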