Thread: Ora2Pg and export of Multibyte UTF8 characters

Ora2Pg and export of Multibyte UTF8 characters

From

Chaun Keating

Date:

14 July 2006, 12:58:24

Hello, I have gotten Ora2Pg to work fairly well for
me, but I am having an issue where multibyte characters
are being substituted with replacement characters on
export from Oracle.

I used both the output to a file and the direct import
to a PG database and see a bunch of question marks
instead of my multibyte chars.

I have pulled text directly from each database using
the exact same DBD::Oracle and DBD::Pg modules, and the
Oracle result set displays the multibyte characters
correctly, so I believe I have eliminated those modules
as a potential problem.

The Oracle characters from the original table look
fine when I select from the table. An example (which
may or may not come through correctly for everyone) is
as follows:

<p>16を基数とした数。16進数では、0～9までの桁数字を通常どおり<p>

The PostgreSQL characters from the destination table
look like:
<p>16????????16?????0?9????????????????????????<p>

Any hints as to what I need to do to get them
corrected? When I output to a flat file, the question
marks are already in the flat file, so the data seems
to be getting corrupted on export rather than on import.

Thanks,
Chaun
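
One way to confirm the observation above (that the damage is already in the flat file and not just in how a terminal displays it) is to look at the raw bytes. The following is only a sketch: the function name describe_export is made up for illustration and is not part of Ora2Pg. It assumes the export was meant to be UTF-8.

```python
def describe_export(raw: bytes) -> str:
    """Report whether raw bytes still hold multibyte UTF-8 or only ASCII."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # The bytes are in some other encoding, or were damaged.
        return "not valid UTF-8"
    if all(ord(ch) < 128 for ch in text):
        # Only 7-bit characters are left; any "?" here is a literal 0x3F byte.
        return "ASCII only"
    return "valid UTF-8 with multibyte characters"


if __name__ == "__main__":
    print(describe_export("16を基数".encode("utf-8")))
    print(describe_export(b"<p>16????<p>"))
```

If a line that should contain Japanese text comes back as "ASCII only", the original characters were replaced before the file was written, so no client or display setting on the PostgreSQL side can bring them back.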

Re: Ora2Pg and export of Multibyte UTF8 characters

From

Andy Shellam

Date:

14 July 2006, 13:07:21

Are your database and client encodings both set to UTF-8?

That's about all I can think of.



Re: Ora2Pg and export of Multibyte UTF8 characters

From

Chaun Keating

Date:

14 July 2006, 15:35:53

Ah, I've actually run into this problem before.  Seems
I'd forgotten to explicitly set the NLS_LANG parameter
in the Ora2Pg script.  Once I set it to
AMERICAN_AMERICA.UTF8 it seems to work just dandy.
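
A rough illustration of why the setting matters, under some assumptions: the Oracle client converts fetched text to the character set named in NLS_LANG, and when that character set cannot represent a character it substitutes a replacement. Python's errors="replace" is used below only as an analogy for that behaviour, not as Oracle's actual conversion code. The helper names lossy_convert and export_environment are hypothetical.

```python
import os


def lossy_convert(text: str, client_charset: str = "ascii") -> str:
    """Mimic a conversion to a charset that cannot hold every character."""
    # Characters the target charset cannot represent are replaced with "?".
    return text.encode(client_charset, errors="replace").decode(client_charset)


def export_environment(nls_lang: str = "AMERICAN_AMERICA.UTF8") -> dict:
    """Return a copy of the current environment with NLS_LANG set."""
    env = dict(os.environ)
    env["NLS_LANG"] = nls_lang
    return env


if __name__ == "__main__":
    print(lossy_convert("16を基数"))            # 7-bit target: characters are lost
    print(lossy_convert("16を基数", "utf-8"))   # UTF-8 target: text survives
    print(export_environment()["NLS_LANG"])
```

The dictionary returned by export_environment could be passed as the env argument of subprocess.run when the export is launched from a wrapper script; setting the variable inside the export script itself, as described above, has the same effect.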

--- Andy Shellam <andy@andycc.net> wrote:

> Is your database and client encoding set to UTF-8?
>
> That's about all I can think of.

UTF8 characters

From

PostgreSQL Admin

Date:

21 July 2006, 13:33:03

I had my database set to SQL_ASCII and switched to UTF8, but now I
notice that I must add a backslash before periods/dots ( \. vs . ) to
insert into varchar. Is this normal?

Thanks,
J

Re: UTF8 characters

From

Peter Eisentraut

Date:

21 July 2006, 13:42:27

PostgreSQL Admin wrote:
> I had my database set to SQL_ASCII and switched to UTF8, but now I
> notice that I must add a slash for periods/dots ( \. vs . ) to insert
> into varchar.   Is this normal?

No.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

Re: UTF8 characters

From

adey

Date:

25 July 2006, 08:00:37

I had the same problem when upgrading a v7.4 database to v8, SQL_ASCII to UTF8: we had to replace single backslashes with double backslashes in the v7 database to get the data to display correctly, then dump and restore into v8 / UTF8. Did we do something wrong?

Re: UTF8 characters

From

Ivo Rossacher

Date:

25 July 2006, 14:39:56

Well, the server can convert from the server encoding to the client
encoding and the other way around. To get this working you need a server
encoding which can be converted to the client encoding. UNICODE (UTF8) is
a good starting point here.
Now the server needs to know what the encoding of the client is.
You can set the client encoding from the client by sending "set
client_encoding to <whatever it is>" to the server after the connection is
started.
There are other ways, which can solve other problems (see "Character Set
Support" in the manual); for my application this way worked best, and it
makes it simple to work with different client encodings.
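
As a sketch of the approach described above, the statement can be sent right after connecting through any Python DB-API cursor. The helper names client_encoding_statement and announce_client_encoding are invented for this example, and the cursor is assumed to come from whatever PostgreSQL driver is in use.

```python
import re


def client_encoding_statement(encoding: str) -> str:
    """Build a SET client_encoding statement for the given encoding name."""
    # Encoding names such as UTF8, LATIN1 or WIN1252 are plain identifiers;
    # reject anything else rather than splice it into SQL.
    if not re.fullmatch(r"[A-Za-z0-9_-]+", encoding):
        raise ValueError(f"unexpected encoding name: {encoding!r}")
    return f"SET client_encoding TO '{encoding}'"


def announce_client_encoding(cursor, encoding: str) -> None:
    """Tell the server which encoding this client sends and expects."""
    cursor.execute(client_encoding_statement(encoding))


if __name__ == "__main__":
    print(client_encoding_statement("LATIN1"))
```

Calling announce_client_encoding(cursor, "LATIN1") once per connection should be enough, since the setting lasts for the session. Some drivers also offer their own way to set the client encoding, which may be preferable because the driver then knows how to decode the results.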

Best regards
Ivo
