Thread: Encoding errors when upgrading from 7.4 to 8.1

Encoding errors when upgrading from 7.4 to 8.1

From: Adam Witney

Hi,

I am upgrading from 7.4.8 -> 8.1.2 on Linux 2.6.14.3 #1 SMP

I have installed 8.1.2 and created the database (with encoding 'UNICODE', as
I had done in 7.4.8) and am trying to load a 7.4.8 dump file but I am
getting a few errors like this:

psql:bugasbase2-backup:45880: ERROR:  invalid UTF-8 byte sequence detected
near byte 0xb5
CONTEXT:  COPY array_scheme, line 17560, column gene_identifier: "B?G@S
(0G11)"

This dump file will load error free into 7.4.8.

Does anybody have any ideas why this is failing in 8.1.2?

Thanks for any help

Adam




Re: Encoding errors when upgrading from 7.4 to 8.1

From: Seneca Cunningham

Adam Witney wrote:
> psql:bugasbase2-backup:45880: ERROR:  invalid UTF-8 byte sequence detected
> near byte 0xb5
> CONTEXT:  COPY array_scheme, line 17560, column gene_identifier: "B?G@S
> (0G11)"
>
> This dump file will load error free into 7.4.8.
>
> Does anybody have any ideas why this is failing in 8.1.2?

8.1 changed UTF-8 handling to be more strict about invalid sequences.
You may want to read the 8.1.0 release notes.

<http://www.postgresql.org/docs/8.1/interactive/release-8-1.html>
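To see which rows of the dump the stricter check will reject before
running the load, a small script can scan the file for bytes that do not
decode as UTF-8. This is only a sketch: the name find_invalid_utf8 is
made up for illustration, and Python's decoder is not guaranteed to agree
with the 8.1 server on every edge case.

```python
import sys


def find_invalid_utf8(lines):
    """Yield (line_number, byte_offset, bad_byte) for each line that
    fails to decode as UTF-8.

    `lines` is any iterable of bytes objects, e.g. a file opened in
    binary mode. Only the first bad byte on each line is reported.
    """
    for lineno, raw in enumerate(lines, start=1):
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError as exc:
            yield lineno, exc.start, raw[exc.start]


if __name__ == "__main__":
    # Usage (hypothetical): python check_dump.py bugasbase2-backup
    with open(sys.argv[1], "rb") as dump:
        for lineno, offset, bad in find_invalid_utf8(dump):
            print(f"line {lineno}, offset {offset}: byte 0x{bad:02x}")
```

The line numbers it prints refer to the dump file as a whole, so they
should be comparable to the number psql reports after the file name, not
to the per-COPY line number in the CONTEXT message.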

--
Seneca Cunningham
scunning@ca.afilias.info

Re: Encoding errors when upgrading from 7.4 to 8.1

From: Martijn van Oosterhout

On Thu, Jan 26, 2006 at 06:07:45PM +0000, Adam Witney wrote:
> Hi,
>
> I am upgrading from 7.4.8 -> 8.1.2 on Linux 2.6.14.3 #1 SMP
>
> I have installed 8.1.2 and created the database (with encoding 'UNICODE', as
> I had done in 7.4.8) and am trying to load a 7.4.8 dump file but I am
> getting a few errors like this:
>
> psql:bugasbase2-backup:45880: ERROR:  invalid UTF-8 byte sequence detected
> near byte 0xb5
> CONTEXT:  COPY array_scheme, line 17560, column gene_identifier: "B?G@S
> (0G11)"

There were some changes in the checking. PostgreSQL used to accept
invalid UTF-8 sequences that it no longer accepts. You basically
need to clean up the data. Something like what's suggested here:

http://archives.postgresql.org/pgsql-hackers/2005-12/msg00511.php

may be helpful. This whole thread is useful actually...
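
One possible cleanup, assuming the stray bytes are leftover Latin-1
characters (0xb5 is the micro sign in ISO-8859-1, which would fit a value
displayed as "B?G@S"), is to re-encode only those bytes and leave valid
UTF-8 untouched. That assumption is a guess about this data, not something
taken from the linked message, and the handler name latin1_fallback and
the function repair_line are hypothetical.

```python
import codecs


def _latin1_fallback(exc):
    """Decode error handler: interpret the offending bytes as Latin-1."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    # Return the replacement text and the position to resume decoding at.
    return bad.decode("latin-1"), exc.end


# Register the handler under a name usable as the `errors` argument.
codecs.register_error("latin1_fallback", _latin1_fallback)


def repair_line(raw):
    """Return `raw` as valid UTF-8 bytes.

    Valid UTF-8 passes through unchanged; bytes that do not decode are
    treated as Latin-1 characters (an assumption about the data).
    """
    return raw.decode("utf-8", errors="latin1_fallback").encode("utf-8")


def repair_file(src_path, dst_path):
    """Write a repaired copy of the dump, line by line."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for raw in src:
            dst.write(repair_line(raw))
```

If the bad bytes are not Latin-1 this will load cleanly but store the
wrong characters, so it is worth checking a few of the affected rows by
eye before loading the repaired file.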
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
