Thread: invalid byte sequence for encoding "UTF8"
Hi, I am currently trying to set up our new database server; we have upgraded to PostgreSQL 8.1.8. When I try to restore the backup (which is stored as a set of SQL statements that my restore script feeds into psql to execute), it returns the following error:

psql:/mnt/tmp/app/application_data.sql:97425: ERROR: invalid byte sequence for encoding "UTF8": 0xff
HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".

along with other byte sequences, e.g. 0xa1 and 0xac. The two remaining schemas are roughly 22GB and 66GB in size and are read into Postgres from flat COBOL datafiles. Our data has progressed as shown below:

PostgreSQL 7.?.? - stored in SQL_ASCII (old configuration)
PostgreSQL 8.1.3 - stored in UTF8 (current configuration)
PostgreSQL 8.1.8 - stored in UTF8 (our future configuration)

The encoding on the server was changed from SQL_ASCII to UTF8 after we moved to version 8.1.3, for purposes of globalisation. I've searched the forums and found people with similar problems, but not much on a way to remedy it. I did try using iconv, which was suggested in a thread, but it returned an error saying that even the 22GB file was too large to work on.

Any help would be gratefully appreciated.

Many Thanks
David P
On Wednesday 21 March 2007 04:17, "Fuzzygoth" <dav.phillips@ntlworld.com> wrote:
> I've searched the forums and found people with similar problems but not
> much on a way to remedy it. I did try using iconv which was suggested
> in a thread but it returned an error saying even the 22GB file was too
> large to work on.

iconv needs to read the whole file into RAM. What you can do is use the UNIX split utility to split the dump file into smaller segments, run iconv on each segment, and then cat all the converted segments back together into a new dump file. iconv is, I think, your best option for converting the dump to a valid encoding.

--
"None are more hopelessly enslaved than those who falsely believe they are free." -- Johann W. von Goethe
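[Editor's note: the segment-at-a-time conversion described above can also be done in one small script rather than a split/iconv/cat pipeline. This is a minimal sketch, assuming the old SQL_ASCII dump actually contains LATIN1 (ISO-8859-1) bytes; `convert_dump`, the chunk size, and the source encoding are illustrative, not anything from the thread. Adjust SOURCE_ENCODING to whatever the pre-UTF8 data really was.]

```python
# Chunked re-encoding of a dump file too large for an all-in-RAM iconv.
# Assumption: the problem bytes (0xff, 0xa1, 0xac) are LATIN1 characters.
# LATIN1 is a single-byte encoding, so a chunk boundary can never split
# a character and every byte decodes to some code point (no errors raised).

SOURCE_ENCODING = "latin-1"        # assumed encoding of the old dump
CHUNK_SIZE = 64 * 1024 * 1024      # 64 MiB segments keep memory bounded

def convert_dump(src_path, dst_path):
    """Rewrite src_path (SOURCE_ENCODING bytes) as UTF-8 in dst_path."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(CHUNK_SIZE)
            if not chunk:
                break
            dst.write(chunk.decode(SOURCE_ENCODING).encode("utf-8"))
```

If the original data was not LATIN1 (e.g. WIN1252), only SOURCE_ENCODING needs to change; multi-byte source encodings would additionally need an incremental decoder to handle characters split across chunk boundaries.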
On Wed, Mar 21, 2007 at 09:54:41AM -0700, Alan Hodgson wrote:
> iconv needs to read the whole file into RAM. What you can do is use the
> UNIX split utility to split the dump file into smaller segments, use
> iconv on each segment, and then cat all the converted segments back
> together into a new dump file. iconv is I think your best option for
> converting the dump to a valid encoding.

The guys at openstreetmap have written a UTF-8 cleaner that doesn't read the whole file into memory:

http://trac.openstreetmap.org/browser/utils/planet.osm/C

Definitely more convenient for large files.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
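[Editor's note: for readers who cannot build the C tool linked above, a streaming cleaner in the same spirit can be sketched in a few lines of Python. This is not the OSM tool; `clean_utf8` and its parameters are illustrative. It reads fixed-size chunks and silently drops bytes that are not valid UTF-8, so memory use stays constant regardless of file size.]

```python
import codecs

def clean_utf8(src_path, dst_path, chunk_size=1 << 20):
    """Copy src_path to dst_path, dropping any invalid UTF-8 byte sequences."""
    # errors="ignore" discards invalid bytes instead of raising;
    # the incremental decoder buffers a partial multi-byte sequence at a
    # chunk boundary, so valid characters are never split and lost.
    decoder = codecs.getincrementaldecoder("utf-8")(errors="ignore")
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(decoder.decode(chunk).encode("utf-8"))
        # flush any trailing buffered (necessarily incomplete) sequence
        dst.write(decoder.decode(b"", final=True).encode("utf-8"))
```

Note that, unlike re-encoding from LATIN1, this approach deletes the offending characters rather than converting them, so it is only appropriate when the invalid bytes carry no data worth keeping.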