Thread: COPY-ing ASCII file into UTF-8 database
THE ISSUE: I need to import a text file (ASCII) dumped from MySQL 3.23 database into Postgre table (UTF-8). The file contains "specifically Estonian characters" Õ, Ä, Ö, Ü. My attempt to COPY this file directly failed. Postgre said: ERROR: invalid byte sequence for encoding "UTF8": 0xf5706509 HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding". OK... The file isn't big, and there are just 4 characters causing problems, so I substitued each of them with a distinctive pattern of ASCII characters and successfully COPY-ied the file into Postgre table. With PgAdmin I've changed 1 record back to Estonian – worked fine. I then exported this table into a file, opened the file in UltraEdit (the status bar showed it's a UTF-8 indeed) and changed a few temporary substitutions with "specifically Estonian characters". Then I cleared the table and tried to COPY data from my now UTF-8 file again. I've got an error: ERROR: invalid input syntax for integer: "2" CONTEXT: COPY school, line 1, column sch_id: "2" PgAdmin actually displayed it with something like chopped hyphen, which I've failed to copy into this message and which wasn't visible in the text editor even with "Show hidden characters". MY QUESTIONS: 1. What is the proper solution for importing data from plain ASCII files, that contain the type of characters mentioned above? 2. I'd like to fill the gaps (huge, obviously) in my knowlege about handling different charsets. Can you point me to some sane resource(s)? I'm sure, there's a lot of them. Sorry for slipping off the topic here. Regards, Toomas
Toomas Vendelin wrote: > > MY QUESTIONS: > > 1. What is the proper solution for importing data from plain ASCII > files, that contain the type of characters mentioned above? Set the correct client_encoding, before importing. Presumably the Estonian characters are part of one of the flavours of LATIN 1-9 > SET CLIENT_ENCODING='LATIN<X>'; -- Where you substitute <X> for the correct number > \copy .... -- Tommy Gildseth
2009/4/6 Toomas Vendelin <pg@vendelin.com>: > THE ISSUE: > I need to import a text file (ASCII) dumped from MySQL 3.23 database into > Postgre table (UTF-8). The file contains "specifically Estonian characters" > Õ, Ä, Ö, Ü. My attempt to COPY this file directly failed. Postgre said: > > ERROR: invalid byte sequence for encoding "UTF8": 0xf5706509 > HINT: This error can also happen if the byte sequence does not match the > encoding expected by the server, which is controlled by "client_encoding". > > OK... The file isn't big, and there are just 4 characters causing problems, > so I substitued each of them with a distinctive pattern of ASCII characters > and successfully COPY-ied the file into Postgre table. With PgAdmin I've > changed 1 record back to Estonian – worked fine. I then exported this table > into a file, opened the file in UltraEdit (the status bar showed it's a > UTF-8 indeed) and changed a few temporary substitutions with "specifically > Estonian characters". Then I cleared the table and tried to COPY data from > my now UTF-8 file again. I've got an error: > > ERROR: invalid input syntax for integer: "2" > CONTEXT: COPY school, line 1, column sch_id: "2" > > PgAdmin actually displayed it with something like chopped hyphen, which I've > failed to copy into this message and which wasn't visible in the text editor > even with "Show hidden characters". > > MY QUESTIONS: > > 1. What is the proper solution for importing data from plain ASCII files, > that contain the type of characters mentioned above? > > 2. I'd like to fill the gaps (huge, obviously) in my knowlege about handling > different charsets. Can you point me to some sane resource(s)? I'm sure, > there's a lot of them. Sorry for slipping off the topic here. > > Regards, > > Toomas > > In adition of set the local encoding (Client and Server), if you are runnning +nix there is a command called iconv that let you convert from one charset to another. Always i used it, i never had any problems. -- Emanuel Calvo Franco Sumate al ARPUG ! (www.postgres-arg.org - www.arpug.com.ar) ArPUG / AOSUG Member Postgresql Support & Admin