Thread: Importing binary data
Hey guys, I was given a database back of a non-PostgreSQL database. That database contains records where some binary file (looks likeemail attachments) was imported into several chunks of X characters in length and then stored into multiple records.A messy way of storing BLOB data. The database encoding is LATIN1, ISO8859-1. There chunks are actually 50 fields of 60 bytes each per row. If the original file is larger than that, more than one rowis used. I can export the data out of that database into flat files just fine, but then I try to import the data to Postgres, I'mgetting errors like this: ERROR: invalid byte sequence for encoding "SQL_ASCII": 0x00 CONTEXT: COPY attachments, line 14: "58025 1 cl\Cert.r 10 M04P'15A415).($-H87)4:6UE+$-(05)!0U1%4BQ)3E!55"!I5&EM92!)3E1% M1T52'$585$523B!7..." I tried LATIN1, SQL_ASCII, UTF-8, nothing works. I even tried to make the data type 'bytea', no luck. I'd love to have a"NO-CONVERSION" option on the copy command that just takes what ever bytes come along and doesn't try to interpret them. Any ideas of what I can do to import this stuff? best regards, chris -- chris ruprecht database grunt and bit pusher extraordinaíre
Chris Ruprecht-3 wrote > Hey guys, > > I was given a database back of a non-PostgreSQL database. That database > contains records where some binary file (looks like email attachments) was > imported into several chunks of X characters in length and then stored > into multiple records. A messy way of storing BLOB data. The database > encoding is LATIN1, ISO8859-1. > There chunks are actually 50 fields of 60 bytes each per row. If the > original file is larger than that, more than one row is used. > > I can export the data out of that database into flat files just fine, but > then I try to import the data to Postgres, I'm getting errors like this: > > ERROR: invalid byte sequence for encoding "SQL_ASCII": 0x00 > CONTEXT: COPY attachments, line 14: "58025 1 cl\Cert.r 10 > M04P'15A415).($-H87)4:6UE+$-(05)!0U1%4BQ)3E!55"!I5&EM92!)3E1% > M1T52'$585$523B!7..." > > I tried LATIN1, SQL_ASCII, UTF-8, nothing works. I even tried to make the > data type 'bytea', no luck. I'd love to have a "NO-CONVERSION" option on > the copy command that just takes what ever bytes come along and doesn't > try to interpret them. > > Any ideas of what I can do to import this stuff? PostgreSQL does not like NUL (0x00) in the data that it is importing - I don't believe it matters what encoding you are using. That said it would help to clarify exactly how you are running the copy command - specifically client or server. I haven't ever personally encountered this situation but any chance you can base64 encode on output and then import that way into the new database on a staging table then decode and manipulate the staged data before storing it into the permanent table? David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/Importing-binary-data-tp5824488p5824490.html Sent from the PostgreSQL - general mailing list archive at Nabble.com.
I'm using psql mydb # copy attachments FROM '/opt/tmp/clientdata/sql-data-p/attachments.dat' with NULL as 'NULL'; to load the database back in. The psql command runs on the same box as the DB server. If I had my way, I'd store the binary somewhere and keep a reference to it in the database. Having binary data (images, PDFs,etc.) in the database never made much sense to me. But this is a client's DB. They used to have an application writtenin something called "Progress 4GL" and now ported all that to use PostgreSQL with what ever front end (I have no idea).I believe they are using the new app just like the old app, so unfortunately, I can't just go and change the underlyingdatabase structure. Hope this helps to clarify the issue. > On Oct 27, 2014, at 15:57 , David G Johnston <david.g.johnston@gmail.com> wrote: > > PostgreSQL does not like NUL (0x00) in the data that it is importing - I > don't believe it matters what encoding you are using. That said it would > help to clarify exactly how you are running the copy command - specifically > client or server. > > I haven't ever personally encountered this situation but any chance you can > base64 encode on output and then import that way into the new database on a > staging table then decode and manipulate the staged data before storing it > into the permanent table? > > David J. > > > > > -- > View this message in context: http://postgresql.1045698.n5.nabble.com/Importing-binary-data-tp5824488p5824490.html > Sent from the PostgreSQL - general mailing list archive at Nabble.com. > > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general
Chris Ruprecht <chris@cdrbill.com> writes: > I can export the data out of that database into flat files just fine, but then I try to import the data to Postgres, I'mgetting errors like this: > ERROR: invalid byte sequence for encoding "SQL_ASCII": 0x00 > CONTEXT: COPY attachments, line 14: "58025 1 cl\Cert.r 10 M04P'15A415).($-H87)4:6UE+$-(05)!0U1%4BQ)3E!55"!I5&EM92!)3E1% M1T52'$585$523B!7..." > I tried LATIN1, SQL_ASCII, UTF-8, nothing works. I even tried to make the data type 'bytea', no luck. I'd love to havea "NO-CONVERSION" option on the copy command that just takes what ever bytes come along and doesn't try to interpretthem. That's hardly possible, considering you're expecting COPY to recognize field and record boundaries. What you probably need to do here is declare the column as bytea and then write some sort of preprocessing script that converts the binary data into hex-encoded form (basically \x followed by hex digits, if memory serves, but check the description of bytea in the PG manual). regards, tom lane
Thanks guys. I know what I will spend the night, doing ;). > On Oct 27, 2014, at 16:33 , Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Chris Ruprecht <chris@cdrbill.com> writes: >> I can export the data out of that database into flat files just fine, but then I try to import the data to Postgres, I'mgetting errors like this: > >> ERROR: invalid byte sequence for encoding "SQL_ASCII": 0x00 >> CONTEXT: COPY attachments, line 14: "58025 1 cl\Cert.r 10 M04P'15A415).($-H87)4:6UE+$-(05)!0U1%4BQ)3E!55"!I5&EM92!)3E1% M1T52'$585$523B!7..." > >> I tried LATIN1, SQL_ASCII, UTF-8, nothing works. I even tried to make the data type 'bytea', no luck. I'd love to havea "NO-CONVERSION" option on the copy command that just takes what ever bytes come along and doesn't try to interpretthem. > > That's hardly possible, considering you're expecting COPY to recognize > field and record boundaries. What you probably need to do here is declare > the column as bytea and then write some sort of preprocessing script that > converts the binary data into hex-encoded form (basically \x followed by > hex digits, if memory serves, but check the description of bytea in the > PG manual). > > regards, tom lane