Thread: \COPY ... CSV with hex escapes
Hi folks I have an odd csv input format to deal with. I'm about to put some Python together to reprocess it, but I thought I'd check in and see if I'm missing something obvious in \copy's capabilities. The input is fairly conventional comma-delimeted text with quoted fields, as output by Sybase SQL Anywhere 5.5's isql 'OUTPUT TO' filter. Yes, that's ancient. It is handled quite happily by \copy in csv mode, except that when csv mode is active, \xnn escapes do not seem to be processed. So I can have *either* \xnn escape processing *or* csv-style input processing. Anyone know of a way to get escape processing in csv mode? -- Craig Ringer
On Wed, Jul 7, 2010 at 9:21 PM, Craig Ringer <craig@postnewspapers.com.au> wrote: > Hi folks > > I have an odd csv input format to deal with. I'm about to put some > Python together to reprocess it, but I thought I'd check in and see if > I'm missing something obvious in \copy's capabilities. > > The input is fairly conventional comma-delimeted text with quoted > fields, as output by Sybase SQL Anywhere 5.5's isql 'OUTPUT TO' filter. > Yes, that's ancient. It is handled quite happily by \copy in csv mode, > except that when csv mode is active, \xnn escapes do not seem to be > processed. So I can have *either* \xnn escape processing *or* csv-style > input processing. > > Anyone know of a way to get escape processing in csv mode? > Don't know if you can do it directly, but this seem like one of those cases where a ETL tool like that from Pentaho (Kettle / Spoon) might be in order? One step to handle the escape chars and one to load the actual CSV... -- Peter Hunsberger
On 8 Jul 2010, at 4:21, Craig Ringer wrote: > Yes, that's ancient. It is handled quite happily by \copy in csv mode, > except that when csv mode is active, \xnn escapes do not seem to be > processed. So I can have *either* \xnn escape processing *or* csv-style > input processing. > > Anyone know of a way to get escape processing in csv mode? And what do those hex-escaped bytes mean? Are they in text strings? AFAIK CSV doesn't contain any information about whatencoding was used to create it, so it could be about anything; UTF-8, Win1252, ISO-8859-something, or whatever Sybasewas using. I'm just saying, be careful what you're parsing there ;) Alban Hertroys -- Screwing up is an excellent way to attach something to the ceiling. !DSPAM:737,4c359d9f286212106016419!
On 08/07/10 17:42, Alban Hertroys wrote: > On 8 Jul 2010, at 4:21, Craig Ringer wrote: > >> Yes, that's ancient. It is handled quite happily by \copy in csv mode, >> except that when csv mode is active, \xnn escapes do not seem to be >> processed. So I can have *either* \xnn escape processing *or* csv-style >> input processing. >> >> Anyone know of a way to get escape processing in csv mode? > > > And what do those hex-escaped bytes mean? Are they in text strings? AFAIK CSV doesn't contain any information about whatencoding was used to create it, so it could be about anything; UTF-8, Win1252, ISO-8859-something, or whatever Sybasewas using. > > I'm just saying, be careful what you're parsing there ;) Thanks for that. In this case, the escapes are just "bytes" - what's important is that, after unescaping, the CSV data is interpreted as latin-1. OK, Windows-1252, but close enough. In the end Python's csv module did the trick. I just pulled in the CSV data, and spat out Postgresql-friendly COPY format so that I didn't need to use the COPY ... CSV modifier and Pg would interpret the escapes during input. In case anyone else needs to deal with this format, here's the program I used. -- Craig Ringer Tech-related writing: http://soapyfrogs.blogspot.com/