Thread: COPY FROM is not 8bit clean
ACK!!!!! Must remember which MTA I'm using...

When using COPY FROM 'file' DELIMITER '\254', COPY FROM reads past the delimiter and ends up with parse errors when trying to do the insert.

What the ?? Why didn't that go through with the body of the text.. *sigh* I'll resend in the AM..
> When using COPY FROM 'file' DELIMITER '\254', COPY FROM reads past the
> delimiter and ends up with parse errors when trying to do the insert.
>
> What the ?? Why didn't that go through with the body of the text.. *sigh*
> I'll resend in the AM..

Good catch. It's definitely a bug in the COPY command. Please try the following patch (this is against 7.2).

*** src/backend/commands/copy.c.orig    Tue Feb 26 21:11:05 2002
--- src/backend/commands/copy.c Tue Feb 26 21:11:35 2002
***************
*** 1024,1030 ****
  CopyReadAttribute(FILE *fp, bool *isnull, char *delim, int *newline, char *null_print)
  {
        int                     c;
!       int                     delimc = delim[0];

  #ifdef MULTIBYTE
        int                     mblen;
--- 1024,1030 ----
  CopyReadAttribute(FILE *fp, bool *isnull, char *delim, int *newline, char *null_print)
  {
        int                     c;
!       int                     delimc = (unsigned char)delim[0];

  #ifdef MULTIBYTE
        int                     mblen;
Postgres was not compiled with Multibyte. If I replace the if (delimc == c) with if (strstr(delim,c)) it works as expected. This change was implemented for performance reasons according to the CVS log.

At 11:57 PM 2/25/02 -0500, Tom Lane wrote:
>Darcy Buskermolen <darcy@ok-connect.com> writes:
>> When using COPY FROM 'file' DELIMITER '\254', COPY FROM reads past the
>> delimiter and ends up with parse errors when trying to do the insert.
>
>Are you perhaps operating in a multibyte encoding in which \254 is
>just the first byte of a multibyte character?
>
>I'm not sure what we do in such a case, and even less sure what we
>should do ... but I am entirely prepared to believe that we don't
>do the Right Thing ...
>
>                       regards, tom lane
This patch solves the problem.

At 09:16 PM 2/26/02 +0900, Tatsuo Ishii wrote:
>Good catch. It's definitely a bug in copy command. Please try
>following patches (this is against 7.2).
Darcy Buskermolen <darcy@ok-connect.com> writes:
> Postgres was not compiled with Multibyte. If I replace the if (delimc == c)
> with if (strstr(delim,c)) it works as expected. This change was
> implemented for performance reasons according to the CVS log.

Yeah, my error :-(. See Tatsuo's reply for the correct fix.

                        regards, tom lane
Can someone explain why this fixes the problem? I thought it was safe to assign a char to an int and do a compare. The compare I see is:

        if (c == delimc)
                break;

---------------------------------------------------------------------------

Darcy Buskermolen wrote:
> This patch solves the problem.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Can someone explain why this fixes the problem.

Think about a machine where char is signed by default. Extracting \254 (octal, i.e. 172 decimal) into an int will sign-extend it to a negative value (-84), which will not equal the 172 that getc returns for the same byte.

                        regards, tom lane
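For readers following along, the mismatch can be reproduced outside the backend. The following is only a minimal sketch, not code from copy.c: the helper names are hypothetical, and the explicit (signed char) cast stands in for a platform where plain char is signed (converting an out-of-range value to signed char is implementation-defined, though common compilers simply wrap it).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: what "int delimc = delim[0];" yields when plain
 * char is signed. The cast forces that behaviour on any platform. */
int delim_as_signed(const char *delim)
{
    assert(delim != NULL);
    return (signed char) delim[0];
}

/* Hypothetical helper: what the patched
 * "int delimc = (unsigned char) delim[0];" yields. */
int delim_as_unsigned(const char *delim)
{
    assert(delim != NULL);
    return (unsigned char) delim[0];
}

/* Hypothetical helper: the value getc() would return for the first byte
 * of buf. getc() hands back the byte as an unsigned char converted to
 * int, so the result is never negative for real data. */
int byte_as_getc_returns(const char *buf)
{
    assert(buf != NULL);
    return (unsigned char) buf[0];
}
```

With the delimiter "\254", the signed variant gives a negative number while the other two agree on a value in the 0..255 range, so a comparison like if (c == delimc) can only succeed with the unsigned cast.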
Tom Lane wrote:
> Think about a machine where char is signed by default.

Oh, I thought that the int returned by getc already had that sign extension, but now I remember it doesn't. In fact, it specifically returns an int so that EOF (typically -1) can be told apart from every valid byte value. Got it. Seems I am forgetting some of my C.
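The point about getc can be illustrated with a small read loop. This is a hedged sketch rather than the actual CopyReadAttribute logic: count_delims and count_delims_signed are hypothetical names, and the loop only counts delimiter bytes instead of splitting attributes.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: count occurrences of the delimiter byte in a
 * stream. c must be an int so that EOF stays distinct from byte 0xFF. */
int count_delims(FILE *fp, const char *delim)
{
    int c;
    int delimc = (unsigned char) delim[0];  /* same cast as the patch */
    int n = 0;

    assert(fp != NULL);
    while ((c = getc(fp)) != EOF)
    {
        if (c == delimc)
            n++;
    }
    return n;
}

/* Hypothetical helper: the same loop without the unsigned cast, with the
 * signed-char platform behaviour forced by an explicit cast. */
int count_delims_signed(FILE *fp, const char *delim)
{
    int c;
    int delimc = (signed char) delim[0];    /* negative for bytes >= 0x80 */
    int n = 0;

    assert(fp != NULL);
    while ((c = getc(fp)) != EOF)
    {
        if (c == delimc)
            n++;
    }
    return n;
}
```

Fed a stream containing a\254b\377\254c, the first function should see both delimiters and read straight past the \377 byte, while the second never matches a high-bit delimiter at all, which is consistent with the "reads past the delimiter" symptom reported at the top of the thread.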