Re: multiline CSV fields - Mailing list pgsql-hackers
| From | Andrew Dunstan |
|---|---|
| Subject | Re: multiline CSV fields |
| Date | |
| Msg-id | 41AFAC25.3080405@dunslane.net |
| In response to | Re: multiline CSV fields (Andrew Dunstan <andrew@dunslane.net>) |
| Responses | Re: multiline CSV fields; Re: [PATCHES] multiline CSV fields |
| List | pgsql-hackers |
I wrote:
>
> If it bothers you that much, I'd make a flag, cleared at the start of
> each COPY, and then where we test for CR or LF in CopyAttributeOutCSV,
> if the flag is not set then set it and issue the warning.
I didn't realise until Bruce told me just now that I was on the hook for
this. I guess I should keep my big mouth shut. (Yeah, that's gonna
happen ...)
Anyway, here's a tiny patch that does what I had in mind.
cheers
andrew
Index: copy.c
===================================================================
RCS file: /home/cvsmirror/pgsql/src/backend/commands/copy.c,v
retrieving revision 1.234
diff -c -r1.234 copy.c
*** copy.c 6 Nov 2004 17:46:27 -0000 1.234
--- copy.c 2 Dec 2004 23:34:20 -0000
***************
*** 98,103 ****
--- 98,104 ----
static EolType eol_type; /* EOL type of input */
static int client_encoding; /* remote side's character encoding */
static int server_encoding; /* local encoding */
+ static bool embedded_line_warning;
/* these are just for error messages, see copy_in_error_callback */
static bool copy_binary; /* is it a binary copy? */
***************
*** 1190,1195 ****
--- 1191,1197 ----
attr = tupDesc->attrs;
num_phys_attrs = tupDesc->natts;
attr_count = list_length(attnumlist);
+ embedded_line_warning = false;
/*
* Get info about the columns we need to process.
***************
*** 2627,2632 ****
--- 2629,2653 ----
!use_quote && (c = *test_string) != '\0';
test_string += mblen)
{
+ /*
+ * We don't know here what the surrounding line end characters
+ * might be. It might not even be under postgres' control. So
+ we simply warn on ANY embedded line ending character.
+ *
+ * This warning will disappear when we make line parsing field-aware,
+ * so that we can reliably read in embedded line ending characters
+ * regardless of the file's line-end context.
+ *
+ */
+
+ if (!embedded_line_warning && (c == '\n' || c == '\r') )
+ {
+ embedded_line_warning = true;
+ elog(WARNING,
+ "CSV fields with embedded linefeed or carriage return "
+ "characters might not be able to be reimported");
+ }
+
if (c == delimc || c == quotec || c == '\n' || c == '\r')
use_quote = true;
if (!same_encoding)
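
For readers who want to try the warn-once idea outside the backend, the patch above can be reduced to a rough standalone sketch. Everything here is a stand-in and an assumption: begin_copy(), field_needs_quote() and warnings_issued are hypothetical names that do not exist in copy.c, fprintf replaces elog(WARNING, ...), and the scan walks single bytes rather than stepping by mblen as the real loop does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the file-level state in copy.c. */
static bool embedded_line_warning;  /* has the warning already fired in this COPY? */
static int  warnings_issued;        /* counter, present only to make the sketch observable */

/* Clear the flag once at the start of each COPY. */
static void
begin_copy(void)
{
    embedded_line_warning = false;
}

/*
 * Scan one field byte by byte (assumption: single-byte encoding) and report
 * whether it needs quoting.  The first CR or LF seen during a COPY triggers
 * the warning; later ones are silent because the flag is already set.
 */
static bool
field_needs_quote(const char *field, char delimc, char quotec)
{
    bool        use_quote = false;
    const char *p;

    assert(field != NULL);

    for (p = field; !use_quote && *p != '\0'; p++)
    {
        char c = *p;

        if (!embedded_line_warning && (c == '\n' || c == '\r'))
        {
            embedded_line_warning = true;
            warnings_issued++;
            fprintf(stderr,
                    "WARNING: CSV fields with embedded linefeed or carriage "
                    "return characters might not be able to be reimported\n");
        }

        if (c == delimc || c == quotec || c == '\n' || c == '\r')
            use_quote = true;
    }

    return use_quote;
}
```

One property this sketch shares with the loop in the patch: scanning stops as soon as use_quote becomes true, so a field whose delimiter or quote character comes before its first line ending does not trigger the warning by itself.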