On Fri, Nov 1, 2024, at 22:28, Masahiko Sawada wrote:
> As I mentioned in a separate email, if we use the OS default EOL as
> the default EOL in raw format, it would not be necessary to allow it
> to be multi characters. I think it's worth considering it.
I like the idea, but not sure I understand how it would work.
What if a user's OS default is \n (LF) and this user wants
to import a Windows text file \r\n (CR LR), which is a
multi characters EOL delimiter.
Was your idea to make an exception for that particular EOL,
or to simply not support that edge case?
> ---
> * copyfromparse.c
> * Parse CSV/text/binary format for COPY FROM.
>
> We need to update here as well.
>
> --
> - * 4. CopyReadAttributesText/CSV() function takes the input line from
> + * 4. CopyReadAttributesText/CSV/Raw() function takes the input line from
>
> I think we don't have CopyReadAttributesRaw() function now.
>
> ---
> I think it would be better to explain how to parse data in raw mode,
> especially which steps in the pipeline we skip, in the comment at the
> top of copyfromparse.c.
>
> ---
> - if (cstate->opts.format == COPY_FORMAT_CSV)
> + if (cstate->opts.format == COPY_FORMAT_RAW)
> {
> - /*
> - * If character is '\r', we may need to look ahead below. Force
> - * fetch of the next character if we don't already have it. We
> - * need to do this before changing CSV state, in case '\r' is also
> - * the quote or escape character.
> - */
> - if (c == '\r')
> + if (delim_len == 0)
> {
> - IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(0);
> + /* When reading entire file, consume all remaining
> bytes at once */
> + input_buf_ptr = copy_buf_len;
> + continue;
> }
> + else
>
> I think that the code become much simpler if we do 'continue' at the
> end of 'if (cstate->opts.format == COPY_FORMAT_RAW)' branch, instead
> of doing 'if (cstate->opts.format == COPY_FORMAT_RAW) {} else ...'.
>
> ---
> @@ -1461,6 +1536,7 @@ CopyReadLineText(CopyFromState cstate)
> return result;
> }
>
> +
> /*
> * Return decimal value for a hexadecimal digit
> */
> @@ -1937,7 +2013,6 @@ endfield:
> return fieldno;
> }
>
> -
> /*
> * Read a binary attribute
> */
>
> Unnecessary line addition and removal.
Thanks for review, I agree on all comments,
awaiting your comment on EOL before implementing changes.
/Joel