Thread: Fixing backslash dot for COPY FROM...CSV

Fixing backslash dot for COPY FROM...CSV

From

"Daniel Verite"

Date:

18 December 2023, 20:35:53

  Hi,

PFA a patch that attempts to fix the bug that \. on a line
by itself is handled incorrectly by COPY FROM ... CSV.
This issue has been discussed several times previously,
for instance in [1] and [2], and mentioned in the
doc for \copy in commit 42d3125.

There's one case that works today: when
the line is part of a multi-line quoted section,
and the data is read from a file, not from the client.
In other situations, an error is raised or the data is cut at
the point of \. without an error.

The patch addresses that issue in the server and in psql,
except for the case of inlined data, where \. cannot be
both valid data and an EOF marker at the same time, so
it keeps treating it as an EOF marker.


[1]
https://www.postgresql.org/message-id/10e3eff6-eb04-4b3f-aeb4-b920192b977a@manitou-mail.org
[2]
https://www.postgresql.org/message-id/8aeab305-5e94-4fa5-82bf-6da6baee6e05%40app.fastmail.com


Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite

Attachment

v1-0001-Support-backslash-dot-on-a-line-by-itself-as-vali.patch

Re: Fixing backslash dot for COPY FROM...CSV

From

vignesh C

Date:

19 December 2023, 08:46:40

On Tue, 19 Dec 2023 at 02:06, Daniel Verite <daniel@manitou-mail.org> wrote:
>
>   Hi,
>
> PFA a patch that attempts to fix the bug that \. on a line
> by itself is handled incorrectly by COPY FROM ... CSV.
> This issue has been discussed several times previously,
> for instance in [1] and [2], and mentioned in the
> doc for \copy in commit 42d3125.
>
> There's one case that works today: when
> the line is part of a multi-line quoted section,
> and the data is read from a file, not from the client.
> In other situations, an error is raised or the data is cut at
> the point of \. without an error.
>
> The patch addresses that issue in the server and in psql,
> except for the case of inlined data, where \. cannot be
> both valid data and an EOF marker at the same time, so
> it keeps treating it as an EOF marker.

I noticed that these tests are passing without applying patch too:
+++ b/src/test/regress/sql/copy.sql
@@ -38,6 +38,17 @@ copy copytest2 from :'filename' csv quote '''' escape E'\\';

 select * from copytest except select * from copytest2;

+--- test unquoted .\ as data inside CSV
+
+truncate copytest2;
+
+insert into copytest2(test) values('line1'), ('\.'), ('line2');
+copy (select test from copytest2 order by test collate "C") to :'filename' csv;
+-- get the data back in with copy
+truncate copytest2;
+copy copytest2(test) from :'filename' csv;
+select test from copytest2 order by test collate "C";

I was not sure if this was intentional. Can we add a test which fails
in HEAD and passes with the patch applied.

Regards,
Vignesh

Re: Fixing backslash dot for COPY FROM...CSV

From

"Daniel Verite"

Date:

19 December 2023, 11:27:42

    vignesh C wrote:

> I noticed that these tests are passing without applying patch too:

> +insert into copytest2(test) values('line1'), ('\.'), ('line2');
> +copy (select test from copytest2 order by test collate "C") to :'filename'
> csv;
> +-- get the data back in with copy
> +truncate copytest2;
> +copy copytest2(test) from :'filename' csv;
> +select test from copytest2 order by test collate "C";
>
> I was not sure if this was intentional. Can we add a test which fails
> in HEAD and passes with the patch applied.

Thanks for checking this out.
Indeed, that was not intentional. I've been using static files
in my tests and forgot that if the data was produced with
COPY OUT, it would quote backslash-dot so that COPY IN could
reload it without problem.

PFA an updated version that uses \qecho to produce the
data instead of COPY OUT. This test on unpatched HEAD
shows that copytest2 is missing 2 rows after COPY IN.

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite

Attachment

v2-0001-Support-backslash-dot-on-a-line-by-itself-as-vali.patch

Re: Fixing backslash dot for COPY FROM...CSV

From

vignesh C

Date:

21 December 2023, 05:59:24

On Tue, 19 Dec 2023 at 16:57, Daniel Verite <daniel@manitou-mail.org> wrote:
>
>         vignesh C wrote:
>
> > I noticed that these tests are passing without applying patch too:
>
> > +insert into copytest2(test) values('line1'), ('\.'), ('line2');
> > +copy (select test from copytest2 order by test collate "C") to :'filename'
> > csv;
> > +-- get the data back in with copy
> > +truncate copytest2;
> > +copy copytest2(test) from :'filename' csv;
> > +select test from copytest2 order by test collate "C";
> >
> > I was not sure if this was intentional. Can we add a test which fails
> > in HEAD and passes with the patch applied.
>
> Thanks for checking this out.
> Indeed, that was not intentional. I've been using static files
> in my tests and forgot that if the data was produced with
> COPY OUT, it would quote backslash-dot so that COPY IN could
> reload it without problem.
>
> PFA an updated version that uses \qecho to produce the
> data instead of COPY OUT. This test on unpatched HEAD
> shows that copytest2 is missing 2 rows after COPY IN.

Thanks for the updated patch, any reason why this is handled only in csv.
postgres=# copy test1 from '/home/vignesh/postgres/inst/bin/copy1.out';
COPY 1
postgres=# select * from test1;
  c1
-------
 line1
(1 row)

postgres=# copy test1 from '/home/vignesh/postgres/inst/bin/copy1.out' csv;
COPY 1
postgres=# select * from test1;
  c1
-------
 line1
 \.
 line2
(3 rows)

As the documentation at [1] says:
An end-of-data marker is not necessary when reading from a file, since
the end of file serves perfectly well; it is needed only when copying
data to or from client applications using pre-3.0 client protocol.

[1] - https://www.postgresql.org/docs/devel/sql-copy.html

Regards,
Vignesh

Re: Fixing backslash dot for COPY FROM...CSV

From

"Daniel Verite"

Date:

21 December 2023, 19:47:14

    vignesh C wrote:

> Thanks for the updated patch, any reason why this is handled only in csv.
> postgres=# copy test1 from '/home/vignesh/postgres/inst/bin/copy1.out';
> COPY 1
> postgres=# select * from test1;
>  c1
> -------
> line1
> (1 row)

I believe it's safer to not change anything to the normal "non-csv"
text mode.
The current doc says that \. will not be taken as data in this format.
From https://www.postgresql.org/docs/current/sql-copy.html :

   Any other backslashed character that is not mentioned in the above
   table will be taken to represent itself. However, beware of adding
   backslashes unnecessarily, since that might accidentally produce a
   string matching the end-of-data marker (\.) or the null string (\N
   by default). These strings will be recognized before any other
   backslash processing is done.

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite

Re: Fixing backslash dot for COPY FROM...CSV

From

vignesh C

Date:

23 December 2023, 03:32:58

On Fri, 22 Dec 2023 at 01:17, Daniel Verite <daniel@manitou-mail.org> wrote:
>
>         vignesh C wrote:
>
> > Thanks for the updated patch, any reason why this is handled only in csv.
> > postgres=# copy test1 from '/home/vignesh/postgres/inst/bin/copy1.out';
> > COPY 1
> > postgres=# select * from test1;
> >  c1
> > -------
> > line1
> > (1 row)
>
> I believe it's safer to not change anything to the normal "non-csv"
> text mode.
> The current doc says that \. will not be taken as data in this format.
> From https://www.postgresql.org/docs/current/sql-copy.html :
>
>    Any other backslashed character that is not mentioned in the above
>    table will be taken to represent itself. However, beware of adding
>    backslashes unnecessarily, since that might accidentally produce a
>    string matching the end-of-data marker (\.) or the null string (\N
>    by default). These strings will be recognized before any other
>    backslash processing is done.

Thanks for the clarification. Then let's keep it as you have implemented.

Regards,
Vignesh

Re: Fixing backslash dot for COPY FROM...CSV

From

"Daniel Verite"

Date:

31 December 2023, 15:32:33

  Hi,

The CI patch tester fails on this patch, because it has a label
at the end of a C block, which I'm learning is a C23 feature
that happens to be supported by gcc 11 [1], but is not portable.

PFA an update fixing this, plus removing an obsolete chunk
in the COPY documentation that v2 left out.


[1] https://gcc.gnu.org/onlinedocs/gcc/Mixed-Labels-and-Declarations.html


Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite

Attachment

v3-0001-Support-backslash-dot-on-a-line-by-itself-as-vali.patch

Re: Fixing backslash dot for COPY FROM...CSV

From

Robert Haas

Date:

15 January 2024, 21:05:15

On Mon, Dec 18, 2023 at 3:36 PM Daniel Verite <daniel@manitou-mail.org> wrote:
> PFA a patch that attempts to fix the bug that \. on a line
> by itself is handled incorrectly by COPY FROM ... CSV.
> This issue has been discussed several times previously,
> for instance in [1] and [2], and mentioned in the
> doc for \copy in commit 42d3125.

Those links unfortunately seem not to be entirely specific to this
issue. Other, related things seem to be discussed there, and it's not
obvious that everyone agrees on what to do, or really that anyone
agrees on what to do. The best link that I found for this exact issue
is https://www.postgresql.org/message-id/E1TdNVQ-0001ju-GO@wrigleys.postgresql.org
but the thread isn't very conclusive and is so old that any
conclusions reached then might no longer be considered valid today.

And I guess the reason I mention is that, supposing your patch were
technically perfect in every way, would everyone agree that it ought
to be committed? If Alice is a user with a CSV file that might contain
\. on a line by itself within a quoted CSV field, then Alice is
currently sad because she can't necessarily load all of her CSV files.
The patch would fix that, and make her happy. On the other hand, if
Bob is a user with a CSV-ish file that definitely doesn't contain \.
on a line by itself within a quoted CSV field but might have been
truncated in the middle of a quoted field, maybe Bob will be sad if
this patch gets committed, because he will no longer be able to append
\n\.\n to whatever junk he's got in the file and let the server sort
out whether to throw an error.

I have to admit that it seems more likely that there are people in the
world with Alice's problem rather than people in the world with Bob's
problem. We'd probably make more people happy with the change than
sad. But it is a definitional change, I think, and that's a little
scary, and maybe somebody will think that's a reason why we should
change nothing here. Part of my hesitancy, I suppose, is that I don't
understand why we even have this strange convention of making \.
terminate the input in the first place -- I mean, why wouldn't that be
done in some kind of out-of-band way, rather than including a special
marker in the data?

I can't help feeling a bit nervous about this first documentation hunk:

--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -761,11 +761,7 @@ COPY <replaceable class="parameter">count</replaceable>
     format, <literal>\.</literal>, the end-of-data marker, could also appear
     as a data value.  To avoid any misinterpretation, a <literal>\.</literal>
     data value appearing as a lone entry on a line is automatically
-    quoted on output, and on input, if quoted, is not interpreted as the
-    end-of-data marker.  If you are loading a file created by another
-    application that has a single unquoted column and might have a
-    value of <literal>\.</literal>, you might need to quote that value in the
-    input file.
+    quoted on output.
    </para>

    <note>

It doesn't feel right to me to just replace all of this text with
nothing. That leaves us documenting only the behavior on output,
whereas the previous text documents both the output behavior (we quote
it) and the input behavior (it has to be quoted to avoid being taken
as the EOF marker).

         /*
-         * In CSV mode, we only recognize \. alone on a line.  This is because
-         * \. is a valid CSV data value.
+         * In CSV mode, backslash is a normal character.
          */
-        if (c == '\\' && (!cstate->opts.csv_mode || first_char_in_line))
+        if (c == '\\' && !cstate->opts.csv_mode)

I don't think that the update comment accurately describes the
behavior. If I understand correctly what you're trying to fix, I'd
expect the new behavior to be that we only recognize \. alone on a
line and even then only if we're not inside a quoting string, but
that's not what the revised comment says. Instead, it claims that
backslash is just a normal character, but if that were true, the whole
if-statement wouldn't exist at all, since its purpose is to provide
special-handling for backslashes -- and indeed the patch does not
change that, since the if-statement is still looking for a backslash
and doing something special if one is found.

Hmm. Looking at the rest of the patch, it seems like you're removing
the logic that prevents us from interpreting

\. lksdghksdhgjskdghjs

as an end-of-file while in CSV mode. But I would have thought based on
what problem you're trying to fix that you would have wanted to keep
that logic and further restrict it so that it only applies when not
within a quoted string.

Maybe I'm misunderstanding what bug you're trying to fix?

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: Fixing backslash dot for COPY FROM...CSV

From

"Daniel Verite"

Date:

16 January 2024, 14:42:45

    Robert Haas wrote:

> Part of my hesitancy, I suppose, is that I don't
> understand why we even have this strange convention of making \.
> terminate the input in the first place -- I mean, why wouldn't that be
> done in some kind of out-of-band way, rather than including a special
> marker in the data?

The v3 protocol added the out-of-band method, but the v2 protocol
did not have it, and as far as I understand, this is the reason why
CopyReadLineText() must interpret \. as an end-of-data marker.

The v2 protocol was removed in pg14
https://www.postgresql.org/docs/release/14.0/
<quote>
  Remove server and libpq support for the version 2 wire protocol (Heikki
Linnakangas)
  This was last used as the default in PostgreSQL 7.3 (released in 2002).
</quote>

Also I hadnt' noticed this before, but the current doc has this mention
that is relevant to this patch:

https://www.postgresql.org/docs/current/protocol-changes.html
"Summary of Changes since Protocol 2.0"
<quote>
  COPY data is now encapsulated into CopyData and CopyDone
  messages. There is a well-defined way to recover from errors during
  COPY. The special “\.” last line is not needed anymore, and is not
  sent during COPY OUT. (It is still recognized as a terminator during
  COPY IN, but its use is deprecated and will eventually be removed.)
</quote>

What the present patch does is essentially, for the server-side part,
stop recognizing "\." as as terminator, like this paragraph says, but
it does that for CSV only, not for TEXT.

> Hmm. Looking at the rest of the patch, it seems like you're removing
> the logic that prevents us from interpreting
>
> \. lksdghksdhgjskdghjs
>
> as an end-of-file while in CSV mode. But I would have thought based on
> what problem you're trying to fix that you would have wanted to keep
> that logic and further restrict it so that it only applies when not
> within a quoted string.
>
> Maybe I'm misunderstanding what bug you're trying to fix?

The fix is that \. is no longer recognized as special in CSV, whether
alone on a line or not, and whether in a quoted section or not.
It's always interpreted as data, like it would have been in
the first place, I imagine, if the v2 protocol could have handled
it. This is why the patch consists mostly of removing code and
simplifying comments.

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite

Re: Fixing backslash dot for COPY FROM...CSV

From

"Daniel Verite"

Date:

24 January 2024, 16:01:15

    Robert Haas wrote:

> Those links unfortunately seem not to be entirely specific to this
> issue. Other, related things seem to be discussed there, and it's not
> obvious that everyone agrees on what to do, or really that anyone
> agrees on what to do. The best link that I found for this exact issue
> is
> https://www.postgresql.org/message-id/E1TdNVQ-0001ju-GO@wrigleys.postgresql.org
> but the thread isn't very conclusive and is so old that any
> conclusions reached then might no longer be considered valid today.

To refresh the problem statement, 4 cases that need fixing as
of HEAD can be distinguished:

#1. copy csv from file, single column, no quoting involved.
COPY will stop at \. and ignore the rest of the file without
any error or warning.

$ cat >/tmp/file.csv <<EOF
line1
\.
line2
EOF

$ psql <<EOF
create table contents(t text);
copy contents from '/tmp/file.csv' (format csv);
table contents;
EOF

Results in
   t
-------
 line1
(1 row)

The bug is that a single row is imported instead of the 3 rows of the file.

#2. Same as the previous case, but with file_fdw

$ psql <<EOF
CREATE EXTENSION file_fdw;

CREATE FOREIGN TABLE csv_data(line text) SERVER myserver
  OPTIONS (filename '/tmp/file.csv', format 'csv');

TABLE csv_data;
EOF

Results in:

line
-------
 line1
(1 row)

The bug is that rows 2 and 3 are missing, as in case #1.

#3. \copy csv from file with  \. inside a quoted multi-line section

This is the case that the above linked report mentioned,
resulting in a failure to import.
In addition to being legal CSV, these contents can be produced by
Postgres itself exporting in CSV.

$ cat >/tmp/file-quoted.csv <<EOF
line1
"
\.
"
line2
EOF

$ psql <<EOF
create table contents(t text);
\copy contents from '/tmp/file-quoted.csv' csv;
EOF

Results in an error:

ERROR:    unterminated CSV quoted field
CONTEXT:  COPY contents, line 4: ""
\.
"

The expected result is that it imports 3 rows without error.

#4. \copy csv from file, single column, no quoting involved
This is the same case as #1 except it uses the client-server protocol.

$ cat >/tmp/file.csv <<EOF
line1
\.
line2
EOF

$ psql <<EOF
create table contents(t text);
\copy contents from '/tmp/file.csv' (format csv);
TABLE contents;
EOF

Results in
   t
-------
 line1
(1 row)

As in case #1, a single row is imported instead of 3 rows.

The proposed  patch addresses these cases by making the sequence
\. non-special in CSV (in fact backslash itself is a normal character in
CSV).
It does not address the cases when the data is embedded after
the COPY command or typed interactively in psql, since these cases
require an explicit end-of-data marker, and CSV does not have
the concept of an end-of-data marker.

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite

Re: Fixing backslash dot for COPY FROM...CSV

From

Tom Lane

Date:

02 April 2024, 19:44:08

"Daniel Verite" <daniel@manitou-mail.org> writes:
>     Robert Haas wrote:
>> Those links unfortunately seem not to be entirely specific to this
>> issue. Other, related things seem to be discussed there, and it's not
>> obvious that everyone agrees on what to do, or really that anyone
>> agrees on what to do.

> The proposed  patch addresses these cases by making the sequence
> \. non-special in CSV (in fact backslash itself is a normal character in
> CSV).
> It does not address the cases when the data is embedded after
> the COPY command or typed interactively in psql, since these cases
> require an explicit end-of-data marker, and CSV does not have
> the concept of an end-of-data marker.

I've looked over this patch and I generally agree that this is a
reasonable solution.  While it's barely possible that somebody
out there is depending on the current behavior of \. in CSV mode,
it seems considerably more likely that people who run into it
would consider it a bug.  In any case, the patch isn't proposed
for back-patch, and as cross-version incompatibilities go this
seems like a pretty small one.

I concur with Robert's doubts about some of the doc changes though.
In particular, since we're not changing the behavior for non-CSV
mode, we shouldn't remove any statements that apply to non-CSV mode.

I'm also wondering why the patch adds a test for
"PQprotocolVersion(conn) >= 3" in handleCopyIn.  As already
noted upthread, we ripped out all support for protocol 2.0
some time ago, so that sure looks like dead code.

            regards, tom lane

Re: Fixing backslash dot for COPY FROM...CSV

From

"Daniel Verite"

Date:

05 April 2024, 12:58:39

    Tom Lane wrote:

> I've looked over this patch and I generally agree that this is a
> reasonable solution.

Thanks for reviewing this!

> I'm also wondering why the patch adds a test for
> "PQprotocolVersion(conn) >= 3" in handleCopyIn.

I've removed this in the attached update.

> I concur with Robert's doubts about some of the doc changes though.
> In particular, since we're not changing the behavior for non-CSV
> mode, we shouldn't remove any statements that apply to non-CSV mode.

I don't quite understand the issues with the doc changes. Let me
detail the changes.

The first change is under
  <refsect2>
   <title>CSV Format</title>
so it does no concern non-CSV modes.

--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -761,11 +761,7 @@ COPY <replaceable class="parameter">count</replaceable>
    format, <literal>\.</literal>, the end-of-data marker, could also appear
    as a data value.  To avoid any misinterpretation, a <literal>\.</literal>
    data value appearing as a lone entry on a line is automatically
-    quoted on output, and on input, if quoted, is not interpreted as the
-    end-of-data marker.  If you are loading a file created by another
-    application that has a single unquoted column and might have a
-    value of <literal>\.</literal>, you might need to quote that value in
the
-    input file.
+    quoted on output.
   </para>

The part about quoting output is kept because the code still does that.

About this bit:
 "and on input, if quoted, is not interpreted as the end-of-data marker."

So the patch removes that part. The reason is \. is now not interpreted as
EOD in both cases, quoted or unquoted, conforming to spec.
Previously, what the reader should have understood by "if quoted, ..."
is that it implies "if not quoted, then .\ will be interpreted as EOD
even though that behavior does not conform to the CSV spec".
If we documented the new behavior, it would be something like:
when quoted, it works as expected, and when unquoted, it works as
expected too. Isn't it simpler not to say anything?

About the next sentence:
  "If you are loading a file created by another application
  that has a single unquoted column and might have a value of \., you
  might need to quote that value in the input file."

It's removed as well because it's not longer necessary
to do that.

The other hunk is in psql doc:

--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -1119,10 +1119,6 @@ INSERT INTO tbl1 VALUES ($1, $2) \bind 'first value'
'second value' \g
     destination, because all data must pass through the client/server
     connection.  For large amounts of data the <acronym>SQL</acronym>
     command might be preferable.
-     Also, because of this pass-through method, <literal>\copy
-     ... from</literal> in <acronym>CSV</acronym> mode will erroneously
-     treat a <literal>\.</literal> data value alone on a line as an
-     end-of-input marker.

That behavior no longer happens, so this gets removed as well.

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite

Attachment

v4-0001-Support-backslash-dot-on-a-line-by-itself-as-vali.patch

Re: Fixing backslash dot for COPY FROM...CSV

From

Tom Lane

Date:

05 April 2024, 16:02:37

"Daniel Verite" <daniel@manitou-mail.org> writes:
>     Tom Lane wrote:
>> I've looked over this patch and I generally agree that this is a
>> reasonable solution.

> Thanks for reviewing this!

While testing this, I tried running the tests with an updated server
and non-updated psql, and not only did the new test case fail, but
so did a bunch of existing ones.  That's because existing psql will
send the trailing "\." of inlined data to the server, and the updated
server will now think that's data if it's in CSV mode.

So this means that the patch introduces a rather serious cross-version
compatibility problem.  I doubt we can consider inlined CSV data to be
a niche case that few people use; but it will fail every time if your
psql is older than your server.

Not sure what to do here.  One idea is to install just the psql-side
fix, which should break nothing now that version-2 protocol is dead,
and then wait a few years before introducing the server-side change.
That seems kind of sad though.

An argument for not waiting is that psql may not be the only client
that needs this behavioral adjustment, and if so there's going to
be breakage anyway when we change the server; so we might as well
get it over with.

            regards, tom lane

Re: Fixing backslash dot for COPY FROM...CSV

From

Tom Lane

Date:

05 April 2024, 16:27:41

I wrote:
> So this means that the patch introduces a rather serious cross-version
> compatibility problem.  I doubt we can consider inlined CSV data to be
> a niche case that few people use; but it will fail every time if your
> psql is older than your server.

On third thought, that may not be as bad as I was thinking.
We don't blink at the idea that an older psql's \d commands may
malfunction with a newer server, and I think most users have
internalized the idea that they want psql >= server.  If the
patch created an incompatibility with that direction, it'd be
a problem, but I don't think it does.

            regards, tom lane

Re: Fixing backslash dot for COPY FROM...CSV

From

"Daniel Verite"

Date:

05 April 2024, 16:54:46

    Tom Lane wrote:

> Not sure what to do here.  One idea is to install just the psql-side
> fix, which should break nothing now that version-2 protocol is dead,
> and then wait a few years before introducing the server-side change.
> That seems kind of sad though.

Wouldn't backpatching solve this?  Then only the users who don't
apply the minor updates would have non-matching server and psql.
Initially I though that obsoleting the v2 protocol was a recent
move, but reading older messages from the list I've got the
impression that it was more or less in the pipeline since way
before version 10.

Also one of the cases the patch fixes, the one when imported
data are silently truncated at the point of \., is quite nasty
IMO.
I can imagine an app where user-supplied data would be
appended row-by-row into a CSV file, and would be processed
periodically by batch. Under some conditions, in particular
if newlines in the first column are allowed, a malevolent user could
submit a \. sequence  to cause the batch to miss the rest of the data
without any error being raised.

[1]
https://www.postgresql.org/message-id/11648.1403147417%40sss.pgh.pa.us

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite

Re: Fixing backslash dot for COPY FROM...CSV

From

Tom Lane

Date:

05 April 2024, 19:12:41

"Daniel Verite" <daniel@manitou-mail.org> writes:
>     Tom Lane wrote:
>> Not sure what to do here.  One idea is to install just the psql-side
>> fix, which should break nothing now that version-2 protocol is dead,
>> and then wait a few years before introducing the server-side change.
>> That seems kind of sad though.

> Wouldn't backpatching solve this?

No, it'd just reduce the surface area a bit.  People on less-than-
the-latest-minor-release would still have the issue.  In any case
back-patching further than v14 would be a nonstarter, because we
didn't remove protocol v2 support till then.

However, the analogy to "\d commands might fail against a newer
server" reduces my level of concern quite a lot: it's hard to
draw much of a line separating that kind of issue from "inline
COPY CSV will fail against a newer server".  It's not like such
failures won't be obvious and fairly easy to diagnose.

            regards, tom lane

Re: Fixing backslash dot for COPY FROM...CSV

From

Tom Lane

Date:

05 April 2024, 20:34:06

After some more poking at this topic, I realize that there is already
very strange and undocumented behavior for backslash-dot even in
non-CSV mode.  Create a file like this:

$ cat eofdata
foobar
foobaz\.
more
\.
yet more

and try importing it with COPY:

regression=# create table eofdata(f1 text);
CREATE TABLE
regression=# copy eofdata from '/home/tgl/pgsql/eofdata';
COPY 2
regression=# table eofdata;
   f1   
--------
 foobar
 foobaz
(2 rows)

That's what you get in 9.0 and earlier versions, and it's already
not-as-documented, because we claim that only \. alone on a line is an
EOF marker; we certainly don't suggest that what's in front of it will
be taken as valid data.  However, somebody broke it some more in 9.1,
because 9.1 up to HEAD produce this result:

regression=# create table eofdata(f1 text);
CREATE TABLE
regression=# copy eofdata from '/home/tgl/pgsql/eofdata';
COPY 3
regression=# table eofdata;
   f1   
--------
 foobar
 foobaz
 more
(3 rows)

So the current behavior is that \. that is on the end of a line,
but is not the whole line, is silently discarded and we keep going.

All versions throw "end-of-copy marker corrupt" if there is
something after \. on the same line.

This is sufficiently weird that I'm starting to come around to
Daniel's original proposal that we just drop the server's recognition
of \. altogether (which would allow removal of some dozens of lines of
complicated and now known-buggy code).  Alternatively, we could fix it
so that \. at the end of a line draws "end-of-copy marker corrupt",
which would at least make things consistent, but I'm not sure that has
any great advantage.  I surely don't want to document the current
behavioral details as being the right thing that we're going to keep
doing.

Thoughts?

            regards, tom lane

Re: Fixing backslash dot for COPY FROM...CSV

From

Tom Lane

Date:

06 April 2024, 17:03:12

I wrote:
> So the current behavior is that \. that is on the end of a line,
> but is not the whole line, is silently discarded and we keep going.

> All versions throw "end-of-copy marker corrupt" if there is
> something after \. on the same line.

> This is sufficiently weird that I'm starting to come around to
> Daniel's original proposal that we just drop the server's recognition
> of \. altogether (which would allow removal of some dozens of lines of
> complicated and now known-buggy code).

I experimented with that and soon ran into a nasty roadblock: it
breaks dump/restore, because pg_dump includes a "\." line after
COPY data whether or not it really needs one.  Worse, that's
implemented by including the "\." line into the archive format,
so that existing dump files contain it.  Getting rid of it would
require an archive format version bump, plus some hackery to allow
removal of the line when reading old dump files.

While that's surely doable with enough effort, it's not the kind
of thing to be undertaking with less than 2 days to feature freeze.
Not to mention that I'm not sure we have consensus to do it at all.

More fun stuff: PQgetline actually invents a "\." line when it
sees server end-of-copy, and we tell users of that function to
check for that not an out-of-band return value to detect EOF.
It looks like we have no callers of that in the core distro,
but do we want to deprecate it completely?

So I feel like we need to put this patch on the shelf for the moment
and come back to it early in v18.  Although it seems reasonably clear
what to do on the CSV side of things, it's very much less clear what
to do about text-format handling of EOD markers, and I don't want to
change one of those things in v17 and the other in v18.  Also it
seems like there are more dependencies on "\." than we realized.

There could be an argument for applying just the psql change now,
to remove its unnecessary sending of "\.".  That won't break
anything and it would give us at least one year's leg up on
compatibility issues.

            regards, tom lane

Re: Fixing backslash dot for COPY FROM...CSV

From

"Daniel Verite"

Date:

06 April 2024, 22:00:25

    Tom Lane wrote:

> This is sufficiently weird that I'm starting to come around to
> Daniel's original proposal that we just drop the server's recognition
> of \. altogether (which would allow removal of some dozens of lines of
> complicated and now known-buggy code)

FWIW my plan was to not change anything in the TEXT mode,
but I wasn't aware it had this issue that you found when
\. is not in a line by itself.

>  Alternatively, we could fix it so that \. at the end of a line draws
> "end-of-copy marker corrupt"
> which would at least make things consistent, but I'm not sure that has
> any great advantage.  I surely don't want to document the current
> behavioral details as being the right thing that we're going to keep
> doing.

Agreed we don't want to document that, but also why doesn't \. in the
contents represent just a dot  (as opposed to being an error),
just like \a is a?

I mean if eofdata contains

  foobar\a
  foobaz\aother

then we get after import:
      f1
--------------
 foobara
 foobazaother
(2 rows)

Reading the current doc on the text format, I can't see why
importing:

  foobar\.
  foobar\.other

is not supposed to produce
      f1
--------------
 foobar.
 foobaz.other
(2 rows)

I see these rules in [1] about backslash:

#1.
  "End of data can be represented by a single line containing just
   backslash-period (\.)."

foobar\. and foobar\.other do not match that so #1 does not describe
how they're interpreted.

#2.
  "Backslash characters (\) can be used in the COPY data to quote data
  characters that might otherwise be taken as row or column
  delimiters."

Dot is not a column delimiter (it's forbidden anyway), so #2 does
not apply.

#3.
  "In particular, the following characters must be preceded by a
  backslash if they appear as part of a column value: backslash itself,
  newline, carriage return, and the current delimiter character"

Dot is not in that list so #3 does not apply.

#4.
  "The following special backslash sequences are recognized by COPY
  FROM:" (followed by the table with \b \f, ...,)

Dot is not mentioned.

#5.
  "Any other backslashed character that is not mentioned in the above
  table will be taken to represent itself"

Here we say that backslash dot represents a dot (unless other
rules apply)

  foobar\. => foobar.
  foobar\.other => foobar.other

#6.
  "However, beware of adding backslashes unnecessarily, since that
   might accidentally produce a string matching the end-of-data marker
   (\.) or the null string (\N by default)."

So we *recommend* not to use \. but as I understand it, the warning
with the EOD marker is about accidentally creating a line that matches #1,
that is, \. alone on a line.

#7
  "These strings will be recognized before any other backslash
  processing is done."

TBH I don't understand what #7 implies. The order in backslash
processing looks like an implementation detail that should not
matter in understanding the format?

Considering this, it seems to me that #5 says that
backslash-dot represents a dot unless #1 applies, and the
other #2 #3 #4 #6 #7 rules do not state anything that would
contradict that.

[1] https://www.postgresql.org/docs/current/sql-copy.html

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite

Re: Fixing backslash dot for COPY FROM...CSV

From

Bruce Momjian

Date:

07 April 2024, 02:55:04

On Sun, Apr  7, 2024 at 12:00:25AM +0200, Daniel Verite wrote:
>     Tom Lane wrote:
> 
> > This is sufficiently weird that I'm starting to come around to
> > Daniel's original proposal that we just drop the server's recognition
> > of \. altogether (which would allow removal of some dozens of lines of
> > complicated and now known-buggy code)
> 
> FWIW my plan was to not change anything in the TEXT mode,
> but I wasn't aware it had this issue that you found when
> \. is not in a line by itself.
> 
> >  Alternatively, we could fix it so that \. at the end of a line draws
> > "end-of-copy marker corrupt"
> > which would at least make things consistent, but I'm not sure that has
> > any great advantage.  I surely don't want to document the current
> > behavioral details as being the right thing that we're going to keep
> > doing.
> 
> Agreed we don't want to document that, but also why doesn't \. in the
> contents represent just a dot  (as opposed to being an error),
> just like \a is a?

I looked into this and started to realize that \. is the only copy
backslash command where we define the behavior only alone at the
beginning of a line, and not in other circumstances.  The \a example
above suggests \. should be period in all other cases, as suggested, but
I don't know the ramifications if that.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Only you can decide what is important to you.

Re: Fixing backslash dot for COPY FROM...CSV

From

Tom Lane

Date:

07 April 2024, 04:07:13

Bruce Momjian <bruce@momjian.us> writes:
> On Sun, Apr  7, 2024 at 12:00:25AM +0200, Daniel Verite wrote:
>> Agreed we don't want to document that, but also why doesn't \. in the
>> contents represent just a dot  (as opposed to being an error),
>> just like \a is a?

> I looked into this and started to realize that \. is the only copy
> backslash command where we define the behavior only alone at the
> beginning of a line, and not in other circumstances.  The \a example
> above suggests \. should be period in all other cases, as suggested, but
> I don't know the ramifications if that.

Here's the problem: if some client-side code thinks it's okay to
quote "." as "\.", what is likely to happen when it's presented
a data value consisting only of "."?  It could very easily fall
into the trap of producing an end-of-data marker.

If we get rid of the whole concept of end-of-data markers, then
it'd be totally reasonable to accept "\." as ".".  But as long
as we still have end-of-data markers, I think it's unwise to allow
"\." to appear as anything but an end-of-data marker.  It'd just
add camouflage to the booby trap.

            regards, tom lane

Re: Fixing backslash dot for COPY FROM...CSV

From

Sutou Kouhei

Date:

19 July 2024, 06:10:31

Hi,

I'm reviewing patches in Commitfest 2024-07 from top to bottom:
https://commitfest.postgresql.org/48/

This is the 4th patch:
https://commitfest.postgresql.org/48/4710/

FYI: https://commitfest.postgresql.org/48/4681/ is my patch.

In <2726138.1712462833@sss.pgh.pa.us>
  "Re: Fixing backslash dot for COPY FROM...CSV" on Sun, 07 Apr 2024 00:07:13 -0400,
  Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Here's the problem: if some client-side code thinks it's okay to
> quote "." as "\.", what is likely to happen when it's presented
> a data value consisting only of "."?  It could very easily fall
> into the trap of producing an end-of-data marker.
> 
> If we get rid of the whole concept of end-of-data markers, then
> it'd be totally reasonable to accept "\." as ".".  But as long
> as we still have end-of-data markers, I think it's unwise to allow
> "\." to appear as anything but an end-of-data marker.  It'd just
> add camouflage to the booby trap.

I read through this thread. It seems that this thread
discuss 2 things:

1. \. in CSV mode
2. \. in non-CSV mode

Recent messages discussed mainly 2. but how about create a
separated thread for 2.? Because the original mail focused
on 1. and it seems that we can handle them separately.


Here are comments for the latest v4 patch:

----
diff --git a/src/bin/psql/copy.c b/src/bin/psql/copy.c
index 961ae32694..a39818b193 100644
--- a/src/bin/psql/copy.c
+++ b/src/bin/psql/copy.c
@@ -620,20 +620,32 @@ handleCopyIn(PGconn *conn, FILE *copystream, bool isbinary, PGresult **res)
...
+                            if (copystream == pset.cur_cmd_source)
+                            {
+                                *fgresult='\0';
+                                buflen -= linelen;
----

Spaces around "=" are missing for the fgresult line.

BTW, here is a diff after pgindent:

----
diff --git a/src/bin/psql/copy.c b/src/bin/psql/copy.c
index a39818b1933..4ee8481998a 100644
--- a/src/bin/psql/copy.c
+++ b/src/bin/psql/copy.c
@@ -621,13 +621,13 @@ handleCopyIn(PGconn *conn, FILE *copystream, bool isbinary, PGresult **res)
                 if (buf[buflen - 1] == '\n')
                 {
                     /*
-                     * When at the beginning of the line, check for EOF marker.
-                     * If the marker is found and the data is inlined,
+                     * When at the beginning of the line, check for EOF
+                     * marker. If the marker is found and the data is inlined,
                      * we must stop at this point.  If not, the \. line can be
-                     * sent to the server, and we let it decide whether
-                     * it's an EOF or not depending on the format: in
-                     * basic TEXT, \. is going to be interpreted as an EOF, in
-                     * CSV, it will not.
+                     * sent to the server, and we let it decide whether it's
+                     * an EOF or not depending on the format: in basic TEXT,
+                     * \. is going to be interpreted as an EOF, in CSV, it
+                     * will not.
                      */
                     if (at_line_begin && copystream == pset.cur_cmd_source)
                     {
@@ -635,15 +635,16 @@ handleCopyIn(PGconn *conn, FILE *copystream, bool isbinary, PGresult **res)
                             (linelen == 4 && memcmp(fgresult, "\\.\r\n", 4) == 0))
                         {
                             copydone = true;
+
                             /*
-                             * Remove the EOF marker from the data sent.
-                             * In the case of CSV, the EOF marker must be
+                             * Remove the EOF marker from the data sent. In
+                             * the case of CSV, the EOF marker must be
                              * removed, otherwise it would be interpreted by
                              * the server as valid data.
                              */
                             if (copystream == pset.cur_cmd_source)
                             {
-                                *fgresult='\0';
+                                *fgresult = '\0';
                                 buflen -= linelen;
                             }
                         }
----


I also confirmed that the updated server and non-updated
psql compatibility problem (the end-of-data "\." is
inserted). It seems that it's difficult to solve without
introducing incompatibility.

How about introducing a new COPY option that controls
whether "\." is ignored or not instead of this approach?
Here is a migration idea:

1. Add the new COPY option to v18
2. psql detects whether a server has the new COPY option
   2.1. If it's available, psql uses the option and doesn't send "\."
   2.2. If it's not available, psql doesn't use the option and sends "\."
3. psql always doesn't sends "\." after v17 or earlier reach EOL


Thanks,
-- 
kou

Re: Fixing backslash dot for COPY FROM...CSV

From

Tom Lane

Date:

24 July 2024, 17:36:45

Sutou Kouhei <kou@clear-code.com> writes:
> I read through this thread. It seems that this thread
> discuss 2 things:
> 1. \. in CSV mode
> 2. \. in non-CSV mode
> Recent messages discussed mainly 2. but how about create a
> separated thread for 2.? Because the original mail focused
> on 1. and it seems that we can handle them separately.

Well, we don't want to paint ourselves into a corner by considering
only part of the problem.

What I'm currently thinking is that we can't remove the special
treatment of \. in text mode.  It's not arguably a bug, because
it's been part of the specification since day one; and there
are too many compatibility risks in pg_dump and elsewhere.
I think we should fix it so that \. that's not alone on a line
throws an error, but I wouldn't go further than that.

> How about introducing a new COPY option that controls
> whether "\." is ignored or not instead of this approach?

No thanks.  Once we create such an option we'll never be
able to get rid of it.

            regards, tom lane

Re: Fixing backslash dot for COPY FROM...CSV

From

"Daniel Verite"

Date:

31 July 2024, 13:42:41

    Sutou Kouhei wrote:

> BTW, here is a diff after pgindent:

PFA a v5 with the cosmetic changes applied.

> I also confirmed that the updated server and non-updated
> psql compatibility problem (the end-of-data "\." is
> inserted). It seems that it's difficult to solve without
> introducing incompatibility.

To clarify the compatibility issue, the attached bash script
compares pre-patch and post-patch client/server combinations with
different cases, submitted with different copy variants.

case A: quoted backslash-dot sequence in CSV
case B: unquoted backslash-dot sequence in CSV
case C: CSV without backslash-dot
case D: text with backslash-dot at the end
case E: text without backslash-dot at the end

The different ways to submit the data:
copy from file
\copy from file
\copy from pstdin
copy from stdin with embedded data after the command

Also attaching the tables of results with the patch as it stands.
"Failed" is when psql reports an error and "Data mismatch"
is when it succeeds but with copied data differing from what was
expected.

Does your test has an equivalent in these results or is it a different
case?

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite
=======================

Attachment

Re: Fixing backslash dot for COPY FROM...CSV

From

Sutou Kouhei

Date:

31 July 2024, 21:49:49

Hi,

In <4ffb7317-7e92-4cbd-a542-1e478af6ad5d@manitou-mail.org>
  "Re: Fixing backslash dot for COPY FROM...CSV" on Wed, 31 Jul 2024 15:42:41 +0200,
  "Daniel Verite" <daniel@manitou-mail.org> wrote:

>> BTW, here is a diff after pgindent:
> 
> PFA a v5 with the cosmetic changes applied.

Thanks for updating the patch.

>> I also confirmed that the updated server and non-updated
>> psql compatibility problem (the end-of-data "\." is
>> inserted). It seems that it's difficult to solve without
>> introducing incompatibility.
> 
> To clarify the compatibility issue, the attached bash script
> compares pre-patch and post-patch client/server combinations with
> different cases, submitted with different copy variants.
> 
> case A: quoted backslash-dot sequence in CSV
> case B: unquoted backslash-dot sequence in CSV
> case C: CSV without backslash-dot
> case D: text with backslash-dot at the end
> case E: text without backslash-dot at the end
> 
> The different ways to submit the data:
> copy from file
> \copy from file
> \copy from pstdin
> copy from stdin with embedded data after the command
> 
> Also attaching the tables of results with the patch as it stands.
> "Failed" is when psql reports an error and "Data mismatch"
> is when it succeeds but with copied data differing from what was
> expected.
> 
> Does your test has an equivalent in these results or is it a different
> case?

Sorry for not clarify my try. My try was:

Case C
+----------+----------------+
|  method  | patched-server+|
|          | unpatched-psql |
+----------+----------------+
| embedded | Data mismatch  |
+----------+----------------+

I confirmed that this case is "Data mismatch".



Thanks,
-- 
kou

Re: Fixing backslash dot for COPY FROM...CSV

From

Artur Zakirov

Date:

23 September 2024, 18:38:56

On Wed, 31 Jul 2024 at 15:42, Daniel Verite <daniel@manitou-mail.org> wrote:
>
>         Sutou Kouhei wrote:
>
> > BTW, here is a diff after pgindent:
>
> PFA a v5 with the cosmetic changes applied.

Thank you Daniel for working on it. I've tested the patch and it seems
it works as expected.

I have a couple of minor comments.

It seems it isn't necessary to handle "\." within
"CopyAttributeOutCSV()" (file "src/backend/commands/copyto.c")
anymore.

    /*
     * Because '\.' can be a data value, quote it if it appears alone on a
     * line so it is not interpreted as the end-of-data marker.
     */
    if (single_attr && strcmp(ptr, "\\.") == 0)
        use_quote = true;

You might see the difference in the test "cooy2.sql". Without changes
the output is:

    =# COPY testeoc TO stdout CSV;
    a\.
    \.b
    c\.d
    "\."

Another thing is that the comparison "copystream ==
pset.cur_cmd_source" happens twice within "handleCopyIn()". TBH it is
a bit confusing to me what is the real purpose of that check, but one
of the comparisons looks unnecessary.

-- 
Kind regards,
Artur
Supabase

Re: Fixing backslash dot for COPY FROM...CSV

From

Tom Lane

Date:

29 September 2024, 23:11:23

"Daniel Verite" <daniel@manitou-mail.org> writes:
> To clarify the compatibility issue, the attached bash script
> compares pre-patch and post-patch client/server combinations with
> different cases, submitted with different copy variants.
> ...
> Also attaching the tables of results with the patch as it stands.
> "Failed" is when psql reports an error and "Data mismatch"
> is when it succeeds but with copied data differing from what was
> expected.

Thanks for doing that --- that does indeed clarify matters.
Obviously, the cases we need to worry about most are the "Data
mismatch" ones, since those might lead to silent misbehavior.
I modified your script to show me the exact nature of the mismatches.
The results are:

* For case B, unpatched both sides, the "mismatch" is that the copy
stops at the \. line.  We should really call this the "old expected
behavior", since it's not impossible that somebody is relying on it.
We're okay with breaking that expectation, but the question is whether
there might be other surprises.

* For case B, unpatched-server+patched-psql, the mismatches are
exactly the same, i.e. the user sees the old behavior.  That's OK.

* For case B, patched-server+unpatched-psql, the mismatches are
different: the copy absorbs the \. as a data line and then stops.
So that's surprising and bad, as it's neither the old expected
behavior nor the new intended behavior.  However, there are two
things that make me less worried about this than I might be.
First, it's pretty likely that the server will not find "\."
alone on a line to be valid input data for the COPY, so that the
result will be an error not silently wrong data.  Second, the case
of patched-server+unpatched-psql is fairly uncommon, and people
know to expect that an old psql might not behave perfectly against
a newer server.  The other direction of version mismatch is of far
more concern: people do expect a new psql to work with an old server.

* For case C, patched-server+unpatched-psql, the mismatch for embedded
data is again that the copy absorbs the \. as a data line and then
stops.  Again, this isn't great but the mitigating factors are the
same as above.

In all other cases, the results are the same or strictly better than
before.  So on the whole I think we are good on the compatibility
front and I'm ready to press ahead with the v5 behavior.

However, I've not looked at Artur's coding suggestions downthread.
Do you want to review that and see if you want to adopt any of
those changes?

            regards, tom lane

Re: Fixing backslash dot for COPY FROM...CSV

From

Tom Lane

Date:

01 October 2024, 01:45:02

"Daniel Verite" <daniel@manitou-mail.org> writes:
> [ v6-0001-Support-backslash-dot-on-a-line-by-itself-as-vali.patch ]

I did some more work on the docs and comments, and pushed that.

Returning to my upthread thought that

>>> I think we should fix it so that \. that's not alone on a line
>>> throws an error, but I wouldn't go further than that.

here's a quick follow-on patch to make that happen.  It could
probably do with a test case to demonstrate the error, but
I didn't bother yet pending approval that we want to do this.
(This passes check-world as it stands, indicating we have no
existing test that expects this case to work.)

Also, I used the same error message "end-of-copy marker corrupt"
that we have for the case of junk following the marker, but
I can't say that I think much of that phrasing.  What do people
think of "end-of-copy marker is not alone on its line", instead?

            regards, tom lane

diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index a280efe23f..2ea9679b3c 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -1426,13 +1426,17 @@ CopyReadLineText(CopyFromState cstate)
                 }

                 /*
-                 * Transfer only the data before the \. into line_buf, then
-                 * discard the data and the \. sequence.
+                 * If there is any data on this line before the \., complain.
+                 */
+                if (cstate->line_buf.len > 0 ||
+                    prev_raw_ptr > cstate->input_buf_index)
+                    ereport(ERROR,
+                            (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                             errmsg("end-of-copy marker corrupt")));
+
+                /*
+                 * Discard the \. and newline, then report EOF.
                  */
-                if (prev_raw_ptr > cstate->input_buf_index)
-                    appendBinaryStringInfo(&cstate->line_buf,
-                                           cstate->input_buf + cstate->input_buf_index,
-                                           prev_raw_ptr - cstate->input_buf_index);
                 cstate->input_buf_index = input_buf_ptr;
                 result = true;    /* report EOF */
                 break;

Re: Fixing backslash dot for COPY FROM...CSV

From

"Daniel Verite"

Date:

01 October 2024, 22:34:20

    Tom Lane wrote:

> > [ v6-0001-Support-backslash-dot-on-a-line-by-itself-as-vali.patch ]
>
> I did some more work on the docs and comments, and pushed that.

Thanks!

> Returning to my upthread thought that
>
> >>> I think we should fix it so that \. that's not alone on a line
> >>> throws an error, but I wouldn't go further than that.
>
> here's a quick follow-on patch to make that happen.  It could
> probably do with a test case to demonstrate the error, but
> I didn't bother yet pending approval that we want to do this.

+1 for fixing. In particular, the behavior that silently drops data
after a misplaced marker seems hard to argue for:

postgres=# \copy foo(c1,c2) from stdin
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> a    b
>> c    \.
>> d    e
>> \.
COPY 2
postgres=# table foo;
 c1 | c2
----+----
 a  | b
 c  |
(2 rows)


> Also, I used the same error message "end-of-copy marker corrupt"
> that we have for the case of junk following the marker, but
> I can't say that I think much of that phrasing.  What do people
> think of "end-of-copy marker is not alone on its line", instead?

Yes, it's more precise and more helpful.


Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite

Re: Fixing backslash dot for COPY FROM...CSV

From

Tom Lane

Date:

01 October 2024, 23:17:19

"Daniel Verite" <daniel@manitou-mail.org> writes:
> Tom Lane wrote:
>> Returning to my upthread thought that
>>> I think we should fix it so that \. that's not alone on a line
>>> throws an error, but I wouldn't go further than that.
>> here's a quick follow-on patch to make that happen.  It could
>> probably do with a test case to demonstrate the error, but
>> I didn't bother yet pending approval that we want to do this.

> +1 for fixing.

It's been on the thread for awhile now without objections,
so I'll go ahead and make that happen.

            regards, tom lane