Thread: COPY TO CSV produces data that is incompatible/unsafe for \COPY FROM CSV
COPY TO CSV produces data that is incompatible/unsafe for \COPY FROM CSV
From
"Svante Richter"
Date:
Hello!
The documentation for COPY says "To avoid any misinterpretation, a
\.
data value appearing as a lone entry on a line is automatically quoted on output, and on input, if quoted, is not interpreted as the end-of-data marker".The input part only seems to work when using the COPY FROM CSV command, not \COPY FROM CSV. This is mentioned in a previous message here https://www.postgresql.org/message-id/a89f47e1-f716-4ae3-b882-cab5032a5d66%40manitou-mail.org but not documented.
This means that COPY TO CSV produces data that \COPY FROM CSV cannot read, which I'm assuming should be fixed (or at the very least documented as a serious limitation of \COPY FROM CSV). I found this out by not being able to load a backup of a table that I had exported via COPY TO CSV.
As the above message also mentioned this can be a security risk if using \COPY FROM STDIN CSV with untrusted data (https://www.postgresql.org/message-id/20190128214448.GH26761%40momjian.us says "I think the question is how many people are using CSV/STDIN for insecure data loads?") but I would absolutely expect data produced with COPY TO CSV to be safe to pipe to a \COPY FROM CSV, but this bug makes that unsafe unless I also explicitly set ON_ERROR_STOP=1.
SQL to reproduce:
CREATE TABLE testtable (a TEXT);
INSERT INTO testtable VALUES ('
\.
');
COPY testtable TO '/run/postgresql/test.csv' CSV;
COPY testtable FROM '/run/postgresql/test.csv' CSV; -- This one works
\COPY testtable FROM '/run/postgresql/test.csv' CSV; -- This one does not work
Error message:
ERROR: unterminated CSV quoted field
CONTEXT: COPY testtable, line 1: ""
"
Versions tested:
psql (PostgreSQL) 14.3 (under arch linux)
psql (PostgreSQL) 13.7 (Ubuntu 13.7-0ubuntu0.21.10.1)
Re: COPY TO CSV produces data that is incompatible/unsafe for \COPY FROM CSV
From
"Svante Richter"
Date:
A gentle bump for this, hopefully that's alright!
On Wed, Jun 15, 2022, at 2:16 PM, Svante Richter wrote:
Hello!The documentation for COPY says "To avoid any misinterpretation, a\.
data value appearing as a lone entry on a line is automatically quoted on output, and on input, if quoted, is not interpreted as the end-of-data marker".The input part only seems to work when using the COPY FROM CSV command, not \COPY FROM CSV. This is mentioned in a previous message here https://www.postgresql.org/message-id/a89f47e1-f716-4ae3-b882-cab5032a5d66%40manitou-mail.org but not documented.This means that COPY TO CSV produces data that \COPY FROM CSV cannot read, which I'm assuming should be fixed (or at the very least documented as a serious limitation of \COPY FROM CSV). I found this out by not being able to load a backup of a table that I had exported via COPY TO CSV.As the above message also mentioned this can be a security risk if using \COPY FROM STDIN CSV with untrusted data (https://www.postgresql.org/message-id/20190128214448.GH26761%40momjian.us says "I think the question is how many people are using CSV/STDIN for insecure data loads?") but I would absolutely expect data produced with COPY TO CSV to be safe to pipe to a \COPY FROM CSV, but this bug makes that unsafe unless I also explicitly set ON_ERROR_STOP=1.SQL to reproduce:CREATE TABLE testtable (a TEXT);INSERT INTO testtable VALUES ('\.');COPY testtable TO '/run/postgresql/test.csv' CSV;COPY testtable FROM '/run/postgresql/test.csv' CSV; -- This one works\COPY testtable FROM '/run/postgresql/test.csv' CSV; -- This one does not workError message:ERROR: unterminated CSV quoted fieldCONTEXT: COPY testtable, line 1: """Versions tested:psql (PostgreSQL) 14.3 (under arch linux)psql (PostgreSQL) 13.7 (Ubuntu 13.7-0ubuntu0.21.10.1)
On Wed, Jun 15, 2022 at 02:16:14PM +0200, Svante Richter wrote: > The documentation for COPY says "To avoid any misinterpretation, a `\.` data value appearing as a lone entry on a lineis automatically quoted on output, and on input, if quoted, is not interpreted as the end-of-data marker". > > The input part only seems to work when using the COPY FROM CSV command, not \COPY FROM CSV. This is mentioned in a previousmessage here https://www.postgresql.org/message-id/a89f47e1-f716-4ae3-b882-cab5032a5d66%40manitou-mail.org but notdocumented. I agree this warrants a doc mention. Also, the previous thread seems to have ended with a focus on [\copy ... from stdin], but it's broader than that: > COPY testtable FROM '/run/postgresql/test.csv' CSV; -- This one works > \COPY testtable FROM '/run/postgresql/test.csv' CSV; -- This one does not work The psql documentation says [\copy ... from stdin] looks for "\.". It says nothing about doing that for [\copy ... from filename]. I wonder if psql should change the filename case to send the whole file to the server, ignoring "\." inside. (That is to say, change to match the psql documentation.) This, too, has an avoidable case of the problem: create table t (c text); \copy t from program 'printf ''"foo\n\\.\nbar"''' with csv
Noah Misch <noah@leadboat.com> writes: > The psql documentation says [\copy ... from stdin] looks for "\.". It says > nothing about doing that for [\copy ... from filename]. I wonder if psql > should change the filename case to send the whole file to the server, ignoring > "\." inside. (That is to say, change to match the psql documentation.) This seems like a sensible solution. The need to use an in-band EOF marker when reading from the command source file is clear, and there's no non-kluge way there. But that doesn't mean we should extend it to separate files. (I'm surprised to realize we do, actually.) regards, tom lane
Re: COPY TO CSV produces data that is incompatible/unsafe for \COPY FROM CSV
From
Bruce Momjian
Date:
On Sun, Aug 14, 2022 at 10:08:55AM -0400, Tom Lane wrote: > Noah Misch <noah@leadboat.com> writes: > > The psql documentation says [\copy ... from stdin] looks for "\.". It says > > nothing about doing that for [\copy ... from filename]. I wonder if psql > > should change the filename case to send the whole file to the server, ignoring > > "\." inside. (That is to say, change to match the psql documentation.) > > This seems like a sensible solution. The need to use an in-band EOF > marker when reading from the command source file is clear, and there's > no non-kluge way there. But that doesn't mean we should extend it > to separate files. (I'm surprised to realize we do, actually.) So, look what I found in psql/copy.c: /* check for EOF marker, but not on a partial line */ if (at_line_begin) { /* --> * This code erroneously assumes '\.' on a line alone --> * inside a quoted CSV string terminates the \copy. --> * https://www.postgresql.org/message-id/E1TdNVQ-0001ju-GO@wrigleys.postgresql.org */ if ((linelen == 3 && memcmp(fgresult, "\\.\n", 3) == 0) || (linelen == 4 && memcmp(fgresult, "\\.\r\n", 4) == 0)) { copydone = true; } } I wondered who would reference an email thread in the source code, so I looked at the whole 2012 thread: https://www.postgresql.org/message-id/flat/E1TdNVQ-0001ju-GO%40wrigleys.postgresql.org and it was, of course, me. :-( Tom's analysis was similar ten years ago: > Ugh. This seems like a rather fundamental oversight in the CSV feature. > The problem is that psql has no idea whether the copy is being done in > CSV mode or not --- and even if it did, it doesn't parse the data fully > enough to realize whether a \. line is inside quotes or not. > > In the case of out-of-line data files, it might be reasonable to just > dispense with the check for \. altogether and always ship the whole file > to the backend; I think there's a \. check on the backend side. (Not > sure this is safe in V2 protocol, but I doubt anyone cares anymore > about that.) > > In the case of in-line data in a script file, CSV mode seems a bit > broken in any case; there's no concept of a terminator in CSV, AFAIK. > So maybe we don't have to worry about that. I think at this point we should either fix this or document it. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Indecision is a decision. Inaction is an action. Mark Batterson
On Tue, Aug 16, 2022 at 11:26:47AM -0400, Bruce Momjian wrote: > On Sun, Aug 14, 2022 at 10:08:55AM -0400, Tom Lane wrote: > > Noah Misch <noah@leadboat.com> writes: > > > The psql documentation says [\copy ... from stdin] looks for "\.". It says > > > nothing about doing that for [\copy ... from filename]. I wonder if psql > > > should change the filename case to send the whole file to the server, ignoring > > > "\." inside. (That is to say, change to match the psql documentation.) > > > > This seems like a sensible solution. The need to use an in-band EOF > > marker when reading from the command source file is clear, and there's > > no non-kluge way there. But that doesn't mean we should extend it > > to separate files. (I'm surprised to realize we do, actually.) > Tom's analysis was similar ten years ago: > > > Ugh. This seems like a rather fundamental oversight in the CSV feature. > > The problem is that psql has no idea whether the copy is being done in > > CSV mode or not --- and even if it did, it doesn't parse the data fully > > enough to realize whether a \. line is inside quotes or not. > > > > In the case of out-of-line data files, it might be reasonable to just > > dispense with the check for \. altogether and always ship the whole file > > to the backend; I think there's a \. check on the backend side. (Not > > sure this is safe in V2 protocol, but I doubt anyone cares anymore > > about that.) > I think at this point we should either fix this or document it. Things have gotten simpler in the last ten years, specifically commit 3174d69 removing protocol v2. Before that, the server would have mishandled "\." even if the client didn't. These days, one can fix all but [\copy ... from stdin].
Re: COPY TO CSV produces data that is incompatible/unsafe for \COPY FROM CSV
From
Bruce Momjian
Date:
On Tue, Aug 16, 2022 at 11:26:47AM -0400, Bruce Momjian wrote: > On Sun, Aug 14, 2022 at 10:08:55AM -0400, Tom Lane wrote: > > Noah Misch <noah@leadboat.com> writes: > > > The psql documentation says [\copy ... from stdin] looks for "\.". It says > > > nothing about doing that for [\copy ... from filename]. I wonder if psql > > > should change the filename case to send the whole file to the server, ignoring > > > "\." inside. (That is to say, change to match the psql documentation.) > > > > This seems like a sensible solution. The need to use an in-band EOF > > marker when reading from the command source file is clear, and there's > > no non-kluge way there. But that doesn't mean we should extend it > > to separate files. (I'm surprised to realize we do, actually.) > > So, look what I found in psql/copy.c: > > /* check for EOF marker, but not on a partial line */ > if (at_line_begin) > { > /* > --> * This code erroneously assumes '\.' on a line alone > --> * inside a quoted CSV string terminates the \copy. > --> * https://www.postgresql.org/message-id/E1TdNVQ-0001ju-GO@wrigleys.postgresql.org > */ > if ((linelen == 3 && memcmp(fgresult, "\\.\n", 3) == 0) || > (linelen == 4 && memcmp(fgresult, "\\.\r\n", 4) == 0)) > { > copydone = true; > } > } > > I wondered who would reference an email thread in the source code, so I > looked at the whole 2012 thread: > > https://www.postgresql.org/message-id/flat/E1TdNVQ-0001ju-GO%40wrigleys.postgresql.org > > and it was, of course, me. :-( > > Tom's analysis was similar ten years ago: > > > Ugh. This seems like a rather fundamental oversight in the CSV feature. > > The problem is that psql has no idea whether the copy is being done in > > CSV mode or not --- and even if it did, it doesn't parse the data fully > > enough to realize whether a \. line is inside quotes or not. > > > > In the case of out-of-line data files, it might be reasonable to just > > dispense with the check for \. altogether and always ship the whole file > > to the backend; I think there's a \. check on the backend side. (Not > > sure this is safe in V2 protocol, but I doubt anyone cares anymore > > about that.) > > > > In the case of in-line data in a script file, CSV mode seems a bit > > broken in any case; there's no concept of a terminator in CSV, AFAIK. > > So maybe we don't have to worry about that. > > I think at this point we should either fix this or document it. I have decided to document this with the attached patch. I plan to apply it to all supported Postgres versions. If we ever fix FROM file so only FROM STDIN is a problem, we can update the documentation. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
Attachment
Re: COPY TO CSV produces data that is incompatible/unsafe for \COPY FROM CSV
From
Bruce Momjian
Date:
On Sat, Oct 28, 2023 at 11:38:37AM -0400, Bruce Momjian wrote: > On Tue, Aug 16, 2022 at 11:26:47AM -0400, Bruce Momjian wrote: > > I think at this point we should either fix this or document it. > > I have decided to document this with the attached patch. I plan to > apply it to all supported Postgres versions. If we ever fix FROM file > so only FROM STDIN is a problem, we can update the documentation. Patch applied to all supported versions. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.