Re: Conflict handling for COPY FROM - Mailing list pgsql-hackers
From | Alexey Kondratov |
---|---|
Subject | Re: Conflict handling for COPY FROM |
Date | |
Msg-id | f55afd63-8d64-e8df-0d54-13c116b8d23e@postgrespro.ru Whole thread Raw |
In response to | Re: Conflict handling for COPY FROM (Alvaro Herrera <alvherre@2ndquadrant.com>) |
Responses |
Re: Conflict handling for COPY FROM
Re: Conflict handling for COPY FROM |
List | pgsql-hackers |
On 28.06.2019 16:12, Alvaro Herrera wrote: >> On Wed, Feb 20, 2019 at 7:04 PM Andres Freund <andres@anarazel.de> wrote: >>> Or even just return it as a row. CopyBoth is relatively widely supported >>> these days. >> i think generating warning about it also sufficiently meet its propose of >> notifying user about skipped record with existing logging facility >> and we use it for similar propose in other place too. The different >> i see is the number of warning that can be generated > Warnings seem useless for this purpose. I'm with Andres: returning rows > would make this a fine feature. If the user wants the rows in a table > as Andrew suggests, she can use wrap the whole thing in an insert. I agree with previous commentators that returning rows will make this feature more versatile. Though, having a possibility to simply skip conflicting/malformed rows is worth of doing from my perspective. However, pushing every single skipped row to the client as a separated WARNING will be too much for a bulk import. So maybe just overall stats about skipped rows number will be enough? Also, I would prefer having an option to ignore all errors, e.g. with option ERROR_LIMIT set to -1. Because it is rather difficult to estimate a number of future errors if you are playing with some badly structured data, while always setting it to 100500k looks ugly. Anyway, below are some issues with existing code after a brief review of the patch: 1) Calculation of processed rows isn't correct (I've checked). You do it in two places, and - processed++; + if (!cstate->error_limit) + processed++; is never incremented if ERROR_LIMIT is specified and no errors occurred/no constraints exist, so the result will always be 0. However, if primary column with constraints exists, then processed is calculated correctly, since another code path is used: + if (specConflict) + { + ... + } + else + processed++; I would prefer this calculation in a single place (as it was before patch) for simplicity and in order to avoid such problems. 2) This ExecInsertIndexTuples call is only executed now if ERROR_LIMIT is specified and was exceeded, which doesn't seem to be correct, does it? - if (resultRelInfo->ri_NumIndices > 0) + if (resultRelInfo->ri_NumIndices > 0 && cstate->error_limit == 0) recheckIndexes = ExecInsertIndexTuples(myslot, 3) Trailing whitespaces added to error messages and tests for some reason: + ereport(WARNING, + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), + errmsg("skipping \"%s\" --- missing data for column \"%s\" ", + ereport(ERROR, + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), + errmsg("missing data for column \"%s\" ", -ERROR: missing data for column "e" +ERROR: missing data for column "e" CONTEXT: COPY x, line 1: "2000 230 23 23" -ERROR: missing data for column "e" +ERROR: missing data for column "e" CONTEXT: COPY x, line 1: "2001 231 \N \N" Otherwise, the patch applies/compiles cleanly and regression tests are passed. Regards -- Alexey Kondratov Postgres Professional https://www.postgrespro.com Russian Postgres Company
pgsql-hackers by date: