Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row - Mailing list pgsql-hackers

From torikoshia
Subject Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row
Date
Msg-id 501dd655ddb04693c15baeb6485bc601@oss.nttdata.com
Whole thread Raw
In response to Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row  ("David G. Johnston" <david.g.johnston@gmail.com>)
Responses Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row
List pgsql-hackers
On 2024-11-09 21:55, Kirill Reshke wrote:

Thanks for working on this!

> On Thu, 7 Nov 2024 at 23:00, Fujii Masao <masao.fujii@oss.nttdata.com> 
> wrote:
>> 
>> 
>> 
>> On 2024/10/26 6:03, Kirill Reshke wrote:
>> > when the REJECT LIMIT is set to some non-zero number and the number of
>> > row NULL replacements exceeds the limit, is it OK to fail. Because
>> > there WAS errors, and we should not tolerate more than $limit errors .
>> > I do find this behavior to be consistent.
>> 
>> +1
>> 
>> 
>> > But what if we don't set a REJECT LIMIT, it is sane to do all
>> > replacements, as if REJECT LIMIT is inf.
>> 
>> +1
> 
> After thinking for a while, I'm now more opposed to this approach. I
> think we should count rows with erroneous data as errors only if
> null substitution for these rows failed, not the total number of rows
> which were modified.
> Then, to respect the REJECT LIMIT option, we compare this number with
> the limit. This is actually simpler approach IMHO. What do You think?

IMHO I prefer the previous interpretation.
I'm not sure this is what people expect, but I assume that REJECT_LIMIT 
is used to specify how many malformed rows are acceptable in the 
"original" data source.

>> > But while I was trying to implement that, I realized that I don't
>> > understand v4 of this patch. My misunderstanding is about
>> > `t_on_error_null` tests. We are allowed to insert a NULL value for the
>> > first column of t_on_error_null using COPY ON_ERROR SET_TO_NULL. Why
>> > do we do that? My thought is we should try to execute
>> > InputFunctionCallSafe with NULL value  (i mean, here [1]) for the
>> > column after we failed to insert the input value. And, if this second
>> > call is successful, we do replacement, otherwise we count the row as
>> > erroneous.
>> 
>> Your concern is valid. Allowing NULL to be stored in a column with a 
>> NOT NULL
>> constraint via COPY ON_ERROR=SET_TO_NULL does seem unexpected. As you 
>> suggested,
>> NULL values set by SET_TO_NULL should probably be re-evaluated.
> 
> Thank you. I updated the patch with a NULL re-evaluation.
> 
> 
> PFA v7. I did not yet update the doc for this patch version, waiting
> for feedback about REJECT LIMIT + SET_TO_NULL behaviour.
> 


There were warnings when I applied the patch:

$ git apply 
v7-0001-Incrtoduce-COPY-option-to-replace-columns-contain.patch
v7-0001-Incrtoduce-COPY-option-to-replace-columns-contain.patch:170: 
trailing whitespace.
                    /*
v7-0001-Incrtoduce-COPY-option-to-replace-columns-contain.patch:181: 
trailing whitespace.

v7-0001-Incrtoduce-COPY-option-to-replace-columns-contain.patch:189: 
trailing whitespace.
                                    /* If datatype if okay with NULL, 
replace
v7-0001-Incrtoduce-COPY-option-to-replace-columns-contain.patch:196: 
trailing whitespace.

v7-0001-Incrtoduce-COPY-option-to-replace-columns-contain.patch:212: 
trailing whitespace.
                                 /*

>  @@ -403,12 +403,14 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState 
> *pstate, bool is_from)
>                   parser_errposition(pstate, def->location)));
...
> 
>  -   if (opts_out->reject_limit && !opts_out->on_error)
>  +   if (opts_out->reject_limit && !(opts_out->on_error == 
> COPY_ON_ERROR_NULL || opts_out->on_error == COPY_ON_ERROR_IGNORE))

Hmm, is this change necessary?
Personally, I feel the previous code is easier to read.


"REJECT LIMIT" should be "REJECT_LIMIT"?
> 1037                             errhint("Consider specifying the 
> REJECT LIMIT option to skip erroneous rows.")));


SET_TO_NULL is one of the value for ON_ERROR, but the patch treats 
SET_TO_NULL as option for COPY:

221 --- a/src/bin/psql/tab-complete.in.c
222 +++ b/src/bin/psql/tab-complete.in.c
223 @@ -3235,7 +3235,7 @@ match_previous_words(int pattern_id,
224         COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL",
225                       "HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE",
226                       "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING", 
"DEFAULT",
227 -                     "ON_ERROR", "LOG_VERBOSITY");
228 +                     "ON_ERROR", "SET_TO_NULL", "LOG_VERBOSITY");


> Best regards,
> Kirill Reshke

-- 
Regards,

--
Atsushi Torikoshi
Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.



pgsql-hackers by date:

Previous
From: "Joel Jacobson"
Date:
Subject: Re: [BUG] psql: Make \copy from 'text' and 'csv' formats fail on NUL bytes
Next
From: Amit Kapila
Date:
Subject: Re: doc: pgevent.dll location