Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row - Mailing list pgsql-hackers
| From | torikoshia |
|---|---|
| Subject | Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row |
| Date | |
| Msg-id | 501dd655ddb04693c15baeb6485bc601@oss.nttdata.com Whole thread Raw |
| In response to | Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row ("David G. Johnston" <david.g.johnston@gmail.com>) |
| Responses |
Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row
|
| List | pgsql-hackers |
On 2024-11-09 21:55, Kirill Reshke wrote:
Thanks for working on this!
> On Thu, 7 Nov 2024 at 23:00, Fujii Masao <masao.fujii@oss.nttdata.com>
> wrote:
>>
>>
>>
>> On 2024/10/26 6:03, Kirill Reshke wrote:
>> > when the REJECT LIMIT is set to some non-zero number and the number of
>> > row NULL replacements exceeds the limit, is it OK to fail. Because
>> > there WAS errors, and we should not tolerate more than $limit errors .
>> > I do find this behavior to be consistent.
>>
>> +1
>>
>>
>> > But what if we don't set a REJECT LIMIT, it is sane to do all
>> > replacements, as if REJECT LIMIT is inf.
>>
>> +1
>
> After thinking for a while, I'm now more opposed to this approach. I
> think we should count rows with erroneous data as errors only if
> null substitution for these rows failed, not the total number of rows
> which were modified.
> Then, to respect the REJECT LIMIT option, we compare this number with
> the limit. This is actually simpler approach IMHO. What do You think?
IMHO I prefer the previous interpretation.
I'm not sure this is what people expect, but I assume that REJECT_LIMIT
is used to specify how many malformed rows are acceptable in the
"original" data source.
>> > But while I was trying to implement that, I realized that I don't
>> > understand v4 of this patch. My misunderstanding is about
>> > `t_on_error_null` tests. We are allowed to insert a NULL value for the
>> > first column of t_on_error_null using COPY ON_ERROR SET_TO_NULL. Why
>> > do we do that? My thought is we should try to execute
>> > InputFunctionCallSafe with NULL value (i mean, here [1]) for the
>> > column after we failed to insert the input value. And, if this second
>> > call is successful, we do replacement, otherwise we count the row as
>> > erroneous.
>>
>> Your concern is valid. Allowing NULL to be stored in a column with a
>> NOT NULL
>> constraint via COPY ON_ERROR=SET_TO_NULL does seem unexpected. As you
>> suggested,
>> NULL values set by SET_TO_NULL should probably be re-evaluated.
>
> Thank you. I updated the patch with a NULL re-evaluation.
>
>
> PFA v7. I did not yet update the doc for this patch version, waiting
> for feedback about REJECT LIMIT + SET_TO_NULL behaviour.
>
There were warnings when I applied the patch:
$ git apply
v7-0001-Incrtoduce-COPY-option-to-replace-columns-contain.patch
v7-0001-Incrtoduce-COPY-option-to-replace-columns-contain.patch:170:
trailing whitespace.
/*
v7-0001-Incrtoduce-COPY-option-to-replace-columns-contain.patch:181:
trailing whitespace.
v7-0001-Incrtoduce-COPY-option-to-replace-columns-contain.patch:189:
trailing whitespace.
/* If datatype if okay with NULL,
replace
v7-0001-Incrtoduce-COPY-option-to-replace-columns-contain.patch:196:
trailing whitespace.
v7-0001-Incrtoduce-COPY-option-to-replace-columns-contain.patch:212:
trailing whitespace.
/*
> @@ -403,12 +403,14 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState
> *pstate, bool is_from)
> parser_errposition(pstate, def->location)));
...
>
> - if (opts_out->reject_limit && !opts_out->on_error)
> + if (opts_out->reject_limit && !(opts_out->on_error ==
> COPY_ON_ERROR_NULL || opts_out->on_error == COPY_ON_ERROR_IGNORE))
Hmm, is this change necessary?
Personally, I feel the previous code is easier to read.
"REJECT LIMIT" should be "REJECT_LIMIT"?
> 1037 errhint("Consider specifying the
> REJECT LIMIT option to skip erroneous rows.")));
SET_TO_NULL is one of the value for ON_ERROR, but the patch treats
SET_TO_NULL as option for COPY:
221 --- a/src/bin/psql/tab-complete.in.c
222 +++ b/src/bin/psql/tab-complete.in.c
223 @@ -3235,7 +3235,7 @@ match_previous_words(int pattern_id,
224 COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL",
225 "HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE",
226 "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING",
"DEFAULT",
227 - "ON_ERROR", "LOG_VERBOSITY");
228 + "ON_ERROR", "SET_TO_NULL", "LOG_VERBOSITY");
> Best regards,
> Kirill Reshke
--
Regards,
--
Atsushi Torikoshi
Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.
pgsql-hackers by date: