Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row - Mailing list pgsql-hackers
From | torikoshia |
---|---|
Subject | Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row |
Date | |
Msg-id | 501dd655ddb04693c15baeb6485bc601@oss.nttdata.com Whole thread Raw |
In response to | Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row ("David G. Johnston" <david.g.johnston@gmail.com>) |
Responses |
Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row
|
List | pgsql-hackers |
On 2024-11-09 21:55, Kirill Reshke wrote: Thanks for working on this! > On Thu, 7 Nov 2024 at 23:00, Fujii Masao <masao.fujii@oss.nttdata.com> > wrote: >> >> >> >> On 2024/10/26 6:03, Kirill Reshke wrote: >> > when the REJECT LIMIT is set to some non-zero number and the number of >> > row NULL replacements exceeds the limit, is it OK to fail. Because >> > there WAS errors, and we should not tolerate more than $limit errors . >> > I do find this behavior to be consistent. >> >> +1 >> >> >> > But what if we don't set a REJECT LIMIT, it is sane to do all >> > replacements, as if REJECT LIMIT is inf. >> >> +1 > > After thinking for a while, I'm now more opposed to this approach. I > think we should count rows with erroneous data as errors only if > null substitution for these rows failed, not the total number of rows > which were modified. > Then, to respect the REJECT LIMIT option, we compare this number with > the limit. This is actually simpler approach IMHO. What do You think? IMHO I prefer the previous interpretation. I'm not sure this is what people expect, but I assume that REJECT_LIMIT is used to specify how many malformed rows are acceptable in the "original" data source. >> > But while I was trying to implement that, I realized that I don't >> > understand v4 of this patch. My misunderstanding is about >> > `t_on_error_null` tests. We are allowed to insert a NULL value for the >> > first column of t_on_error_null using COPY ON_ERROR SET_TO_NULL. Why >> > do we do that? My thought is we should try to execute >> > InputFunctionCallSafe with NULL value (i mean, here [1]) for the >> > column after we failed to insert the input value. And, if this second >> > call is successful, we do replacement, otherwise we count the row as >> > erroneous. >> >> Your concern is valid. Allowing NULL to be stored in a column with a >> NOT NULL >> constraint via COPY ON_ERROR=SET_TO_NULL does seem unexpected. As you >> suggested, >> NULL values set by SET_TO_NULL should probably be re-evaluated. > > Thank you. I updated the patch with a NULL re-evaluation. > > > PFA v7. I did not yet update the doc for this patch version, waiting > for feedback about REJECT LIMIT + SET_TO_NULL behaviour. > There were warnings when I applied the patch: $ git apply v7-0001-Incrtoduce-COPY-option-to-replace-columns-contain.patch v7-0001-Incrtoduce-COPY-option-to-replace-columns-contain.patch:170: trailing whitespace. /* v7-0001-Incrtoduce-COPY-option-to-replace-columns-contain.patch:181: trailing whitespace. v7-0001-Incrtoduce-COPY-option-to-replace-columns-contain.patch:189: trailing whitespace. /* If datatype if okay with NULL, replace v7-0001-Incrtoduce-COPY-option-to-replace-columns-contain.patch:196: trailing whitespace. v7-0001-Incrtoduce-COPY-option-to-replace-columns-contain.patch:212: trailing whitespace. /* > @@ -403,12 +403,14 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState > *pstate, bool is_from) > parser_errposition(pstate, def->location))); ... > > - if (opts_out->reject_limit && !opts_out->on_error) > + if (opts_out->reject_limit && !(opts_out->on_error == > COPY_ON_ERROR_NULL || opts_out->on_error == COPY_ON_ERROR_IGNORE)) Hmm, is this change necessary? Personally, I feel the previous code is easier to read. "REJECT LIMIT" should be "REJECT_LIMIT"? > 1037 errhint("Consider specifying the > REJECT LIMIT option to skip erroneous rows."))); SET_TO_NULL is one of the value for ON_ERROR, but the patch treats SET_TO_NULL as option for COPY: 221 --- a/src/bin/psql/tab-complete.in.c 222 +++ b/src/bin/psql/tab-complete.in.c 223 @@ -3235,7 +3235,7 @@ match_previous_words(int pattern_id, 224 COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL", 225 "HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE", 226 "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING", "DEFAULT", 227 - "ON_ERROR", "LOG_VERBOSITY"); 228 + "ON_ERROR", "SET_TO_NULL", "LOG_VERBOSITY"); > Best regards, > Kirill Reshke -- Regards, -- Atsushi Torikoshi Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.
pgsql-hackers by date: