Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features) - Mailing list pgsql-hackers
From | torikoshia |
---|---|
Subject | Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features) |
Date | |
Msg-id | 19551e8c2717c24689913083f841ddb5@oss.nttdata.com Whole thread Raw |
In response to | Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features) (torikoshia <torikoshia@oss.nttdata.com>) |
Responses |
Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
|
List | pgsql-hackers |
On 2023-03-23 02:50, Andres Freund wrote: Thanks again for your review. Attached v5 patch. > Have you measured whether this has negative performance effects when > *NOT* > using the new option? I loaded 10000000 rows of pgbench_accounts on my laptop and compared the elapsed time. GUCs changed from the default are logging_collector = on, log_error_verbosity = verbose. Three tests were run under each condition and the middle of them is listed below: - patch NOT applied(36f40ce2dc66): 35299ms - patch applied, without IGNORE_DATATYPE_ERRORS: 34409ms - patch applied, with IGNORE_DATATYPE_ERRORS: 35510ms It seems there are no significant degradation. Also tested the elapsed time when loading data which has some datatype error with IGNORE_DATATYPE_ERRORS: - data has 100 rows of error: 35269ms - data has 1000 rows of error: 34577ms - data has 5000000 rows of error: 48925ms 5000000 rows of error consumes much time, but it seems to be influenced by logging time. Here are test results under log_min_messages and client_min_messages are 'error': - data has 5000000 data type error: 23972ms - data has 0 data type error: 34320ms Now conversely, when there were many data type errors, it consumes less time. This seems like a reasonable result since the amount of skipped data is increasing. > As-is this does not work with FORMAT BINARY - and converting the binary > input > functions to support soft errors won't happen for 16. So I think you > need to > raise an error if BINARY and IGNORE_DATATYPE_ERRORS are specified. Added the option check. > On 2023-03-22 22:34:20 +0900, torikoshia wrote: > > @@ -985,9 +986,28 @@ CopyFrom(CopyFromState cstate) > > > > ExecClearTuple(myslot); > > > > + if (cstate->opts.ignore_datatype_errors) > > + { > > + escontext.details_wanted = true; > > + cstate->escontext = escontext; > > + } > > I think it might be worth pulling this out of the loop. That does mean > you'd > have to reset escontext.error_occurred after an error, but that doesn't > seem > too bad, you need to do other cleanup anyway. Pull this out of the loop and added process for resetting escontext. > > @@ -956,10 +957,20 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext, > > values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]); > > } > > else > > - values[m] = InputFunctionCall(&in_functions[m], > > - string, > > - typioparams[m], > > - att->atttypmod); > > + /* If IGNORE_DATATYPE_ERRORS is enabled skip rows with datatype errors */ > > + if (!InputFunctionCallSafe(&in_functions[m], > > + string, > > + typioparams[m], > > + att->atttypmod, > > + (Node *) &cstate->escontext, > > + &values[m])) > > + { > > + cstate->ignored_errors++; > > + > > + ereport(WARNING, > > + errmsg("%s", cstate->escontext.error_data->message)); > > That isn't right - you loose all the details of the message. As is > you'd also > leak the error context. > > I think the best bet for now is to do something like > /* adjust elevel so we don't jump out */ > cstate->escontext.error_data->elevel = WARNING; > /* despite the name, this won't raise an error if elevel < ERROR */ > ThrowErrorData(cstate->escontext.error_data); As I mentioned in one previous email, added above codes for now. > I wonder if we ought to provide a wrapper for this? It could e.g. know > to > mention the original elevel and such? > > > I don't think NextCopyFrom() is the right place to emit this warning - > it > e.g. is also called from file_fdw.c, which might want to do something > else > with the error. From a layering POV it seems cleaner to do this in > CopyFrom(). You already have a check for escontext.error_occurred there > anyway. Agreed. > > @@ -3378,6 +3378,10 @@ copy_opt_item: > > { > > $$ = makeDefElem("freeze", (Node *) makeBoolean(true), @1); > > } > > + | IGNORE_DATATYPE_ERRORS > > + { > > + $$ = makeDefElem("ignore_datatype_errors", (Node *)makeBoolean(true), @1); > > + } > > | DELIMITER opt_as Sconst > > { > > $$ = makeDefElem("delimiter", (Node *) makeString($3), @1); > > > I think we shouldn't add a new keyword for this, but only support this > via > /* new COPY option syntax */ > copy_generic_opt_list: > copy_generic_opt_elem > > Further increasing the size of the grammar with random keywords when we > have > more generic ways to represent them seems unnecessary. Agreed. > > +-- tests for IGNORE_DATATYPE_ERRORS option > > +CREATE TABLE check_ign_err (n int, m int[], k int); > > +COPY check_ign_err FROM STDIN WITH IGNORE_DATATYPE_ERRORS; > > +1 {1} 1 > > +a {2} 2 > > +3 {3} 3333333333 > > +4 {a, 4} 4 > > + > > +5 {5} 5 > > +\. > > +SELECT * FROM check_ign_err; > > + > > I suggest adding a few more tests: > > - COPY with a datatype error that can't be handled as a soft error Added a test for cases with missing columns. However it's not datatype error and not what you expected, is it? > - test documenting that COPY FORMAT BINARY is incompatible with > IGNORE_DATATYPE_ERRORS Added it. > - a soft error showing the error context - although that will require > some > care to avoid the function name + line in the output I assume you mean a test to check the server log, but I haven't come up with a way to do it. Adding a TAP test might do it, but I think it would be overkill to add one just for this. -- Regards, -- Atsushi Torikoshi NTT DATA CORPORATION
Attachment
pgsql-hackers by date: