Home > mailing lists

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features) - Mailing list pgsql-hackers

From	Alena Rybakina
Subject	Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
Date	December 8, 2023 07:09:07
Msg-id	23a89ba0-b5d5-419c-b798-cd2c59490148@yandex.ru Whole thread Raw
In response to	Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features) (jian he <jian.universality@gmail.com>)
Responses	Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
List	pgsql-hackers

Tree view

Thank you for your work. Unfortunately, your code contained errors during the make installation:

'SAVEPOINT' after 'SAVE_ERROR' in unreserved_keyword list is misplaced
'SAVEPOINT' after 'SAVE_ERROR' in bare_label_keyword list is misplaced
make[2]: *** [../../../src/Makefile.global:783: gram.c] Error 1
make[1]: *** [Makefile:131: parser/gram.h] Error 2
make[1]: *** Waiting for unfinished jobs....
make: *** [src/Makefile.global:383: submake-generated-headers] Error 2

I have ubuntu 22.04 operation system.

On 06.12.2023 13:47, jian he wrote:

On Tue, Dec 5, 2023 at 6:07 PM Alena Rybakina <lena.ribackina@yandex.ru> wrote:

Hi!

Thank you for your contribution to this thread.


I reviewed it and have a few questions.

1. I have seen that you delete a table before creating it, to which you want to add errors due to a failed "copy from" operation. I think this is wrong because this table can save useful data for the user.
At a minimum, we should warn the user about this, but I think we can just add some number at the end of the name, such as name_table1, name_table_2.

Sorry. I don't understand this part.
Currently, if the error table name already exists, then the copy will
fail, an error will be reported.
I try to first create a table, if no error then the error table will be dropped.

To be honest, first of all, I misunderstood this part of the code. Now I see that it works the way you mentioned.

However, I didn't see if you dealt with cases where we already had a table with the same name as the table error.
I mean, when is he trying to create for the first time, or will we never be able to face such a problem?

Can you demo the expected behavior?

Unfortunately, I was unable to launch it due to a build issue.

2. I noticed that you are forming a table name using the type of errors that prevent rows from being added during 'copy from' operation.
I think it would be better to use the name of the source file that was used while 'copy from' was running.
In addition, there may be several such files, it is also worth considering.

Another column added.
now it looks like:

SELECT * FROM save_error_csv_error; filename | lineno |                        line | field | source |                 err_message                 |
err_detail | errorcode
----------+--------+----------------------------------------------------+-------+--------+---------------------------------------------+------------+----------- STDIN    |      1 | 2002    232     40      50      60      70
80 | NULL  | NULL   | extra data after last expected column       |
NULL       | 22P04 STDIN    |      1 | 2000    230     23 | d     | NULL   | missing data for column "d"                 | NULL      | 22P04 STDIN    |      1 | z,,"" | a     | z      | invalid input syntax for type integer: "z"  | NULL      | 22P02 STDIN    |      2 | \0,, | a     | \0     | invalid input syntax for type integer: "\0" | NULL      | 22P02

Yes, I see the "filename" column, and this will solve the problem, but "STDIN" is unclear to me.

3. I found spelling:

/* no err_nsp.error_rel table then crete one. for holding error. */

fixed.

4. Maybe rewrite this comment

these info need, no error will drop err_nsp.error_rel table
to:
this information is necessary, no error will lead to the deletion of the err_sp.error_rel table.

fixed.

Thank you.

5. Is this part of the comment needed? I think it duplicates the information below when we form the query.
 * . column list(order by attnum, begin from ctid) = *    {ctid, lineno,line,field,source,err_message,err_detail,errorcode} * . data types (from attnum = -1) ={tid, int8,text,text,text,text,text,text}

I'm not sure if we need to order the rows by number. It might be easier to work with these lines in the order they appear.

Simplified the comment. "order by attnum" is to make sure that if
there is a table already existing, and the column name is like X and
the data type like Y, then we consider this table is good for holding
potential error info.

COPY FROM, main entry point is NextCopyFrom.
Now for non-binary mode, if you specified save_error then it will not
fail at NextCopyFrom.
all these three errors will be tolerated: extra data after last
expected column, missing data for column, data type conversion.

It looks clearer and better, thanks!

Comments in the format of questions are unusual for me, I perceive them to think about it, for example, as here (contrib/bloom/blinsert.c:312):

/* * Didn't find place to insert in notFullPage array. Allocate new page.
* (XXX is it good to do this while holding ex-lock on the metapage??)
*/

Maybe we can rewrite it like this:/* Check, the err_nsp.error_rel table has already existed
* and if it is, check its column name and data types.

-- 
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

pgsql-hackers by date:

From: Michael Paquier
Date: 08 December 2023, 07:02:55
Subject: Re: Make COPY format extendable: Extract COPY TO format implementations

From: Alexander Pyhalov
Date: 08 December 2023, 07:13:36
Subject: Re: Assert failure on 'list_member_ptr(rel->joininfo, restrictinfo)'

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features) - Mailing list pgsql-hackers

Previous

Next