Thread: Add on_error and log_verbosity options to file_fdw

Add on_error and log_verbosity options to file_fdw

From
torikoshia
Date:
Hi,

With the current file_fdw, if even one line of data conversion fails, 
the contents of the file cannot be referenced at all:

   =# \! cat data/test.data
   1,a
   2,b
   a,c
   =# create foreign table f_fdw_test_1 (i int, t text) server f_fdw 
options (filename 'test.data', format 'csv');
   CREATE FOREIGN TABLE

   =# table f_fdw_test_1;
   ERROR:  invalid input syntax for type integer: "a"
   CONTEXT:  COPY f_fdw_test, line 3, column i: "a"

Since we'll support ON_ERROR option which tolerates data conversion 
errors in COPY FROM and LOG_VERBOSITY option at v17[1], how about 
supporting them on file_fdw?

This idea comes from Fujii-san[2], and I think it'd be useful when 
reading a bit dirty data.

Attached PoC patch works like below:

   =# create foreign table f_fdw_test_2 (i int, t text) server f_fdw 
options (filename 'test.data', format 'csv', on_error 'ignore');
   CREATE FOREIGN TABLE

   =# table f_fdw_test_2;
   NOTICE:  1 row was skipped due to data type incompatibility
    i | t
   ---+---
    1 | a
    2 | b
   (2 rows)


   =# create foreign table f_fdw_test_3 (i int, t text) server f_fdw 
options (filename 'test.data', format 'csv', on_error 'ignore', 
log_verbosity 'verbose');
CREATE FOREIGN TABLE

   =# table f_fdw_test_3 ;
   NOTICE:  skipping row due to data type incompatibility at line 3 for 
column i: "a"
   NOTICE:  1 row was skipped due to data type incompatibility
    i | t
   ---+---
    1 | a
    2 | b
   (2 rows)


I'm going to continue developing the patch(e.g. add doc, measure 
performance degradation) when people also think this feature is worth 
adding.


What do you think?


[1] https://www.postgresql.org/docs/devel/sql-copy.html
[2] https://x.com/fujii_masao/status/1808178032219509041

-- 
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation
Attachment

Re: Add on_error and log_verbosity options to file_fdw

From
torikoshia
Date:
On 2024-07-05 00:27, torikoshia wrote:
> Hi,
> 
> With the current file_fdw, if even one line of data conversion fails,
> the contents of the file cannot be referenced at all:
> 
>   =# \! cat data/test.data
>   1,a
>   2,b
>   a,c
>   =# create foreign table f_fdw_test_1 (i int, t text) server f_fdw
> options (filename 'test.data', format 'csv');
>   CREATE FOREIGN TABLE
> 
>   =# table f_fdw_test_1;
>   ERROR:  invalid input syntax for type integer: "a"
>   CONTEXT:  COPY f_fdw_test, line 3, column i: "a"
> 
> Since we'll support ON_ERROR option which tolerates data conversion
> errors in COPY FROM and LOG_VERBOSITY option at v17[1], how about
> supporting them on file_fdw?
> 
> This idea comes from Fujii-san[2], and I think it'd be useful when
> reading a bit dirty data.
> 
> Attached PoC patch works like below:
> 
>   =# create foreign table f_fdw_test_2 (i int, t text) server f_fdw
> options (filename 'test.data', format 'csv', on_error 'ignore');
>   CREATE FOREIGN TABLE
> 
>   =# table f_fdw_test_2;
>   NOTICE:  1 row was skipped due to data type incompatibility
>    i | t
>   ---+---
>    1 | a
>    2 | b
>   (2 rows)
> 
> 
>   =# create foreign table f_fdw_test_3 (i int, t text) server f_fdw
> options (filename 'test.data', format 'csv', on_error 'ignore',
> log_verbosity 'verbose');
> CREATE FOREIGN TABLE
> 
>   =# table f_fdw_test_3 ;
>   NOTICE:  skipping row due to data type incompatibility at line 3 for
> column i: "a"
>   NOTICE:  1 row was skipped due to data type incompatibility
>    i | t
>   ---+---
>    1 | a
>    2 | b
>   (2 rows)
> 
> 
> I'm going to continue developing the patch(e.g. add doc, measure
> performance degradation) when people also think this feature is worth
> adding.
> 
> 
> What do you think?
> 
> 
> [1] https://www.postgresql.org/docs/devel/sql-copy.html
> [2] https://x.com/fujii_masao/status/1808178032219509041

Update the patch since v1 patch caused compiler warning.

-- 
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation
Attachment

Re: Add on_error and log_verbosity options to file_fdw

From
Masahiko Sawada
Date:
Hi,

On Thu, Jul 18, 2024 at 6:38 PM torikoshia <torikoshia@oss.nttdata.com> wrote:
>
> On 2024-07-05 00:27, torikoshia wrote:
> > Hi,
> >
> > With the current file_fdw, if even one line of data conversion fails,
> > the contents of the file cannot be referenced at all:
> >
> >   =# \! cat data/test.data
> >   1,a
> >   2,b
> >   a,c
> >   =# create foreign table f_fdw_test_1 (i int, t text) server f_fdw
> > options (filename 'test.data', format 'csv');
> >   CREATE FOREIGN TABLE
> >
> >   =# table f_fdw_test_1;
> >   ERROR:  invalid input syntax for type integer: "a"
> >   CONTEXT:  COPY f_fdw_test, line 3, column i: "a"
> >
> > Since we'll support ON_ERROR option which tolerates data conversion
> > errors in COPY FROM and LOG_VERBOSITY option at v17[1], how about
> > supporting them on file_fdw?

+1

> >
> > This idea comes from Fujii-san[2], and I think it'd be useful when
> > reading a bit dirty data.
> >
> > Attached PoC patch works like below:
> >
> >   =# create foreign table f_fdw_test_2 (i int, t text) server f_fdw
> > options (filename 'test.data', format 'csv', on_error 'ignore');
> >   CREATE FOREIGN TABLE
> >
> >   =# table f_fdw_test_2;
> >   NOTICE:  1 row was skipped due to data type incompatibility
> >    i | t
> >   ---+---
> >    1 | a
> >    2 | b
> >   (2 rows)

I'm slightly concerned that users might not want to see the NOTICE
message for every scan. Unlike COPY FROM, scanning a file via file_fdw
could be frequent. An alternative idea of place to write the
information of the number of malformed rows would be the EXPLAIN
command as follow:

                           QUERY PLAN
----------------------------------------------------------------
 Foreign Scan on public.test  (cost=0.00..1.10 rows=1 width=12)
   Output: a, b, c
   Foreign File: test.csv
   Foreign File Size: 12 b
   Skipped Rows: 10

> >
> >
> >   =# create foreign table f_fdw_test_3 (i int, t text) server f_fdw
> > options (filename 'test.data', format 'csv', on_error 'ignore',
> > log_verbosity 'verbose');
> > CREATE FOREIGN TABLE
> >
> >   =# table f_fdw_test_3 ;
> >   NOTICE:  skipping row due to data type incompatibility at line 3 for
> > column i: "a"
> >   NOTICE:  1 row was skipped due to data type incompatibility
> >    i | t
> >   ---+---
> >    1 | a
> >    2 | b
> >   (2 rows)

IIUC we have to execute ALTER FOREIGN TABLE to change the
log_verbosity value and which requires to be the owner. Which seems
not to be user-friendly.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: Add on_error and log_verbosity options to file_fdw

From
Michael Paquier
Date:
On Mon, Jul 22, 2024 at 03:07:46PM -0700, Masahiko Sawada wrote:
> I'm slightly concerned that users might not want to see the NOTICE
> message for every scan. Unlike COPY FROM, scanning a file via file_fdw
> could be frequent. An alternative idea of place to write the
> information of the number of malformed rows would be the EXPLAIN
> command as follow:

Yeah, I also have some concerns regarding the noise that this could
produce if called on a foreign table on a regular basis.  The verbose
mode is disabled by default so I don't see why we should not allow it
if the relation owner wants to show it.

Perhaps we should first do a silence mode for log_verbosity to skip
the NOTICE produced at the end of the COPY FROM summarizing the whole?
It would be confusing to have different defaults between COPY and
file_fdw, but having the option to silence that completely is also
appealing from the user point of view.

>                            QUERY PLAN
> ----------------------------------------------------------------
>  Foreign Scan on public.test  (cost=0.00..1.10 rows=1 width=12)
>    Output: a, b, c
>    Foreign File: test.csv
>    Foreign File Size: 12 b
>    Skipped Rows: 10

Interesting idea linked to the idea of pushing the error state to
something else than the logs.  Sounds like a separate feature.

> IIUC we have to execute ALTER FOREIGN TABLE to change the
> log_verbosity value and which requires to be the owner. Which seems
> not to be user-friendly.

I am not sure about allowing scans to force an option to be a
different thing at runtime vs what's been defined in the relation
itself with CREATE/ALTER.
--
Michael

Attachment

Re: Add on_error and log_verbosity options to file_fdw

From
torikoshia
Date:
On 2024-07-23 08:57, Michael Paquier wrote:
> On Mon, Jul 22, 2024 at 03:07:46PM -0700, Masahiko Sawada wrote:
>> I'm slightly concerned that users might not want to see the NOTICE
>> message for every scan. Unlike COPY FROM, scanning a file via file_fdw
>> could be frequent.

Agreed.

> Yeah, I also have some concerns regarding the noise that this could
> produce if called on a foreign table on a regular basis.  The verbose
> mode is disabled by default so I don't see why we should not allow it
> if the relation owner wants to show it.
> 
> Perhaps we should first do a silence mode for log_verbosity to skip
> the NOTICE produced at the end of the COPY FROM summarizing the whole?

I like this idea.
If there are no objections, I'm going to make a patch for this.

> It would be confusing to have different defaults between COPY and
> file_fdw, but having the option to silence that completely is also
> appealing from the user point of view.

I'm not sure we should change the defaults.
If the default of file_fdw is silence mode, I am a little concerned that 
there may be cases where people think they have no errors, but in fact 
they have.

>>                            QUERY PLAN
>> ----------------------------------------------------------------
>>  Foreign Scan on public.test  (cost=0.00..1.10 rows=1 width=12)
>>    Output: a, b, c
>>    Foreign File: test.csv
>>    Foreign File Size: 12 b
>>    Skipped Rows: 10
> 
> Interesting idea linked to the idea of pushing the error state to
> something else than the logs.  Sounds like a separate feature.

+1


-- 
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation



Re: Add on_error and log_verbosity options to file_fdw

From
torikoshia
Date:
On 2024-07-24 19:43, torikoshia wrote:
> On 2024-07-23 08:57, Michael Paquier wrote:
>> On Mon, Jul 22, 2024 at 03:07:46PM -0700, Masahiko Sawada wrote:
>>> I'm slightly concerned that users might not want to see the NOTICE
>>> message for every scan. Unlike COPY FROM, scanning a file via 
>>> file_fdw
>>> could be frequent.
> 
> Agreed.
> 
>> Yeah, I also have some concerns regarding the noise that this could
>> produce if called on a foreign table on a regular basis.  The verbose
>> mode is disabled by default so I don't see why we should not allow it
>> if the relation owner wants to show it.
>> 
>> Perhaps we should first do a silence mode for log_verbosity to skip
>> the NOTICE produced at the end of the COPY FROM summarizing the whole?
> 
> I like this idea.
> If there are no objections, I'm going to make a patch for this.

Attached patches.
0001 adds new option 'silent' to log_verbosity and 0002 adds on_error 
and log_verbosity options to file_fdw.


> I'm going to continue developing the patch(e.g. add doc, measure
performance degradation) when people also think this feature is worth 
adding.

Here is a quick performance test result.

I loaded 1,000,000 rows using pgbench_accounts to a file and counted the 
number of rows using file_fdw on different conditions and compared the 
execution times on my laptop.

The changed conditions are below:
- source code: HEAD/patch applied
- data: no error data/all row occur data conversion error at the 1st 
column
- file_fdw options: on_error=stop/on_error=ignore

There seems no significant difference in performance between HEAD and 
the patch applied with on_error option specified as either ignore/stop 
when data has no error.
OTOH when all rows occur data conversion error, it is significantly 
faster than other cases:

# HAED(e950fe58bd0)
## data:no error

=# create foreign table t1 (a int, b int, c int, t text) server f_fdw 
options (filename 'pgb_ac', format 'csv');
=# select count(*) from t1;

1567.569 ms
1675.112 ms
1555.782 ms
1547.676 ms
1660.221 ms

# patch applied
## data:no error, on_error:stop

=# create foreign table t1 (a int, b int, c int, t text) server f_fdw 
options (filename 'pgb_ac', format 'csv', on_error 'stop');
=# select count(*) from t1;

1580.656 ms
1623.784 ms
1596.947 ms
1652.307 ms
1613.607 ms

## data:no error, on_error:ignore

=# create foreign table t1 (a int, b int, c int, t text) server f_fdw 
options (filename 'pgb_ac', format 'csv', on_error 'ignore');
=# select count(*) from t1;

1575.718 ms
1597.464 ms
1596.540 ms
1665.818 ms
1595.453 ms

#### data:all rows contain error, on_error:ignore

=# create foreign table t1 (a int, b int, c int, t text) server f_fdw 
options (filename 'pgb_ac', format 'csv', on_error 'ignore');
=# select count(*) from t1;

914.537 ms
907.506 ms
912.768 ms
913.769 ms
914.327 ms


-- 
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation
Attachment

Re: Add on_error and log_verbosity options to file_fdw

From
Fujii Masao
Date:

On 2024/08/08 16:36, torikoshia wrote:
> Attached patches.
> 0001 adds new option 'silent' to log_verbosity and 0002 adds on_error and log_verbosity options to file_fdw.

Thanks for the patches!

Here are the review comments for 0001 patch.

+      <literal>silent</literal> excludes verbose messages.

This should clarify that in silent mode, not only verbose messages but also
default ones are suppressed?

+        cstate->opts.log_verbosity != COPY_LOG_VERBOSITY_SILENT)

I think using "cstate->opts.log_verbosity >= COPY_LOG_VERBOSITY_DEFAULT" instead
might improve readability.

-    COPY_LOG_VERBOSITY_DEFAULT = 0, /* logs no additional messages, default */
-    COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */
+    COPY_LOG_VERBOSITY_SILENT = -1,    /* logs none */
+    COPY_LOG_VERBOSITY_DEFAULT = 0,    /* logs no additional messages, default */
+    COPY_LOG_VERBOSITY_VERBOSE,    /* logs additional messages */

Why do we need to assign specific numbers like -1 or 0 in this enum definition?


Here are the review comments for 0002 patch.

+            pgstat_progress_update_param(PROGRESS_COPY_TUPLES_SKIPPED,
+                                         ++skipped);

The skipped tuple count isn't accurate because fileIterateForeignScan() resets
"skipped" to 0.

+        if (cstate->opts.on_error != COPY_ON_ERROR_STOP &&
+            cstate->escontext->error_occurred)
+        {
+            /*
+             * Soft error occurred, skip this tuple and deal with error
+             * information according to ON_ERROR.
+             */
+            if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)

If COPY_ON_ERROR_IGNORE indicates tuple skipping, shouldn’t we not only reset
error_occurred but also call "pgstat_progress_update_param" and continue within
this block?

+    for(;;)
+    {

Using "goto" here might improve readability instead of using a "for" loop.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Add on_error and log_verbosity options to file_fdw

From
Fujii Masao
Date:

On 2024/09/19 23:16, torikoshia wrote:
>> -       COPY_LOG_VERBOSITY_DEFAULT = 0, /* logs no additional messages, default */
>> -       COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */
>> +       COPY_LOG_VERBOSITY_SILENT = -1, /* logs none */
>> +       COPY_LOG_VERBOSITY_DEFAULT = 0, /* logs no additional messages, default */
>> +       COPY_LOG_VERBOSITY_VERBOSE,     /* logs additional messages */
>>
>> Why do we need to assign specific numbers like -1 or 0 in this enum definition?
> 
> CopyFormatOptions is initialized by palloc0() at the beginning of ProcessCopyOptions().
> The reason to assign specific numbers here is to assign COPY_LOG_VERBOSITY_DEFAULT to 0 as default value and sort
elementsof enum according to the amount of logging.
 

Understood.


> BTW CopyFrom() also uses local variable skipped. It isn't reset like file_fdw, but using local variable seems not
necessarysince we can use cstate->num_errors here as well.
 

Sounds reasonable to me.


>> +               if (cstate->opts.on_error != COPY_ON_ERROR_STOP &&
>> +                       cstate->escontext->error_occurred)
>> +               {
>> +                       /*
>> +                        * Soft error occurred, skip this tuple and deal with error
>> +                        * information according to ON_ERROR.
>> +                        */
>> +                       if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
>>
>> If COPY_ON_ERROR_IGNORE indicates tuple skipping, shouldn’t we not only reset
>> error_occurred but also call "pgstat_progress_update_param" and continue within
>> this block?
> 
> I may misunderstand your comment, but I thought it to behave as you expect in the below codes:

The "on_error == COPY_ON_ERROR_IGNORE" condition isn't needed since
"on_error != COPY_ON_ERROR_STOP" is already checked, and on_error accepts
only two values "ignore" and "stop." I assume you added it with
a future option in mind, like "set_to_null" (as discussed in another thread).
However, I’m not sure how much this helps such future changes.
So, how about simplifying the code by replacing "on_error != COPY_ON_ERROR_STOP"
with "on_error == COPY_ON_ERROR_IGNORE" at the top and removing
the "on_error == COPY_ON_ERROR_IGNORE" check in the middle?
It could improve readability.


>> +       for(;;)
>> +       {
>> Using "goto" here might improve readability instead of using a "for" loop.
> 
> Hmm, AFAIU we need to return a slot here before the end of file is reached.
> 
> ```
> --src/backend/executor/execMain.c [ExecutePlan()]
>             /*
>              * if the tuple is null, then we assume there is nothing more to
>              * process so we just end the loop...
>              */
>             if (TupIsNull(slot))
>                 break;
> ```
> 
> When ignoring errors, we have to keep calling NextCopyFrom() until we find a non error tuple or EOF and to do so
callingNextCopyFrom() in for loop seems straight forward.
 
> 
> Replacing the "for" loop using "goto" as follows is possible, but seems not so readable because of the upward
"goto":

Understood.


> Attached v4 patches reflected these comments.

Thanks for updating the patches!

The tab-completion needs to be updated to support the "silent" option?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Add on_error and log_verbosity options to file_fdw

From
Fujii Masao
Date:

On 2024/09/24 20:08, torikoshia wrote:
> Thanks for the explanation and suggestion.
> Since there is almost the same code in copyfrom.c, attached 0003 patch for refactoring both.

Thanks for updating the patches!

Regarding 0002.patch, I think it’s better to include the refactored code
from the start rather than adding redundant code intentionally.
How about leaving just the refactor in copyfrom.c to 0003.patch?
If that works, as a refactoring, you could also replace "skipped" with
"cstate->num_errors" in that patch, as you suggested earlier.

While reviewing again, I noticed that running ANALYZE on a file_fdw
foreign table also calls NextCopyFrom(), but it doesn’t seem to
skip erroneous rows when on_error is set to "ignore." This could lead
to inaccurate statistics. Shouldn’t ANALYZE on file_fdw foreign tables
with on_error=ignore also skip erroneous rows?

>> The tab-completion needs to be updated to support the "silent" option?
> 
> Yes, updated 0002 patch.

Thanks! Also, this should be part of 0001.patch since "silent" is
introduced there, right?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Add on_error and log_verbosity options to file_fdw

From
Masahiko Sawada
Date:
Hi,

On Mon, Sep 30, 2024 at 8:36 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>
>
>
> On 2024/09/26 21:57, torikoshia wrote:
> > Updated the patches.
>
> Thanks for updating the patches! I’ve made some changes based on your work, which are attached.
> Barring any objections, I'm thinking to push these patches.
>
> For patches 0001 and 0003, I ran pgindent and updated the commit message.
>
> Regarding patch 0002:
>
> - I updated the regression test to run ANALYZE on the file_fdw foreign table
>    since the on_error option also affects the ANALYZE command. To ensure test
>    stability, the test now runs ANALYZE with log_verbosity = 'silent'.
>
> - I removed the code that updated the count of skipped rows for
>    the pg_stat_progress_copy view. As far as I know, file_fdw doesn’t
>    currently support tracking pg_stat_progress_copy.tuples_processed.
>    Supporting only tuples_skipped seems inconsistent, so I suggest creating
>    a separate patch to extend file_fdw to track both tuples_processed and
>    tuples_skipped in this view.
>
> - I refactored the for-loop handling on_error = 'ignore' in fileIterateForeignScan()
>    by replacing it with a goto statement for improved readability.
>
> - I modified file_fdw to log a NOTICE message about skipped rows at the end of
>    ANALYZE if any rows are skipped due to the on_error = 'ignore' setting.
>
>    Regarding the "file contains XXX rows" message reported by the ANALYZE VERBOSE
>    command on the file_fdw foreign table, what number should be reflected in XXX,
>    especially when some rows are skipped due to on_error = 'ignore'?
>    Currently, the count only includes valid rows, excluding any skipped rows.
>    I haven't modified this code yet. Should we instead count all rows
>    (valid and erroneous) and report that total?
>
>    I noticed the code for reporting the number of skipped rows due to
>    on_error = 'ignore' appears in three places. I’m considering creating
>    a common function for this reporting to eliminate redundancy but haven’t
>    implemented it yet.
>
> - I’ve updated the commit message and run pgindent.

Sorry for being late in joining the review of this patch. Both 0001
and 0003 look good to me. I have two comments on the 0002 patch:

-       found = NextCopyFrom(festate->cstate, econtext,
-                                                slot->tts_values,
slot->tts_isnull);
-       if (found)
+
+retry:
+       if (NextCopyFrom(cstate, econtext, slot->tts_values, slot->tts_isnull))
+       {
+               if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE &&
+                       cstate->escontext->error_occurred)
+               {
+                       /*
+                        * Soft error occurred, skip this tuple and just make
+                        * ErrorSaveContext ready for the next
NextCopyFrom. Since we
+                        * don't set details_wanted and error_data is
not to be filled,
+                        * just resetting error_occurred is enough.
+                        */
+                       cstate->escontext->error_occurred = false;
+
+                       /* Repeat NextCopyFrom() until no soft error occurs */
+                       goto retry;
+               }
+
                ExecStoreVirtualTuple(slot);
+       }

I think that while scanning a file_fdw foreign table with
log_verbosity='silent' the query is not interruptible.

Also, we don't switch to the per-tuple memory context when retrying
due to a soft error. I'm not sure it's okay as in CopyFrom(), a
similar function for COPY command, we switch to the per-tuple memory
context every time before parsing an input line. Would it be
problematic if we switch to another memory context while parsing an
input line? In CopyFrom() we also call ResetPerTupleExprContext() and
ExecClearTuple() for every input, so we might want to consider calling
them for every input.


Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: Add on_error and log_verbosity options to file_fdw

From
Masahiko Sawada
Date:
On Tue, Oct 1, 2024 at 11:34 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>
>
>
> On 2024/10/02 9:27, Masahiko Sawada wrote:
> > Sorry for being late in joining the review of this patch. Both 0001
> > and 0003 look good to me. I have two comments on the 0002 patch:
>
> Thanks for the review!
>
> > I think that while scanning a file_fdw foreign table with
> > log_verbosity='silent' the query is not interruptible.
>
> You're right. I added CHECK_FOR_INTERRUPTS() in the retry loop.
>
> > Also, we don't switch to the per-tuple memory context when retrying
> > due to a soft error. I'm not sure it's okay as in CopyFrom(), a
> > similar function for COPY command, we switch to the per-tuple memory
> > context every time before parsing an input line. Would it be
> > problematic if we switch to another memory context while parsing an
> > input line? In CopyFrom() we also call ResetPerTupleExprContext() and
> > ExecClearTuple() for every input, so we might want to consider calling
> > them for every input.
>
> Yes, I've updated the patch based on your comment.
> Could you please review the latest version?

Thank you for updating the patch! All patches look good to me.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: Add on_error and log_verbosity options to file_fdw

From
Fujii Masao
Date:

On 2024/10/03 13:23, Masahiko Sawada wrote:
> On Tue, Oct 1, 2024 at 11:34 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>
>>
>>
>> On 2024/10/02 9:27, Masahiko Sawada wrote:
>>> Sorry for being late in joining the review of this patch. Both 0001
>>> and 0003 look good to me. I have two comments on the 0002 patch:
>>
>> Thanks for the review!
>>
>>> I think that while scanning a file_fdw foreign table with
>>> log_verbosity='silent' the query is not interruptible.
>>
>> You're right. I added CHECK_FOR_INTERRUPTS() in the retry loop.
>>
>>> Also, we don't switch to the per-tuple memory context when retrying
>>> due to a soft error. I'm not sure it's okay as in CopyFrom(), a
>>> similar function for COPY command, we switch to the per-tuple memory
>>> context every time before parsing an input line. Would it be
>>> problematic if we switch to another memory context while parsing an
>>> input line? In CopyFrom() we also call ResetPerTupleExprContext() and
>>> ExecClearTuple() for every input, so we might want to consider calling
>>> them for every input.
>>
>> Yes, I've updated the patch based on your comment.
>> Could you please review the latest version?
> 
> Thank you for updating the patch! All patches look good to me.

Thanks for the review! Pushed.

BTW, regarding the issue I mentioned earlier about file_fdw not reporting
the number of tuples processed and skipped in the pg_stat_progress_copy view,
I'll start a new thread to discuss this further.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Add on_error and log_verbosity options to file_fdw

From
torikoshia
Date:
On 2024-10-03 18:03, Fujii Masao wrote:
> On 2024/10/03 13:23, Masahiko Sawada wrote:
>> On Tue, Oct 1, 2024 at 11:34 PM Fujii Masao 
>> <masao.fujii@oss.nttdata.com> wrote:
>>> 
>>> 
>>> On 2024/10/02 9:27, Masahiko Sawada wrote:
>>>> Sorry for being late in joining the review of this patch. Both 0001
>>>> and 0003 look good to me. I have two comments on the 0002 patch:
>>> 
>>> Thanks for the review!
>>> 
>>>> I think that while scanning a file_fdw foreign table with
>>>> log_verbosity='silent' the query is not interruptible.
>>> 
>>> You're right. I added CHECK_FOR_INTERRUPTS() in the retry loop.
>>> 
>>>> Also, we don't switch to the per-tuple memory context when retrying
>>>> due to a soft error. I'm not sure it's okay as in CopyFrom(), a
>>>> similar function for COPY command, we switch to the per-tuple memory
>>>> context every time before parsing an input line. Would it be
>>>> problematic if we switch to another memory context while parsing an
>>>> input line? In CopyFrom() we also call ResetPerTupleExprContext() 
>>>> and
>>>> ExecClearTuple() for every input, so we might want to consider 
>>>> calling
>>>> them for every input.
>>> 
>>> Yes, I've updated the patch based on your comment.
>>> Could you please review the latest version?
>> 
>> Thank you for updating the patch! All patches look good to me.
> 
> Thanks for the review! Pushed.

Fujii-san, Sawada-san,
Thanks for your review and modifications!


-- 
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation