Re: postgres_fdw: Use COPY to speed up batch inserts - Mailing list pgsql-hackers

From: Matheus Alcantara
Subject: Re: postgres_fdw: Use COPY to speed up batch inserts
Date:
Msg-id: DEIX9LJ0S9KG.19VG2P7S0I793@gmail.com
In response to: Re: postgres_fdw: Use COPY to speed up batch inserts (Masahiko Sawada <sawada.mshk@gmail.com>)
List: pgsql-hackers
On Wed Nov 19, 2025 at 8:32 PM -03, Masahiko Sawada wrote:
> I think one key point in the patch is whether or not it's okay to
> switch using COPY based on the actual number of tuples inserted. While
> it should be okay from the performance perspective, it might be an
> issue that the remote query shown in EXPLAIN (without ANALYZE) might
> be different from the actual query sent. If there is a way to
> distinguish the batch insertion between INSERT and COPY in
> postgres_fdw, it might be a good idea to use COPY command for the
> remote query only when the COPY FROM comes.
>
Yeah, I agree that this EXPLAIN inconsistency is an issue, and for now
it doesn't seem easy to fix. That being said, I've taken a step back and
tried to reduce the scope of the patch to implement this idea of using
COPY as the remote SQL when the user is executing a COPY FROM on a
foreign table.

My initial idea was to use COPY as the remote SQL whenever a user
executes a COPY FROM on a foreign table, but this could be a breaking
change: the table on the foreign server may have triggers for INSERTs,
and switching unconditionally to COPY as the remote SQL would break
those cases. We have some test cases for this scenario.

So in this new version I introduced two new foreign table and server
options:
    - use_copy_for_batch_insert: Enables the use of COPY when
      appropriate.
    - copy_for_batch_insert_threshold: The number of rows required to
      switch to the COPY command instead of an INSERT.

I think the threshold option is necessary because it can be configured
to a different value than the batch_size option, depending on the
user's needs. Its default value is 1, so once "use_copy_for_batch_insert"
is set to true, COPY starts to be used right away. Note that
"use_copy_for_batch_insert" is false by default.

Speaking about the implementation: CopyFrom() calls the
BeginForeignInsert() FDW routine. The postgres_fdw implementation of
this routine creates the PgFdwModifyState that is used by
execute_foreign_modify(), so I thought it would be a good idea to read
the COPY usage options of the table in this function and save them in
the PgFdwModifyState struct. Then, when execute_foreign_modify() runs,
we can simply access the previously stored options and check whether
COPY can actually be used based on the number of tuples being batch
inserted.
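
As a usage sketch (reusing the hypothetical ft_orders table above), the
entry point described here is a plain COPY FROM on the foreign table,
which runs through CopyFrom() -> BeginForeignInsert() ->
execute_foreign_modify():

    COPY ft_orders FROM '/tmp/orders.csv' (FORMAT csv);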

The BeginForeignInsert() routine is also called when inserting tuples
into table partitions, so saving the COPY usage options at this stage
makes it possible to use the COPY command to speed up batch inserts
into partitions that are postgres_fdw foreign tables. I'm not sure
whether we should keep the patch scoped only to COPY FROM on a foreign
table, but I don't see any issue with using COPY to speed up batch
inserts into postgres_fdw table partitions too, since we don't expose
the remote SQL in this case, and the benchmarks show that we can get a
good performance improvement.
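
And a minimal sketch of the partitioned-table case (again with
illustrative names only), where the partition is a postgres_fdw foreign
table and a plain INSERT on the parent becomes eligible for the COPY
path:

    CREATE TABLE measurements (id int, region text)
        PARTITION BY LIST (region);

    CREATE FOREIGN TABLE measurements_eu
        PARTITION OF measurements FOR VALUES IN ('eu')
        SERVER remote_srv
        OPTIONS (table_name 'measurements_eu', batch_size '100');

    INSERT INTO measurements
        SELECT i, 'eu' FROM generate_series(1, 100) i;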

I've implemented this idea in the attached v7, and here are some
benchmarks that I've run.

Scenario: COPY FROM <fdw_table>
use_copy_for_batch_insert = false
rows being inserted = 100
batch_size = 100
copy_for_batch_insert_threshold = 50
    tps = 6500.133253

Scenario: COPY FROM <fdw_table>
use_copy_for_batch_insert = true
rows being inserted = 100
batch_size = 100
copy_for_batch_insert_threshold = 50
    tps = 13116.474292

Scenario: COPY FROM <fdw_table>
use_copy_for_batch_insert = false
rows being inserted = 140
batch_size = 100
copy_for_batch_insert_threshold = 50
    tps = 4654.865032

Scenario: COPY FROM <fdw_table>
use_copy_for_batch_insert = true
rows being inserted = 140
batch_size = 100
copy_for_batch_insert_threshold = 50
    tps = 7441.694325

-------------------------------------

Scenario: INSERT INTO <partitioned_table>
use_copy_for_batch_insert = false
rows being inserted per partition = 100
number of partitions: 3
    tps = 3176.872369

Scenario: INSERT INTO <partitioned_table>
use_copy_for_batch_insert = true
rows being inserted per partition = 100
number of partitions: 3
    tps = 6993.544958

-------------------------------------

Note that with "copy_for_batch_insert_threshold = 50" and "rows being
inserted = 140", the behaviour is to use COPY for the first batch
iteration of 100 rows and then fall back to INSERT for the remaining 40
rows, since 40 is below the threshold.

Summary of v7 changes:
    - Introduce the "use_copy_for_batch_insert" foreign server/table
      option to enable the use of the COPY command.
    - Introduce the "copy_for_batch_insert_threshold" option so that
      the COPY command is used when the number of rows being inserted
      is >= the configured value. Default is 1.
    - The COPY command can only be used when the user is executing a
      COPY FROM on a postgres_fdw foreign table or an INSERT into a
      partitioned table that has postgres_fdw foreign tables as
      partitions.
    - COPY and INSERT can both be used in the same execution if there
      are not enough rows remaining (based on
      copy_for_batch_insert_threshold) after the first batch execution.

--
Matheus Alcantara
EDB: http://www.enterprisedb.com


