
From: Matheus Alcantara
Subject: Re: postgres_fdw: Use COPY to speed up batch inserts
Date:
Msg-id: DE20J06SWW40.IN0FOXQSFUEA@gmail.com
In response to: Re: postgres_fdw: Use COPY to speed up batch inserts  ("Matheus Alcantara" <matheusssilv97@gmail.com>)
List: pgsql-hackers
On Fri Oct 31, 2025 at 4:02 PM -03, I wrote:
> It's proving a bit complicated to decide at runtime whether we should use
> COPY or INSERT for a batch insert into a foreign table. Perhaps we could
> add a new option on CREATE FOREIGN TABLE to enable this usage or not? We
> could document the performance improvements and the limitations so the
> user can decide whether to enable it.
>
Here is v5, which implements this idea.

In this version I've introduced a foreign table and foreign server option
"use_copy_for_insert" (I'm open to a better name) that enables the use of
COPY as the remote command to execute an INSERT into a foreign table. COPY
can be used if the user enables this option on the foreign table or the
foreign server and if the original INSERT statement doesn't have a
RETURNING clause.
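
For clarity, here is how I'd expect the option to be enabled (just a
sketch; the server and foreign table names below are made up):

CREATE SERVER loopback FOREIGN DATA WRAPPER postgres_fdw
  OPTIONS (dbname 'postgres', use_copy_for_insert 'true');

CREATE FOREIGN TABLE ft_target (a int)
  SERVER loopback
  OPTIONS (table_name 'target', batch_size '100', use_copy_for_insert 'true');

-- I'd expect the table-level setting to override the server-level one,
-- as with batch_size and other postgres_fdw options.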

See the benchmark results:

pgbench -n -c 10 -j 10 -t 100 -f bench.sql postgres
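
The bench.sql itself isn't shown inline, so the statement below is only my
guess at the shape of the workload (ft_target is a made-up foreign table
name); the row count was varied between a single row and 1000 rows per run:

INSERT INTO ft_target SELECT g FROM generate_series(1, 1000) g;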

Master (batch_size = 1 with a single row to insert):
tps = 16000.768037

Master (batch_size = 1 with 1000 rows to insert):
tps = 133.451518

Master (batch_size = 100 with 1000 rows to insert):
tps = 1274.096347

-----------------

Patch (batch_size = 1, use_copy_for_insert = false with a single row to
insert):
tps = 15734.155705

Patch (batch_size = 1, use_copy_for_insert = false with 1000 rows to
insert):
tps = 132.644801

Patch (batch_size = 100, use_copy_for_insert = false with 1000 rows to
insert):
tps = 1245.514591

-----------------

Patch (batch_size = 1, use_copy_for_insert = true with a single row to
insert):
tps = 17604.394057

Patch (batch_size = 1, use_copy_for_insert = true with 1000 rows to
insert):
tps = 88.998804

Patch (batch_size = 100, use_copy_for_insert = true with 1000 rows to
insert):
tps = 2406.009249

-----------------

We can see that when batch inserting with batch_size configured properly we
get a very significant performance improvement, and when the
"use_copy_for_insert" option is disabled the performance is close to
master.

The problem is when "batch_size" is 1 (the default) and
"use_copy_for_insert" is enabled. In this scenario we end up sending
multiple COPY commands to the foreign server, each carrying a single row.

One way to fix this would be to decide at runtime (in
execute_foreign_modify()) whether COPY can be used, based on the number of
rows being inserted. I don't think I like this option because it would make
the EXPLAIN output differ depending on whether the ANALYZE option is used:
at planning time we don't know the number of rows being inserted, so a
plain EXPLAIN (VERBOSE) would show the INSERT as the remote SQL, while
EXPLAIN (ANALYZE) with enough rows to enable the COPY usage would show the
COPY command as the remote SQL.
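
To make the discrepancy concrete, this is roughly how I'd expect it to
surface (illustrative only, not output from an actual run; ft_target and
public.target are made-up names):

EXPLAIN (VERBOSE)
INSERT INTO ft_target SELECT g FROM generate_series(1, 1000) g;
--   Remote SQL: INSERT INTO public.target(a) VALUES ($1)

EXPLAIN (ANALYZE, VERBOSE)
INSERT INTO ft_target SELECT g FROM generate_series(1, 1000) g;
--   Remote SQL: COPY public.target (a) FROM STDIN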

Since the new "use_copy_for_insert" option is disabled by default, I think
we could document this limitation and mention the performance improvements
when it is used correctly together with the batch_size option.

Another option would be to use the COPY command only if
"use_copy_for_insert" is true and "batch_size" is > 1. We would still have
the performance issue if the user inserts a single row, but it would be
confined to fewer scenarios. The attached 0002 implements this idea.
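
Under my reading of 0002, enabling the option alone would not be enough;
batch_size would also need to be raised (ft_target is again a made-up
name):

ALTER FOREIGN TABLE ft_target OPTIONS (ADD use_copy_for_insert 'true');
-- remote command stays INSERT while batch_size is at its default of 1
ALTER FOREIGN TABLE ft_target OPTIONS (ADD batch_size '100');
-- now batched inserts without a RETURNING clause can go through COPY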

Thoughts?

--
Matheus Alcantara
EDB: http://www.enterprisedb.com


