Re: foreign table batch inserts - Mailing list pgsql-hackers

From: Craig Ringer
Subject: Re: foreign table batch inserts
Msg-id: CAMsr+YFgDUiJ37DEfPRk8WDBuZ58psdAYJd8iNFSaGxtw=wU3g@mail.gmail.com
In response to: Re: foreign table batch inserts (Craig Ringer <craig@2ndquadrant.com>)
List: pgsql-hackers
On 20 May 2016 at 15:35, Craig Ringer <craig@2ndquadrant.com> wrote:
You can, however, omit Sync between messages and send a series of protocol messages, like Parse/Bind/Execute/Bind/Execute/Bind/Execute/Sync, to avoid round-trip overheads.
I implemented what I think is a pretty solid proof of concept of this for kicks this evening. Patch attached, including a basic test program. The performance difference over higher-latency links is huge; see below.
Demo/test program in src/test/examples/testlibpqbatch.c.
I still need to add the logic for handling an error during a batch by discarding all input until the next Sync, but otherwise I think it's pretty reasonable.
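The recovery logic I have in mind can be sketched as a toy model of the result stream (illustrative Python, not libpq code; the message names and tuple structure here are my own simplification of the backend's messages): after an error, everything up to the next Sync is discarded, and processing resumes with whatever follows.

```python
def process_stream(messages):
    """Process a batched result stream; on error, skip to the next Sync.

    `messages` is a simplified model of backend traffic: each entry is
    ('result', payload), ('error', payload), or ('sync', None).
    Returns the payloads actually delivered to the application.
    """
    delivered = []
    skipping = False  # true while discarding input after an error
    for kind, payload in messages:
        if kind == 'sync':
            skipping = False      # Sync ends the failed batch; recover here
        elif kind == 'error':
            skipping = True       # discard everything until the next Sync
        elif kind == 'result' and not skipping:
            delivered.append(payload)
    return delivered

# One batch fails midway: results after the error are discarded,
# but the batch after the Sync is processed normally.
stream = [
    ('result', 'r1'), ('error', 'boom'), ('result', 'r2'), ('sync', None),
    ('result', 'r3'), ('sync', None),
]
print(process_stream(stream))  # ['r1', 'r3']
```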
The time difference for 10k inserts on the local host over a unix socket shows a solid improvement:
batch insert elapsed: 0.244293s
sequential insert elapsed: 0.375402s
... but over, say, a connection to a random AWS RDS instance fired up for the purpose that lives about 320ms away the difference is huge:
batch insert elapsed: 9.029995s
sequential insert elapsed: (I got bored after 10 minutes; it should take a bit less than an hour based on the latency numbers)
With 500 rows on the remote AWS RDS instance, once the I/O quota is already saturated:
batch insert elapsed: 1.229024s
sequential insert elapsed: 156.962180s
which is an improvement by a factor of over 120.
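Those figures are easy to sanity-check with back-of-envelope arithmetic, assuming one ~320 ms round trip per statement when not batching (a simplification; real costs also include server-side work):

```python
# Rough check of the numbers above.
rtt = 0.320                       # seconds, approximate link latency
sequential_10k = 10_000 * rtt     # one round trip per INSERT, no batching
print(f"{sequential_10k / 60:.0f} minutes")   # ~53 minutes: "a bit less than an hour"

speedup_500 = 156.962180 / 1.229024           # measured 500-row timings
print(f"{speedup_500:.0f}x")                  # ~128x: "a factor of over 120"
```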
I didn't compare vs COPY. I'm sure COPY will be faster, but COPY doesn't let you do INSERT ... ON CONFLICT, do UPDATE, do DELETE, etc. Not without temp tables and a bunch of hoop jumping anyway. If COPY solved everything there'd be no point having pipelining.
No docs yet, but if folks think the interface is reasonable I can add them easily, since the comments on each of the new functions should be easy to adapt into the SGML docs.
With a bit of polishing I think this can probably go in the next CF, though I only wrote it as an experiment. Can I get opinions on the API?
The TL;DR API, using the usual async libpq routines, is:
PQbeginBatchMode(conn);
PQsendQueryParams(conn, "BEGIN", 0, NULL, NULL, NULL, NULL, 0);
PQsendPrepare(conn, "my_update", "UPDATE ...");
PQsetnonblocking(conn, 1);
while (!all_responses_received)
{
select(...)
if (can-write)
{
if (app-has-more-data-to-send)
{
PQsendQueryPrepared(conn, "my_update", params-go-here);
}
else if (havent-sent-commit-yet)
{
PQsendQueryParams(conn, "COMMIT", ...);
}
else if (havent-sent-endbatch-yet)
{
PQendBatch(conn);
}
PQflush(conn);
}
if (can-read)
{
PQconsumeInput(conn);
if (PQisBusy(conn))
continue;
res = PQgetResult(conn);
if (res == NULL)
{
PQgetNextQuery(conn);
continue;
}
/* process results in the same order we sent the commands */
/* client keeps track of that, libpq just supplies the results */
...
}
}
PQendBatch(conn);
Note that:
* PQsendQuery cannot be used as it uses simple query protocol, use PQsendQueryParams instead;
* Batch supports PQsendQueryParams, PQsendPrepare, PQsendQueryPrepared, PQsendDescribePrepared, PQsendDescribePortal;
* You don't call PQgetResult after dispatching each query
* Multiple batches may be pipelined, you don't have to wait for one to end to start another (an advantage over JDBC's API)
* non-blocking mode isn't required, but is strongly advised
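The Sync placement that makes all this work can be sketched as a simplified model of the wire traffic for n executions of a prepared statement (message names from the extended query protocol; contents and sizes omitted, and this is an illustration, not libpq code):

```python
# Simplified client->server traffic for n prepared-statement executions.
def messages(n, batched):
    msgs = ["Parse"]
    for _ in range(n):
        msgs += ["Bind", "Execute"]
        if not batched:
            msgs.append("Sync")     # each Sync forces a full round trip
    if batched:
        msgs.append("Sync")         # one Sync for the whole batch
    return msgs

print(messages(3, batched=True))
# ['Parse', 'Bind', 'Execute', 'Bind', 'Execute', 'Bind', 'Execute', 'Sync']
print(messages(3, batched=False).count("Sync"))  # 3 round trips instead of 1
```

On a 320 ms link, each of those Syncs in the unbatched case is a round trip the client spends waiting, which is where the elapsed-time difference above comes from.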