Re: foreign table batch inserts - Mailing list pgsql-hackers

From: Craig Ringer
Subject: Re: foreign table batch inserts
Msg-id: CAMsr+YFgDUiJ37DEfPRk8WDBuZ58psdAYJd8iNFSaGxtw=wU3g@mail.gmail.com
In response to: Re: foreign table batch inserts (Craig Ringer <craig@2ndquadrant.com>)
List: pgsql-hackers
On 20 May 2016 at 15:35, Craig Ringer <craig@2ndquadrant.com> wrote:
You can, however, omit Sync between messages and send a series of protocol messages, like Parse/Bind/Execute/Bind/Execute/Bind/Execute/Sync, to avoid round-trip overheads.
I implemented what I think is a pretty solid proof of concept of this for kicks this evening. Patch attached, including a basic test program. The performance difference over higher-latency links is huge; see below.
Demo/test program in src/test/examples/testlibpqbatch.c.
I still need to add the logic for handling an error during a batch by discarding all input until the next Sync, but otherwise I think it's pretty reasonable.
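The recovery logic I have in mind can be sketched as a toy model of the result stream (illustrative Python, not libpq code; the message names and tuple structure here are my own simplification of the backend's messages): after an error, everything up to the next Sync is discarded, and processing resumes with whatever follows.

```python
def process_stream(messages):
    """Process a batched result stream; on error, skip to the next Sync.

    `messages` is a simplified model of backend traffic: each entry is
    ('result', payload), ('error', payload), or ('sync', None).
    Returns the payloads actually delivered to the application.
    """
    delivered = []
    skipping = False  # true while discarding input after an error
    for kind, payload in messages:
        if kind == 'sync':
            skipping = False      # Sync ends the failed batch; recover here
        elif kind == 'error':
            skipping = True       # discard everything until the next Sync
        elif kind == 'result' and not skipping:
            delivered.append(payload)
    return delivered

# One batch fails midway: results after the error are discarded,
# but the batch after the Sync is processed normally.
stream = [
    ('result', 'r1'), ('error', 'boom'), ('result', 'r2'), ('sync', None),
    ('result', 'r3'), ('sync', None),
]
print(process_stream(stream))  # ['r1', 'r3']
```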
The time difference for 10k inserts on the local host over a unix socket shows a solid improvement:
batch insert elapsed: 0.244293s
sequential insert elapsed: 0.375402s
... but over, say, a connection to a random AWS RDS instance fired up for the purpose that lives about 320ms away the difference is huge:
batch insert elapsed: 9.029995s
sequential insert elapsed: (I got bored after 10 minutes; it should take a bit less than an hour based on the latency numbers)
With 500 rows on the remote AWS RDS instance, once the I/O quota is already saturated:
batch insert elapsed: 1.229024s
sequential insert elapsed: 156.962180s
which is an improvement by a factor of over 120.
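Those figures are easy to sanity-check with back-of-envelope arithmetic, assuming one ~320 ms round trip per statement when not batching (a simplification; real costs also include server-side work):

```python
# Rough check of the numbers above.
rtt = 0.320                       # seconds, approximate link latency
sequential_10k = 10_000 * rtt     # one round trip per INSERT, no batching
print(f"{sequential_10k / 60:.0f} minutes")   # ~53 minutes: "a bit less than an hour"

speedup_500 = 156.962180 / 1.229024           # measured 500-row timings
print(f"{speedup_500:.0f}x")                  # ~128x: "a factor of over 120"
```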
I didn't compare vs COPY. I'm sure COPY will be faster, but COPY doesn't let you do INSERT ... ON CONFLICT, do UPDATE, do DELETE, etc. Not without temp tables and a bunch of hoop jumping anyway. If COPY solved everything there'd be no point having pipelining.
No docs yet, but if folks think the interface is reasonable I can add them easily, since the comments on each of the new functions should be easy to adapt into the SGML docs.
With a bit of polishing I think this can probably go in the next CF, though I only wrote it as an experiment. Can I get opinions on the API?
The TL;DR API, using the usual async libpq routines, is:
PQbeginBatchMode(conn);
PQsendQueryParams(conn, "BEGIN", 0, NULL, NULL, NULL, NULL, 0);
PQsendPrepare(conn, "my_update", "UPDATE ...");
PQsetnonblocking(conn, 1);
while (!all_responses_received)
{
select(...)
if (can-write)
{
if (app-has-more-data-to-send)
{
PQsendQueryPrepared(conn, "my_update", params-go-here);
}
else if (havent-sent-commit-yet)
{
PQsendQueryParams(conn, "COMMIT", ...);
}
else if (havent-sent-endbatch-yet)
{
PQendBatch(conn);
}
PQflush(conn);
}
if (can-read)
{
PQconsumeInput(conn);
if (PQisBusy(conn))
continue;
res = PQgetResult(conn);
if (res == NULL)
{
PQgetNextQuery(conn);
continue;
}
/* process results in the same order we sent the commands */
/* client keeps track of that, libpq just supplies the results */
...
}
}
PQendBatch(conn);
Note that:
* PQsendQuery cannot be used as it uses simple query protocol, use PQsendQueryParams instead;
* Batch supports PQsendQueryParams, PQsendPrepare, PQsendQueryPrepared, PQsendDescribePrepared, PQsendDescribePortal;
* You don't call PQgetResult after dispatching each query
* Multiple batches may be pipelined, you don't have to wait for one to end to start another (an advantage over JDBC's API)
* non-blocking mode isn't required, but is strongly advised
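The Sync placement that makes all this work can be sketched as a simplified model of the wire traffic for n executions of a prepared statement (message names from the extended query protocol; contents and sizes omitted, and this is an illustration, not libpq code):

```python
# Simplified client->server traffic for n prepared-statement executions.
def messages(n, batched):
    msgs = ["Parse"]
    for _ in range(n):
        msgs += ["Bind", "Execute"]
        if not batched:
            msgs.append("Sync")     # each Sync forces a full round trip
    if batched:
        msgs.append("Sync")         # one Sync for the whole batch
    return msgs

print(messages(3, batched=True))
# ['Parse', 'Bind', 'Execute', 'Bind', 'Execute', 'Bind', 'Execute', 'Sync']
print(messages(3, batched=False).count("Sync"))  # 3 round trips instead of 1
```

On a 320 ms link, each of those Syncs in the unbatched case is a round trip the client spends waiting, which is where the elapsed-time difference above comes from.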