Re: RFC: Async query processing - Mailing list pgsql-hackers
From | Claudio Freire |
---|---|
Subject | Re: RFC: Async query processing |
Date | |
Msg-id | CAGTBQpahBeCjUnrA=vj_8w=fGSKejGcUMitArZa+uuFKJ3LHOA@mail.gmail.com Whole thread Raw |
In response to | Re: RFC: Async query processing (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: RFC: Async query processing
Re: RFC: Async query processing Re: RFC: Async query processing Re: RFC: Async query processing |
List | pgsql-hackers |
On Fri, Jan 3, 2014 at 12:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Claudio Freire <klaussfreire@gmail.com> writes: >> On Fri, Jan 3, 2014 at 10:22 AM, Florian Weimer <fweimer@redhat.com> wrote: >>> Loading data into the database isn't such an uncommon task. Not everything >>> is OLTP. > >> Truly, but a sustained insert stream of 10 Mbps is certainly way >> beyond common non-OLTP loads. This is far more specific than non-OLTP. > > I think Florian has a good point there, and the reason is this: what > you are talking about will be of exactly zero use to applications that > want to see the results of one query before launching the next. Which > eliminates a whole lot of apps. I suspect that almost the *only* > common use case in which a stream of queries can be launched without > feedback is going to be bulk data loading. It's not clear at all > that pipelining the PQexec code path is the way to better performance > for that --- why not use COPY, instead? You're forgetting ORM workloads. ORMs can usually plan the inserts to be in a sequence that both don't require feedback (except the knowledge that they were successful), and that do not violate constraints. Flushing a whole object hierarchy for instance, can be done without feedback. Not even serial columns need feedback, since many ORMs (SQLAlchemy, Hibernate) support allocation of ID sequences in batches (by issuing a proper select nextval). I agree, that with the proposed API, it's too error prone to be useful. But I also think, if the API is simple and fool-proof enough, it could be "build them and they will come". I know I'll be happy to implement support for SQLAlchemy (since it will benefit me), if the API resembles the proposition below (at least in simplicity). Per-query expectations could be such a thing. And it can even work with PQexec: PQexec(con, "SELECT nextval('a_id_seq') FROM generate_series(1,10);"); --read-- PQexec(con, "SELECT nextval('b_id_seq') FROM generate_series(1,10);"); --read-- PQexec(con, "INSERT INTO a (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK); PQexec(con, "INSERT INTO b (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK); PQexec(con, "INSERT INTO a (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK); PQexec(con, "INSERT INTO b (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK); PQexec(con, "INSERT INTO a (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK); PQexec(con, "INSERT INTO b (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK); ... 9 times... PQexec(con, "INSERT INTO a (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK); PQexec(con, "INSERT INTO b (...);", PQEXPECT_NO_RESULT | PQASYNC); do { // do something useful } while (PQflush()); Here, the PQASYNC flag would temporarily switch to non-blocking I/O, and buffer what cannot be sent. PQASNC_CORK, would only buffer (only send if the buffer is full). After any ASYNC call, PQflush would be necessary (to flush the send queue and to consume the expected responses), but I can imagine any synchronous call (PQexec, PQsendQuery or whatever) could detect a non-empty buffer and just blockingly flush right there. This can benefit many useful patterns. ORM flush, is one, if there can be preallocation of IDs (which I know at least SQLAlchemy and Hibernate both support). Execute-many of prepared statements is another one, quite common. I'm not sure what would happen if one of the queries returned an error. If in a transaction, all the following queries would error out I'd imagine. If not, they would simply be executed blindly.. am I correct?
pgsql-hackers by date: