Re: Pipeline mode and PQpipelineSync() - Mailing list pgsql-hackers
| From | Alvaro Herrera |
|---|---|
| Subject | Re: Pipeline mode and PQpipelineSync() |
| Date | |
| Msg-id | 202106222214.ptjfmstb23mu@alvherre.pgsql |
| In response to | Re: Pipeline mode and PQpipelineSync() (Boris Kolpackov <boris@codesynthesis.com>) |
| Responses | Re: Pipeline mode and PQpipelineSync(); Re: Pipeline mode and PQpipelineSync() |
| List | pgsql-hackers |
On 2021-Jun-21, Boris Kolpackov wrote:

> Alvaro Herrera <alvaro.herrera@2ndquadrant.com> writes:
>
> > I think I should rephrase this to say that PQpipelineSync() is needed
> > where the user needs the server to start executing commands; and
> > separately indicate that it is possible (but not promised) that the
> > server would start executing commands ahead of time because $reasons.
>
> I think always requiring PQpipelineSync() is fine since it also serves
> as an error recovery boundary. But the fact that the server waits until
> the sync message to start executing the pipeline is surprising. To me
> this seems to go contrary to the idea of a "pipeline".

But does that actually happen? There's a very easy test we can do by
sending queries that sleep. If my libpq program sends a "SELECT
pg_sleep(2)", then PQflush(), then sleeps in the client program for two
more seconds without sending the sync, and *then* sends the sync, I find
that the program takes 2 seconds, not 4. This shows that both client and
server slept in parallel, even though I didn't send the Sync until after
the client was done sleeping.

In order to see this, I patched libpq_pipeline.c with the attached, and
ran it under time:

    time ./libpq_pipeline simple_pipeline -t simple.trace
    simple pipeline... sent and flushed the sleep. Sleeping 2s here: client sleep done ok

    real    0m2,008s
    user    0m0,000s
    sys     0m0,003s

So I see things happening as you describe in (1):

> In fact, I see the following ways the server could behave:
>
> 1. The server starts executing queries and sending their results before
>    receiving the sync message.

I am completely at a loss on how to explain a server that behaves in any
other way, given how the protocol is designed. There is no buffering on
the server side.

> While it can be tempting to say that this is an implementation detail,
> this affects the way one writes a client. For example, I currently have
> the following comment in my code:
>
> // Send queries until we get blocked. This feels like a better
> // overall strategy to keep the server busy compared to sending one
> // query at a time and then re-checking if there is anything to read
> // because the results of INSERT/UPDATE/DELETE are presumably small
> // and quite a few of them can get buffered before the server gets
> // blocked.
>
> This would be a good strategy for behavior (1) but not (3) (where it
> would make more sense to queue the queries on the client side).

Agreed, that's the kind of strategy I would have thought was the most
reasonable, given my understanding of how the protocol works.

I wonder if your program is being affected by something else. Maybe the
socket is nonblocking (though I don't quite understand how that would
affect the client behavior in just this way), or your program is
buffering elsewhere. I don't do C++ much, so I can't help you with that.

> So I think it would be useful to clarify the server behavior and
> specify it in the documentation.

I'll see about improving the docs on these points.

> > Do I have it right that other than this documentation problem, you've
> > been able to use pipeline mode successfully?
>
> So far I've only tried it in a simple prototype (single INSERT statement).
> But I am busy plugging it into ODB's bulk operation support (that we
> already have for Oracle and MSSQL) and once that's done I should be
> able to exercise things in more meaningful ways.

Fair enough.

--
Álvaro Herrera               39°49'30"S 73°17'W
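For readers following along, the timing experiment described in the message can be sketched as a standalone libpq program rather than a patch to libpq_pipeline.c. This is a minimal sketch, not the actual attached patch: it assumes a reachable server configured through the usual PG* environment variables, requires libpq from PostgreSQL 14 or later (pipeline mode), and trims most error handling. If the server starts executing before Sync, the reported elapsed time is about 2 seconds; about 4 would mean the sleeps were serialized.

```c
/* Build with: cc pipeline_timing.c -lpq -o pipeline_timing */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>
#include <libpq-fe.h>

int main(void)
{
    /* Empty conninfo: connection parameters come from PG* env vars. */
    PGconn *conn = PQconnectdb("");
    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        return 1;
    }

    time_t start = time(NULL);

    if (!PQenterPipelineMode(conn))
    {
        fprintf(stderr, "could not enter pipeline mode\n");
        return 1;
    }

    /* Queue the server-side sleep and push it onto the wire. */
    PQsendQueryParams(conn, "SELECT pg_sleep(2)",
                      0, NULL, NULL, NULL, NULL, 0);
    PQflush(conn);

    /* Sleep on the client *before* sending the Sync message. */
    sleep(2);
    PQpipelineSync(conn);

    /* Drain results: the query's result, its terminating NULL, then
     * the PGRES_PIPELINE_SYNC marker for the Sync itself. */
    for (;;)
    {
        PGresult *res = PQgetResult(conn);
        if (res == NULL)
            continue;           /* NULL ends one command's results */
        ExecStatusType st = PQresultStatus(res);
        PQclear(res);
        if (st == PGRES_PIPELINE_SYNC)
            break;              /* everything up to the Sync consumed */
    }

    PQexitPipelineMode(conn);
    printf("elapsed: ~%lds (2 = parallel, 4 = serialized)\n",
           (long)(time(NULL) - start));
    PQfinish(conn);
    return 0;
}
```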
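Boris's quoted "send queries until we get blocked" strategy maps naturally onto libpq's nonblocking mode, where PQflush() returning 1 signals that the output buffer could not be fully drained. The sketch below is an illustration under assumptions, not Boris's actual code: the table `t`, column `v`, and helper name `send_batch` are made up, and the connection is assumed to already be in pipeline mode with `PQsetnonblocking(conn, 1)` applied.

```c
#include <stdio.h>
#include <libpq-fe.h>

/* Queue up to n single-parameter INSERTs, stopping early once the
 * socket send buffer fills up.  Returns -1 on error, else the number
 * of queries actually queued. */
int send_batch(PGconn *conn, const char *const *values, int n)
{
    int sent = 0;

    for (int i = 0; i < n; i++)
    {
        if (!PQsendQueryParams(conn,
                               "INSERT INTO t (v) VALUES ($1)",
                               1, NULL, &values[i], NULL, NULL, 0))
            return -1;
        sent++;

        int r = PQflush(conn);
        if (r == -1)
            return -1;
        if (r == 1)
        {
            /* Output buffer full: we have gotten ahead of the server.
             * A real client would now call PQconsumeInput() and drain
             * available results before queueing more queries. */
            fprintf(stderr, "blocked after %d queries\n", sent);
            break;
        }
    }
    return sent;
}
```

As the message notes, this strategy pays off precisely because the server begins executing pipelined queries before the Sync arrives; if it did not, the client would do better to queue locally instead.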