Re: Pipeline mode and PQpipelineSync() - Mailing list pgsql-hackers
| From | Alvaro Herrera |
|---|---|
| Subject | Re: Pipeline mode and PQpipelineSync() |
| Date | |
| Msg-id | 202106222214.ptjfmstb23mu@alvherre.pgsql |
| In response to | Re: Pipeline mode and PQpipelineSync() (Boris Kolpackov <boris@codesynthesis.com>) |
| Responses | Re: Pipeline mode and PQpipelineSync(); Re: Pipeline mode and PQpipelineSync() |
| List | pgsql-hackers |
On 2021-Jun-21, Boris Kolpackov wrote:

> Alvaro Herrera <alvaro.herrera@2ndquadrant.com> writes:
>
> > I think I should rephrase this to say that PQpipelineSync() is needed
> > where the user needs the server to start executing commands; and
> > separately indicate that it is possible (but not promised) that the
> > server would start executing commands ahead of time because $reasons.
>
> I think always requiring PQpipelineSync() is fine since it also serves
> as an error recovery boundary. But the fact that the server waits until
> the sync message to start executing the pipeline is surprising. To me
> this seems to go contrary to the idea of a "pipeline".

But does that actually happen? There's a very easy test we can do by
sending queries that sleep. If my libpq program sends a "SELECT
pg_sleep(2)", then PQflush(), then sleeps in the client program for two
more seconds without sending the sync, and *then* sends the sync, I find
that the program takes 2 seconds, not 4. This shows that both client and
server slept in parallel, even though I didn't send the Sync until after
the client was done sleeping.

In order to see this, I patched libpq_pipeline.c with the attached, and
ran it under time:

    time ./libpq_pipeline simple_pipeline -t simple.trace
    simple pipeline... sent and flushed the sleep. Sleeping 2s here: client sleep done ok

    real    0m2,008s
    user    0m0,000s
    sys     0m0,003s

So I see things happening as you describe in (1):

> In fact, I see the following ways the server could behave:
>
> 1. The server starts executing queries and sending their results before
>    receiving the sync message.

I am completely at a loss on how to explain a server that behaves in any
other way, given how the protocol is designed. There is no buffering on
the server side.

> While it can be tempting to say that this is an implementation detail,
> this affects the way one writes a client. For example, I currently have
> the following comment in my code:
>
> // Send queries until we get blocked. This feels like a better
> // overall strategy to keep the server busy compared to sending one
> // query at a time and then re-checking if there is anything to read
> // because the results of INSERT/UPDATE/DELETE are presumably small
> // and quite a few of them can get buffered before the server gets
> // blocked.
>
> This would be a good strategy for behavior (1) but not (3) (where it
> would make more sense to queue the queries on the client side).

Agreed, that's the kind of strategy I would have thought was the most
reasonable, given my understanding of how the protocol works.

I wonder if your program is being affected by something else. Maybe the
socket is nonblocking (though I don't quite understand how that would
affect the client behavior in just this way), or your program is
buffering elsewhere. I don't do C++ much, so I can't help you with that.

> So I think it would be useful to clarify the server behavior and
> specify it in the documentation.

I'll see about improving the docs on these points.

> > Do I have it right that other than this documentation problem, you've
> > been able to use pipeline mode successfully?
>
> So far I've only tried it in a simple prototype (single INSERT statement).
> But I am busy plugging it into ODB's bulk operation support (that we
> already have for Oracle and MSSQL) and once that's done I should be
> able to exercise things in more meaningful ways.

Fair enough.

--
Álvaro Herrera               39°49'30"S 73°17'W
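For readers following along, the timing experiment described in the message can be sketched as a standalone libpq program rather than a patch to libpq_pipeline.c. This is a minimal sketch, not the actual attached patch: it assumes a reachable server configured through the usual PG* environment variables, requires libpq from PostgreSQL 14 or later (pipeline mode), and trims most error handling. If the server starts executing before Sync, the reported elapsed time is about 2 seconds; about 4 would mean the sleeps were serialized.

```c
/* Build with: cc pipeline_timing.c -lpq -o pipeline_timing */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>
#include <libpq-fe.h>

int main(void)
{
    /* Empty conninfo: connection parameters come from PG* env vars. */
    PGconn *conn = PQconnectdb("");
    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        return 1;
    }

    time_t start = time(NULL);

    if (!PQenterPipelineMode(conn))
    {
        fprintf(stderr, "could not enter pipeline mode\n");
        return 1;
    }

    /* Queue the server-side sleep and push it onto the wire. */
    PQsendQueryParams(conn, "SELECT pg_sleep(2)",
                      0, NULL, NULL, NULL, NULL, 0);
    PQflush(conn);

    /* Sleep on the client *before* sending the Sync message. */
    sleep(2);
    PQpipelineSync(conn);

    /* Drain results: the query's result, its terminating NULL, then
     * the PGRES_PIPELINE_SYNC marker for the Sync itself. */
    for (;;)
    {
        PGresult *res = PQgetResult(conn);
        if (res == NULL)
            continue;           /* NULL ends one command's results */
        ExecStatusType st = PQresultStatus(res);
        PQclear(res);
        if (st == PGRES_PIPELINE_SYNC)
            break;              /* everything up to the Sync consumed */
    }

    PQexitPipelineMode(conn);
    printf("elapsed: ~%lds (2 = parallel, 4 = serialized)\n",
           (long)(time(NULL) - start));
    PQfinish(conn);
    return 0;
}
```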
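Boris's quoted "send queries until we get blocked" strategy maps naturally onto libpq's nonblocking mode, where PQflush() returning 1 signals that the output buffer could not be fully drained. The sketch below is an illustration under assumptions, not Boris's actual code: the table `t`, column `v`, and helper name `send_batch` are made up, and the connection is assumed to already be in pipeline mode with `PQsetnonblocking(conn, 1)` applied.

```c
#include <stdio.h>
#include <libpq-fe.h>

/* Queue up to n single-parameter INSERTs, stopping early once the
 * socket send buffer fills up.  Returns -1 on error, else the number
 * of queries actually queued. */
int send_batch(PGconn *conn, const char *const *values, int n)
{
    int sent = 0;

    for (int i = 0; i < n; i++)
    {
        if (!PQsendQueryParams(conn,
                               "INSERT INTO t (v) VALUES ($1)",
                               1, NULL, &values[i], NULL, NULL, 0))
            return -1;
        sent++;

        int r = PQflush(conn);
        if (r == -1)
            return -1;
        if (r == 1)
        {
            /* Output buffer full: we have gotten ahead of the server.
             * A real client would now call PQconsumeInput() and drain
             * available results before queueing more queries. */
            fprintf(stderr, "blocked after %d queries\n", sent);
            break;
        }
    }
    return sent;
}
```

As the message notes, this strategy pays off precisely because the server begins executing pipelined queries before the Sync arrives; if it did not, the client would do better to queue locally instead.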