Thread: Re: Add Pipelining support in psql
Anthonin Bonnefoy wrote:

> > What is the reasoning here behind this restriction? \gx is a wrapper
> > of \g with expanded mode on, but it is also possible to call \g with
> > expanded=on, bypassing this restriction.
>
> The issue is that \gx enables expanded mode for the duration of the
> query and immediately resets it in sendquery_cleanup. With pipelining,
> the command is piped and displaying is done by either \endpipeline or
> \getresults, so the flag change has no impact. Forbidding it was a way
> to make it clearer that it won't have the expected effect.

But it's not just \gx. The following invocations don't respect the
desired output destination and formats (ignoring them) when a pipeline
is active:

  select ... \bind \g filename
  select ... \bind \g |program
  select ... \bind \g (format=unaligned tuples_only=on)

Just like for \gx, the problem is that in a pipeline, sending the query
is not followed by getting the results, and the output properties of a
query are lost in between.

Best regards,

--
Daniel Vérité
https://postgresql.verite.pro/
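[To make the reported problem concrete, a minimal session sketch, using
the \g-with-filename form quoted above; the file path is hypothetical:]

  \startpipeline
  select 1 \bind \g /tmp/out.txt
  \flushrequest
  \getresults
  \endpipeline

Here \getresults prints the row to stdout and /tmp/out.txt is never
written, because the destination recorded by \g is lost between sending
the query and fetching its result.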
(My previous message did not reach the lists, so re-sending with some
edits.)

On Fri, Feb 28, 2025 at 05:31:00PM +0100, Daniel Verite wrote:
> The following invocations don't respect the desired output destination
> and formats (ignoring them) when a pipeline is active:
>
>   select ... \bind \g filename
>   select ... \bind \g |program
>   select ... \bind \g (format=unaligned tuples_only=on)
>
> Just like for \gx, the problem is that in a pipeline, sending the
> query is not followed by getting the results, and the output
> properties of a query are lost in between.

Right. I completely forgot that these options could be applied with a
simple \g. With the results being decoupled from the execution, one can
argue that the set of options defined at the moment when the query is
sent and executed should be the one commanding how the results are
shaped when retrieving a batch with \getresults during a pipeline.
However, this means that we would need to save the set of options from
printQueryOpt when running the query, depending on the number of
results we expect, and reapply them in the order of the results
expected. We have the APIs to do that, with savePsetInfo() and
restorePsetInfo().

Anyway, can we really say that the set of printQueryOpt saved at
execution is the correct set to use? It is possible to make the
opposite argument and say that we should just apply the printQueryOpt
at the moment when \getresults is run. A benefit of that is to keep
the loop retrieving results simpler in ExecQueryAndProcessResults(),
because we pass down to this call *one* printQueryOpt as "opt".

There's of course the simplicity argument that I do like a lot here,
but applying the printing options at the time of \getresults also feels
more natural to me.

FWIW, I agree that HEAD is unbalanced with its handling of \gx, so we
could do one of the following two things:
1) Ignore any formatting options given to \g, but also allow \gx to
run, documenting that during a pipeline the formatting options are
ignored, and that the set of options defined when doing a \getresults
is the only thing that matters.
2) Invent a new meta-command (suggested by Daniel Verite off-list, and
not published to the lists because I don't know how to do a reply-all),
like a \send or a \push, forbidding entirely \gx and \g when in a
pipeline. If we were to integrate options into this new meta-command, a
split with \g would make an integration easier to think about. One name
suggestion I can come up with is \sendpipeline.

I'd consider option 2, based on Daniel's concerns. One line I'm going
to draw here is that we should not go down to manipulations of
printQueryOpt while retrieving batches of results depending on the
style of output that was defined when sending a query in a pipeline.

Anthonin, as the primary author, any thoughts?
--
Michael
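[A hypothetical sketch of what option 1 above would mean in practice;
these semantics were under discussion at this point, not committed
behavior:]

  \startpipeline
  select 1 \bind \gx
  \flushrequest
  \pset format unaligned
  \getresults
  \endpipeline

Under option 1, the expanded mode requested by \gx would be silently
ignored, and the result would come out unaligned, since that is the
\pset state in effect at the moment \getresults runs.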
On Tue, Mar 4, 2025 at 1:57 AM Michael Paquier <michael@paquier.xyz> wrote:
> Anyway, can we really say that the set of printQueryOpt saved at
> execution is the correct set to use? It is possible to make the
> opposite argument and say that we should just apply the printQueryOpt
> at the moment when \getresults is run. A benefit of that is to keep
> the loop retrieving results simpler in ExecQueryAndProcessResults(),
> because we pass down to this call *one* printQueryOpt as "opt".
>
> There's of course the simplicity argument that I do like a lot here,
> but applying the printing options at the time of \getresults also
> feels more natural to me.

Saving the printQueryOpt when a command is pushed was an option I had
in mind, if that was straightforward to implement. However, even with
savePsetInfo(), you would need to save values like gfname and
gset_prefix since they impact the output (it may make sense to move
those into printQueryOpt). This would also need to be saved for all
commands, like \close or \parse, since we don't distinguish whether a
piped command generates an output or not. So that definitely looks like
it would add a lot of complexity for limited benefit.

> FWIW, I agree that HEAD is unbalanced with its handling of \gx, so we
> could do one of the following two things:
> 1) Ignore any formatting options given to \g, but also allow \gx to
> run, documenting that during a pipeline the formatting options are
> ignored, and that the set of options defined when doing a \getresults
> is the only thing that matters.
> 2) Invent a new meta-command (suggested by Daniel Verite off-list, and
> not published to the lists because I don't know how to do a
> reply-all), like a \send or a \push, forbidding entirely \gx and \g
> when in a pipeline. If we were to integrate options into this new
> meta-command, a split with \g would make an integration easier to
> think about. One name suggestion I can come up with is \sendpipeline.
>
> I'd consider option 2, based on Daniel's concerns. One line I'm going
> to draw here is that we should not go down to manipulations of
> printQueryOpt while retrieving batches of results depending on the
> style of output that was defined when sending a query in a pipeline.

Another possible option would be to directly send the command without
requiring an additional meta-command, like "SELECT 1 \bind". However,
this would make it more painful to introduce new parameters, plus it
makes \bind and \bind_named inconsistent, as it is normally required to
send the result with a separate meta-command.

I like the \sendpipeline option. It makes it clearer that formatting
options are not applicable within a pipeline (at least, in the current
implementation) and I think it would make more sense to have those
formatting options in \getresults or \endpipeline.

I took a stab at creating the \sendpipeline meta-command. I've also
realised there's a small leak where fname is currently not freed on
queries like 'select ... \bind \gx filename' when within a pipeline,
which is fixed in patch 0001.
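[For reference, a short sketch of how the proposed meta-command is
used, following the same syntax as the test sequences that appear later
in this thread:]

  \startpipeline
  select $1 \bind 'val1' \sendpipeline
  \flushrequest
  \getresults
  \endpipeline

\sendpipeline takes the place of \g as the query terminator inside a
pipeline, deliberately accepting none of \g's output-redirection or
formatting arguments.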
Anthonin Bonnefoy wrote:

> Another possible option would be to directly send the command without
> requiring an additional meta-command, like "SELECT 1 \bind". However,
> this would make it more painful to introduce new parameters, plus it
> makes \bind and \bind_named inconsistent, as it is normally required
> to send the result with a separate meta-command.

AFAIU the reason why \bind is required (even when there are no $N
parameters in the query) is to trigger the use of the extended query
protocol. This pre-dates the pipeline feature. But when in a pipeline,
we can't send queries with the simple query protocol anyway, so a
possible usability improvement would be to make \bind optional in that
case.

Concretely, it's not possible currently to issue:

  \startpipeline
  select 1;

It causes the error:

  PQsendQuery not allowed in pipeline mode

whereas this sequence does work:

  \startpipeline
  \bind
  select 1;
  \flushrequest
  \getresults

But if the code triggered the use of the extended query protocol when
\bind is in effect *or* a pipeline is active, then the first sequence
would just push "select 1" into the pipeline.

This would have the advantage that, to submit into a pipeline a
pre-existing file with SQL commands separated with ";", you don't have
to pre-process it to inject meta-commands. Adding a \startpipeline at
the beginning and an \endpipeline at the end would be sufficient in the
cases where the user does not need the results before the end.

The \sendpipeline is not mandatory when ";" can be used to terminate
the queries. But it makes it clearer that the script wants specifically
to push into a pipeline, and it might accept specific options in the
future, whereas obviously ";" cannot.

Best regards,

--
Daniel Vérité
https://postgresql.verite.pro/
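[In other words, under the behavior proposed above, an existing file of
";"-terminated statements could be pipelined by adding only two lines
around it; a sketch, with a hypothetical table name:]

  \startpipeline
  insert into t values (1);
  insert into t values (2);
  insert into t values (3);
  \endpipeline

All three INSERTs would be pushed over the extended query protocol
without waiting for each other, and their results fetched once at
\endpipeline, saving round trips when client-server latency matters.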
On Tue, Mar 4, 2025 at 1:32 PM Daniel Verite <daniel@manitou-mail.org> wrote:
> But if the code triggered the use of the extended query protocol when
> \bind is in effect *or* a pipeline is active, then the first sequence
> would just push "select 1" into the pipeline.
>
> This would have the advantage that, to submit into a pipeline a
> pre-existing file with SQL commands separated with ";", you don't
> have to pre-process it to inject meta-commands. Adding a
> \startpipeline at the beginning and an \endpipeline at the end would
> be sufficient in the cases where the user does not need the results
> before the end.
>
> The \sendpipeline is not mandatory when ";" can be used to terminate
> the queries. But it makes it clearer that the script wants
> specifically to push into a pipeline, and it might accept specific
> options in the future, whereas obviously ";" cannot.

So if I understand correctly, you want to automatically convert a
simple query into an extended query when we're within a pipeline. That
would be doable with:

--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -1668,7 +1668,16 @@ ExecQueryAndProcessResults(const char *query,
 			}
 			break;
 		case PSQL_SEND_QUERY:
-			success = PQsendQuery(pset.db, query);
+			if (PQpipelineStatus(pset.db) != PQ_PIPELINE_OFF) {
+				success = PQsendQueryParams(pset.db, query,
+											pset.bind_nparams, NULL,
+											(const char *const *) pset.bind_params,
+											NULL, NULL, 0);
+				if (success)
+					pset.piped_commands++;
+			}
+			else
+				success = PQsendQuery(pset.db, query);
 			break;
 	}

I do see the idea to make it easier to convert existing scripts into
using pipelining. The main focus of the initial implementation was more
on protocol regression tests with psql, so that's not necessarily
something I had in mind. I have some reservations, as it will push all
parameters in the query string, which may not be the desired behaviour.
But on the other hand, if it is to convert existing psql scripts, then
everything was already pushed as simple queries. Plus, this is similar
to what pgbench is doing when using -Mextended or -Mprepared.
On Tue, Mar 04, 2025 at 10:31:46AM +0100, Anthonin Bonnefoy wrote:
> Saving the printQueryOpt when a command is pushed was an option I had
> in mind, if that was straightforward to implement. However, even with
> savePsetInfo(), you would need to save values like gfname and
> gset_prefix since they impact the output (it may make sense to move
> those into printQueryOpt). This would also need to be saved for all
> commands, like \close or \parse, since we don't distinguish whether a
> piped command generates an output or not. So that definitely looks
> like it would add a lot of complexity for limited benefit.

Yeah, same opinion here. I don't want this level of complexity with
extra manipulations of printQueryOpt when fetching the results, either.
I'm all for making these meta-commands do what we think is more
natural, but not at the cost of more complex logic in the result
printing depending on what's been given by a meta-command when a query
is pushed to a pipeline.

> I took a stab at creating the \sendpipeline meta-command. I've also
> realised there's a small leak where fname is currently not freed on
> queries like 'select ... \bind \gx filename' when within a pipeline,
> which is fixed in patch 0001.

Indeed. Fixed this one for now.
--
Michael
On Tue, Mar 04, 2025 at 06:37:09PM +0100, Anthonin Bonnefoy wrote:
> I do see the idea to make it easier to convert existing scripts into
> using pipelining. The main focus of the initial implementation was
> more on protocol regression tests with psql, so that's not
> necessarily something I had in mind. I have some reservations, as it
> will push all parameters in the query string, which may not be the
> desired behaviour. But on the other hand, if it is to convert
> existing psql scripts, then everything was already pushed as simple
> queries. Plus, this is similar to what pgbench is doing when using
> -Mextended or -Mprepared.

Hmm. Simplicity is tempting here because we know the status of the
pipeline when sending the query. If we do something like what you are
suggesting here, do we actually need \sendpipeline at all? We should
still prevent \g, \gx and others from running in a pipeline because of
the format argument raised by Daniel, so as to restrict the use of
meta-commands that can manipulate the output format, right?
--
Michael
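[Under that reading, plain ";"-terminated queries would be pushed into
the pipeline while \g and \gx would remain forbidden; a hypothetical
sketch of semantics that were still under discussion at this point:]

  \startpipeline
  select 1;
  select 2 \g /tmp/out.txt
  \endpipeline

The first query would be pushed via the extended query protocol; the
second would be rejected, since \g can redirect and reformat output.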
Anthonin Bonnefoy wrote:

> So if I understand correctly, you want to automatically convert a
> simple query into an extended query when we're within a pipeline.
> That would be doable with:
>
> --- a/src/bin/psql/common.c
> +++ b/src/bin/psql/common.c
> @@ -1668,7 +1668,16 @@ ExecQueryAndProcessResults(const char *query,
>  			}
>  			break;
>  		case PSQL_SEND_QUERY:
> -			success = PQsendQuery(pset.db, query);
> +			if (PQpipelineStatus(pset.db) != PQ_PIPELINE_OFF) {
> +				success = PQsendQueryParams(pset.db, query,
> +											pset.bind_nparams, NULL,
> +											(const char *const *) pset.bind_params,
> +											NULL, NULL, 0);
> +				if (success)
> +					pset.piped_commands++;
> +			}
> +			else
> +				success = PQsendQuery(pset.db, query);
>  			break;
>  	}

Yes, except that the bind parameters need to be cleared after this, as
done in clean_extended_state().

> I do see the idea to make it easier to convert existing scripts into
> using pipelining. The main focus of the initial implementation was
> more on protocol regression tests with psql, so that's not
> necessarily something I had in mind.

Understood. Yet pipelining can accelerate considerably certain scripts
when client-server latency is an issue. We should expect end users to
benefit from it too.

> I have some reservations, as it will push all parameters in the query
> string, which may not be the desired behaviour.

I don't follow. For me, the change discussed here is about simplifying
the syntax when there are no out-of-query $N-style parameters; it does
not change anything for queries that actually use them, nor does it
forbid a \bind without parameters.

Best regards,

--
Daniel Vérité
https://postgresql.verite.pro/
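[To illustrate why the state needs clearing, a hypothetical sequence
under the sketched change, assuming the parameters were *not* cleared
after dispatch:]

  \startpipeline
  \bind 'a'
  select $1;
  select 1;
  \endpipeline

Without the equivalent of clean_extended_state() after the first send,
the stale parameter set by \bind would still be attached when
"select 1" is dispatched, which is not what the script asked for.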
On Wed, Mar 05, 2025 at 03:25:12PM +0100, Daniel Verite wrote:
> Anthonin Bonnefoy wrote:
>> I do see the idea to make it easier to convert existing scripts into
>> using pipelining. The main focus of the initial implementation was
>> more on protocol regression tests with psql, so that's not
>> necessarily something I had in mind.
>
> Understood. Yet pipelining can accelerate considerably certain
> scripts when client-server latency is an issue. We should expect end
> users to benefit from it too.

That was not a test case we had in mind originally here, but if it is
possible to keep the implementation simple while supporting your
demand, well, let's do it. If it's not that straightforward, let's use
the new meta-command, forbidding \g and \gx based on your arguments
from upthread.
--
Michael
Hello,

After commit 2cce0fe on master, when executing this query:

  psql postgres <<EOF
  CREATE TABLE psql_pipeline();
  \startpipeline
  COPY psql_pipeline FROM STDIN;
  SELECT 'val1';
  \syncpipeline
  \getresults
  EOF

  ERROR:  unexpected message type 0x50 during COPY from stdin
  CONTEXT:  COPY psql_pipeline, line 1
  Pipeline aborted, command did not run
  psql: common.c:1510: discardAbortedPipelineResults: Assertion `res == ((void *)0) || result_status == PGRES_PIPELINE_ABORTED' failed.
  Aborted (core dumped)

psql crashes with the stack trace:

(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x0000760edd24527e in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x0000760edd2288ff in __GI_abort () at ./stdlib/abort.c:79
#5  0x0000760edd22881b in __assert_fail_base (fmt=0x760edd3d01e8 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x5ba46ab79850 "res == ((void *)0) || result_status == PGRES_PIPELINE_ABORTED", file=file@entry=0x5ba46ab6fcad "common.c", line=line@entry=1510, function=function@entry=0x5ba46ab9c780 <__PRETTY_FUNCTION__.3> "discardAbortedPipelineResults") at ./assert/assert.c:96
#6  0x0000760edd23b517 in __assert_fail (assertion=assertion@entry=0x5ba46ab79850 "res == ((void *)0) || result_status == PGRES_PIPELINE_ABORTED", file=file@entry=0x5ba46ab6fcad "common.c", line=line@entry=1510, function=function@entry=0x5ba46ab9c780 <__PRETTY_FUNCTION__.3> "discardAbortedPipelineResults") at ./assert/assert.c:105
#7  0x00005ba46ab2bd40 in discardAbortedPipelineResults () at common.c:1510
#8  ExecQueryAndProcessResults (query=query@entry=0x5ba4a2ec1e10 "SELECT 'val1';", elapsed_msec=elapsed_msec@entry=0x7ffeb07262a8, svpt_gone_p=svpt_gone_p@entry=0x7ffeb07262a7, is_watch=is_watch@entry=false, min_rows=min_rows@entry=0, opt=opt@entry=0x0, printQueryFout=0x0) at common.c:1811
#9  0x00005ba46ab2983f in SendQuery (query=0x5ba4a2ec1e10 "SELECT 'val1';") at common.c:1212
#10 0x00005ba46ab3f66a in MainLoop (source=source@entry=0x760edd4038e0 <_IO_2_1_stdin_>) at mainloop.c:515
#11 0x00005ba46ab23f2a in process_file (filename=0x0, use_relative_path=use_relative_path@entry=false) at command.c:4870
#12 0x00005ba46ab1e9d9 in main (argc=<optimized out>, argv=0x7ffeb07269d8) at startup.c:420

06.03.2025 11:20, Michael Paquier wrote:
> On Wed, Mar 05, 2025 at 03:25:12PM +0100, Daniel Verite wrote:
>> Anthonin Bonnefoy wrote:
>>> I do see the idea to make it easier to convert existing scripts
>>> into using pipelining. The main focus of the initial implementation
>>> was more on protocol regression tests with psql, so that's not
>>> necessarily something I had in mind.
>> Understood. Yet pipelining can accelerate considerably certain
>> scripts when client-server latency is an issue. We should expect end
>> users to benefit from it too.
> That was not a test case we had in mind originally here, but if it is
> possible to keep the implementation simple while supporting your
> demand, well, let's do it. If it's not that straightforward, let's
> use the new meta-command, forbidding \g and \gx based on your
> arguments from upthread.
> --
> Michael
On Wed, Apr 16, 2025 at 09:31:59PM +0700, a.kozhemyakin wrote:
> After commit 2cce0fe on master
>
> ERROR:  unexpected message type 0x50 during COPY from stdin
> CONTEXT:  COPY psql_pipeline, line 1
> Pipeline aborted, command did not run
> psql: common.c:1510: discardAbortedPipelineResults: Assertion `res ==
> ((void *)0) || result_status == PGRES_PIPELINE_ABORTED' failed.
> Aborted (core dumped)

Reproduced here, thanks for the report. I'll look at that at the
beginning of next week, adding an open item for now.
--
Michael
On Wed, Apr 16, 2025 at 09:18:01AM -0700, Michael Paquier wrote:
> On Wed, Apr 16, 2025 at 09:31:59PM +0700, a.kozhemyakin wrote:
>> After commit 2cce0fe on master
>>
>> ERROR:  unexpected message type 0x50 during COPY from stdin
>> CONTEXT:  COPY psql_pipeline, line 1
>> Pipeline aborted, command did not run
>> psql: common.c:1510: discardAbortedPipelineResults: Assertion `res ==
>> ((void *)0) || result_status == PGRES_PIPELINE_ABORTED' failed.
>> Aborted (core dumped)
>
> Reproduced here, thanks for the report. I'll look at that at the
> beginning of next week, adding an open item for now.

The failure is not related to 2cce0fe. The following sequence fails as
well, as long as we have one SELECT after the COPY to mess up with the
\getresults. That causes a PGRES_FATAL_ERROR combined with a
"terminating connection because protocol synchronization was lost" on
the backend side, because the server expects some data while the client
does not send it, and psql is not able to cope with this state:

  \startpipeline
  COPY psql_pipeline FROM STDIN \bind \sendpipeline
  SELECT $1 \bind 'val1' \sendpipeline
  \syncpipeline
  \getresults
  \endpipeline

It's actually nice that we are able to emulate such query patterns with
psql using all these meta-commands; I don't think we have any coverage
for this kind of backend synchronization loss case yet. 2cce0fe makes
that easier to reach by allowing more command patterns, but it's the
mix of COPY followed by a SELECT that causes psql to be confused. All
the tests that we have don't check this kind of scenario: for COPY
TO/FROM, they always use a flush or a sync followed quickly by
\getresults, but we don't have tests where we mix things.

Anyway, I don't think that there is much we can do under a
PGRES_FATAL_ERROR in this code path when discarding the pipe results.
As far as I can tell, the server has failed the query suddenly and the
whole pipeline flow is borked. The best thing that I can think of is to
discard all the results while decrementing the counters, then let psql
complain about that, like in the attached. I've added two tests in TAP,
as these trigger a FATAL in the backend so we cannot use the normal SQL
route, so that we have some coverage.

@Anthonin: Any thoughts or comments, perhaps? A second opinion would be
welcome here.
--
Michael
On Mon, Apr 21, 2025 at 03:22:28PM +0900, Michael Paquier wrote:
> Anyway, I don't think that there is much we can do under a
> PGRES_FATAL_ERROR in this code path when discarding the pipe results.
> As far as I can tell, the server has failed the query suddenly and
> the whole pipeline flow is borked. The best thing that I can think of
> is to discard all the results while decrementing the counters, then
> let psql complain about that, like in the attached. I've added two
> tests in TAP, as these trigger a FATAL in the backend so we cannot
> use the normal SQL route, so that we have some coverage.
>
> @Anthonin: Any thoughts or comments, perhaps? A second opinion would
> be welcome here.

While considering more ways to test this patch, I've recalled that
injection points issuing a FATAL in the backend, to emulate the
original failure with more query patterns, could provide more coverage,
and the discard cleanup is proving stable enough as presented in the
patch. I am wondering if we could not be smarter with the handling of
the counters, but I really doubt that there is much more we can do
under a PGRES_FATAL_ERROR.
--
Michael
On Tue, Apr 22, 2025 at 2:06 AM Michael Paquier <michael@paquier.xyz> wrote:
> I am wondering if we could not be smarter with the handling of
> the counters, but I really doubt that there is much more we can do
> under a PGRES_FATAL_ERROR.

One thing that bothers me is that the reported error is silently
discarded within discardAbortedPipelineResults:

  psql -f bug_assertion.sql
  psql:bug_assertion.sql:7: ERROR:  unexpected message type 0x50 during COPY from stdin
  CONTEXT:  COPY psql_pipeline, line 1
  psql:bug_assertion.sql:7: Pipeline aborted, command did not run

This should ideally report the "FATAL: terminating connection because
protocol synchronization was lost" sent by the backend process.

Also, we still have a triggered assertion failure with the following:

  CREATE TABLE psql_pipeline(a text);
  \startpipeline
  COPY psql_pipeline FROM STDIN;
  SELECT 'val1';
  \syncpipeline
  \endpipeline
  ...
  Assertion failed: (pset.piped_syncs == 0), function ExecQueryAndProcessResults, file common.c, line 2153.

A possible alternative could be to abort discardAbortedPipelineResults
when we encounter a res != NULL + FATAL error and let the outer loop
handle it. As you said, the pipeline flow is borked, so there's not
much to salvage. The outer loop would read and print all error messages
until the closed connection is detected. Then, CheckConnection will
reset the connection, which will reset the pipeline state.

While testing this change, I was initially looking for the "FATAL:
terminating connection because protocol synchronization was lost"
message in the tests. However, this was failing on Windows[1] as the
FATAL message wasn't reported on stderr. I'm not sure why yet.

[1]: https://cirrus-ci.com/task/5051031505076224?logs=check_world#L240-L246
On Tue, Apr 22, 2025 at 02:37:19PM +0200, Anthonin Bonnefoy wrote:
> Also, we still have a triggered assertion failure with the following:
>
>  CREATE TABLE psql_pipeline(a text);
>  \startpipeline
>  COPY psql_pipeline FROM STDIN;
>  SELECT 'val1';
>  \syncpipeline
>  \endpipeline
>  ...
>  Assertion failed: (pset.piped_syncs == 0), function
>  ExecQueryAndProcessResults, file common.c, line 2153.

Right. I didn't think about the case of a \endpipeline that fetches all
the results by itself.

> A possible alternative could be to abort
> discardAbortedPipelineResults when we encounter a res != NULL + FATAL
> error and let the outer loop handle it. As you said, the pipeline
> flow is borked, so there's not much to salvage. The outer loop would
> read and print all error messages until the closed connection is
> detected. Then, CheckConnection will reset the connection, which will
> reset the pipeline state.

Sounds like a better idea seen from here, yes.

> While testing this change, I was initially looking for the "FATAL:
> terminating connection because protocol synchronization was lost"
> message in the tests. However, this was failing on Windows[1] as the
> FATAL message wasn't reported on stderr. I'm not sure why yet.

Hmm. I vaguely recall that there could be some race condition here with
the attempt to catch the FATAL message once the server tries to shut
down the connection. Anyway, I agree that it would be nice to track
that this specific error message is generated by the server. How about
checking the server logs instead, using a slurp_file() with an offset
of the log file taken before running each pipeline sequence? We should
use a few wait_for_log() calls, I think, to be extra careful with the
timings where psql_fails_like() gives up, as I'm worried that this
could be unstable on slow machines. Something like the attached seems
stable enough here. What do you think?

The tweak for psql_fails_like() was kind of independent of the rest of
the fix, so I have applied that as a commit of its own. I am not
convinced about the addition of a 4th test where we use the queries
with semicolons without a \getresults between the sync and the end.
--
Michael
On Wed, Apr 23, 2025 at 04:13:14PM +0900, Michael Paquier wrote:
> The tweak for psql_fails_like() was kind of independent of the rest
> of the fix, so I have applied that as a commit of its own. I am not
> convinced about the addition of a 4th test where we use the queries
> with semicolons without a \getresults between the sync and the end.

I've had some room to look again at all that this morning, as this is
an open item, and I have applied the fix. One tweak was in the tests,
where I have added only one wait_for_log(), which should offer enough
coverage. If you notice anything else, please let me know.
--
Michael