Re: Suggestion to add --continue-client-on-abort option to pgbench - Mailing list pgsql-hackers

From Yugo Nagata
Subject Re: Suggestion to add --continue-client-on-abort option to pgbench
Msg-id 20250920002119.c3c75a4cae1daf69789db45f@sraoss.co.jp
In response to Re: Suggestion to add --continue-client-on-abort option to pgbench  (Fujii Masao <masao.fujii@gmail.com>)
List pgsql-hackers
On Fri, 19 Sep 2025 19:21:29 +0900
Fujii Masao <masao.fujii@gmail.com> wrote:

> On Fri, Sep 19, 2025 at 11:43 AM Fujii Masao <masao.fujii@gmail.com> wrote:
> >
> > On Thu, Sep 18, 2025 at 4:20 PM Yugo Nagata <nagata@sraoss.co.jp> wrote:
> > >
> > > On Thu, 18 Sep 2025 14:37:29 +0900
> > > Fujii Masao <masao.fujii@gmail.com> wrote:
> > >
> > > > On Thu, Sep 18, 2025 at 10:22 AM Yugo Nagata <nagata@sraoss.co.jp> wrote:
> > > > > That makes sense. How about rewriting this like:
> > > > >
> > > > >  However, if the --continue-on-error option is specified and the error occurs in
> > > > >  an SQL command, the client does not abort and proceeds to the next
> > > > >  transaction regardless of the error. These cases are reported as "other failures"
> > > > >  in the output. Note that if the error occurs in a meta-command, the client will
> > > > >  still abort even when this option is specified.
> > > >
> > > > How about phrasing it like this, based on your version?
> > > >
> > > > ----------------------------
> > > > A client's run is aborted in case of a serious error; for example, the
> > > > connection with the database server was lost or the end of script was reached
> > > > without completing the last transaction.  The client also aborts
> > > > if a meta-command fails, or if an SQL command fails for reasons other than
> > > > serialization or deadlock errors when --continue-on-error is not specified.
> > > > With --continue-on-error, the client does not abort on such SQL errors
> > > > and instead proceeds to the next transaction.  These cases are reported
> > > > as "other failures" in the output.  If the error occurs in a meta-command,
> > > > however, the client still aborts even when this option is specified.
> > > > ----------------------------
> > >
> > > I'm fine with that. This version is clearer.
> >
> > Thanks for checking!
> 
> I've updated the 0001 patch based on the comments.
> The revised version is attached.

Thank you for updating the patch.

> 
> While testing, I found that running pgbench with --continue-on-error and
> pipeline mode triggers the following assertion failure. Could this be
> a bug in the patch?
> 
> ---------------------------------------------------
> $ cat pipeline.pgbench
> \startpipeline
> DO $$
>   BEGIN
>     PERFORM pg_sleep(3);
>     PERFORM pg_terminate_backend(pg_backend_pid());
>   END $$;
> \endpipeline
> 
> $ pgbench -n --debug --verbose-errors -f pipeline.pgbench -c 2 -t 4 -M extended --continue-on-error
> ...
> Assertion failed: (sql_script[st->use_file].commands[st->command]->type == 1), function commandError, file pgbench.c, line 3081.
> Abort trap: 6
> ---------------------------------------------------
> 
> When I ran the same command without --continue-on-error,
> the assertion failure did not occur.

I think this bug was introduced by commit 4a39f87acd6e, which enabled pgbench
to retry transactions and added the --verbose-errors option, rather than by this
patch itself.

The assertion failure occurs in commandError(), which is called to report an error
when the transaction can be retried (i.e., on a serialization failure or deadlock)
or, after this patch, when --continue-on-error is used.

 Assert(sql_script[st->use_file].commands[st->command]->type == SQL_COMMAND);

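This assumes the error is always detected during SQL command execution, but
that is not correct: in pipeline mode, the error can also be detected when
a \endpipeline meta-command is executed. For example, the following test, which
causes a deadlock and does not use --continue-on-error, hits the same assertion: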

 $ cat deadlock.sql 
 \startpipeline
 begin;
 lock b;
 lock a;
 end;
 \endpipeline

 $ cat deadlock2.sql 
 \startpipeline
 begin;
 lock a;
 lock b;
 end;
 \endpipeline

 $ pgbench --verbose-errors -f deadlock.sql  -f deadlock2.sql -c 2 -T 3 -M extended 
 pgbench (19devel)
 starting vacuum...end.
 pgbench: pgbench.c:3062: commandError: Assertion `sql_script[st->use_file].commands[st->command]->type == 1' failed.
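
To illustrate why the assertion is too strict, here is a minimal standalone sketch. It is not the attached patch and not pgbench's actual code: the Command struct is reduced to two fields, the helper names is_sql_command() and may_report_error() are made up for this example, and treating \endpipeline as the only additional place where an error may surface is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Simplified stand-ins for pgbench's command descriptors (illustrative only).
 * SQL_COMMAND is 1, matching the "type == 1" in the assertion message.
 */
#define SQL_COMMAND  1
#define META_COMMAND 2

typedef enum
{
	META_NONE,
	META_STARTPIPELINE,
	META_ENDPIPELINE
} MetaCommand;

typedef struct
{
	int			type;	/* SQL_COMMAND or META_COMMAND */
	MetaCommand meta;	/* meaningful only when type is META_COMMAND */
} Command;

/* The condition that the existing assertion in commandError() checks. */
static bool
is_sql_command(const Command *cmd)
{
	return cmd->type == SQL_COMMAND;
}

/*
 * Hypothetical relaxed condition: an error may be reported while the
 * current command is an SQL command or a \endpipeline meta-command,
 * since in pipeline mode results are consumed at \endpipeline.
 */
static bool
may_report_error(const Command *cmd)
{
	return cmd->type == SQL_COMMAND ||
		(cmd->type == META_COMMAND && cmd->meta == META_ENDPIPELINE);
}
```

Under these assumptions, a check like Assert(may_report_error(cmd)) would hold both for "lock a;" and for \endpipeline, while still catching an unexpected call for, say, \startpipeline.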

One option would be to simply remove this assertion. If we prefer to keep it,
the attached patch fixes the issue.

Regards,
Yugo Nagata

-- 
Yugo Nagata <nagata@sraoss.co.jp>
