Re: Suggestion to add --continue-client-on-abort option to pgbench - Mailing list pgsql-hackers

From Yugo Nagata
Subject Re: Suggestion to add --continue-client-on-abort option to pgbench
Date
Msg-id 20250710181701.c5682bde1012e7169381cdad@sraoss.co.jp
Whole thread Raw
In response to Re: Suggestion to add --continue-client-on-abort option to pgbench  (Rintaro Ikeda <ikedarintarof@oss.nttdata.com>)
List pgsql-hackers
On Wed, 9 Jul 2025 23:58:32 +0900
Rintaro Ikeda <ikedarintarof@oss.nttdata.com> wrote:

> Hi,
> 
> Thank you for the kind comments.
> 
> I've updated the previous patch.

Thank you for updating the patch!

> > However, if a large number of errors occur, this could result in a significant increase
> > in stderr output during the benchmark.
> > 
> > Users can still notice that something went wrong by checking the “number of other failures”
> > reported after the run, and I assume that in most cases, when --continue-on-error is enabled,
> > users aren’t particularly interested in seeing individual error messages as they happen.
> > 
> > It’s true that seeing error messages during the benchmark might be useful in some cases, but
> > the same could be said for serialization or deadlock errors, and that’s exactly what the
> > --verbose-errors option is for.
> 
> 
> I understand your concern. The condition for calling pg_log_error() was modified
> to reduce stderr output.
> Additionally, an error message was added for cases where some clients aborted
> while executing SQL commands, similar to other code paths that transition to
> st->state = CSTATE_ABORTED, as shown in the example below:
> 
> ```
>                         pg_log_error("client %d aborted while establishing connection", st->id);
>                         st->state = CSTATE_ABORTED;
> ```

            default:
                /* anything else is unexpected */
-               pg_log_error("client %d script %d aborted in command %d query %d: %s",
-                            st->id, st->use_file, st->command, qrynum,
-                            PQerrorMessage(st->con));
+               if (verbose_errors)
+                   pg_log_error("client %d script %d aborted in command %d query %d: %s",
+                                st->id, st->use_file, st->command, qrynum,
+                                PQerrorMessage(st->con));
                goto error;
        }

Thanks to this fix, error messages caused by SQL errors are now output only when
--verbose-errors is enable. However, the comment describes the condition as "unexpected",
and the message states that the client was "aborted". This does not seems accurate, since
clients are not aborted due to SQL errors when --continue-on-errors is enabled. 

I think the error message should be emitted using commandError() when both
--coneinut-on-errors and --verbose-errors are specified, like this;

            case PGRES_NONFATAL_ERROR:
            case PGRES_FATAL_ERROR:
                st->estatus = getSQLErrorStatus(PQresultErrorField(res,
                                                                   PG_DIAG_SQLSTATE));
                if (continue_on_error || canRetryError(st->estatus))
                {
                    if (verbose_errors)
                        commandError(st, PQerrorMessage(st->con));
                    goto error;
                }
                /* fall through */

In addition, the error message in the "default" case should be shown regardless
of the --verbose-errors since it represents an unexpected situation and should
always reported.

Finally, I believe this fix should be included in patch 0001 rather than 0003,
as it would be a part of the implementation of --continiue-on-error.


As of 0003:

+               {
+                   pg_log_error("client %d aborted while executing SQL commands", st->id);
                    st->state = CSTATE_ABORTED;
+               }
                break;

I understand that the patch is not directly related to --continue-on-error, similar to 0002,
and that it aims to improve the error message to indicate that the client was aborted due to
some error during readCommandResponse().

However, this message doesn't seem entirely accurate, since the error is not always caused
by an SQL command failure itself. For example, it could also be due to a failure of the \gset
meta-command.

In addition, this fix causes error messages to be emitted twice. For example, if \gset fails,
the following similar messages are printed:

 pgbench: error: client 0 script 0 command 0 query 0: expected one row, got 0
 pgbench: error: client 0 aborted while executing SQL commands

Even worse, if an unexpected error occurs in readCommandResponse() (i.e., the default case),
the following messages are emitted, both indicating that the client was aborted;

 pgbench: error: client 0 script 0 aborted in command ... query ...
 pgbench: error: client 0 aborted while executing SQL commands

I feel this is a bit redundant.

Therefore, if we are to improve these messages to indicate explicitly that the client
was aborted, I would suggest modifying the error messages in readCommandResponse() rather
than adding a new one in advanceConnectionState().

I've attached patch 0003 incorporating my suggestion. What do you think?

Additionally, the patch 0001 includes the fix that was originally part of
your proposed 0003, as previously discussed.

Regards,
Yugo Nagata

-- 
Yugo Nagata <nagata@sraoss.co.jp>

Attachment

pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: A recent message added to pg_upgade
Next
From: Nazir Bilal Yavuz
Date:
Subject: Re: Explicitly enable meson features in CI