Re: Suggestion to add --continue-client-on-abort option to pgbench - Mailing list pgsql-hackers
From | Yugo Nagata |
---|---|
Subject | Re: Suggestion to add --continue-client-on-abort option to pgbench |
Date | |
Msg-id | 20250705000356.9637424fa2343ee6c2d08957@sraoss.co.jp Whole thread Raw |
List | pgsql-hackers |
Hi, On Tue, 1 Jul 2025 17:43:18 +0900 Rintaro Ikeda <ikedarintarof@oss.nttdata.com> wrote: > I've updated the previous patch based on your feedback. Below is a summary of > the changes from v4 to v5: Thank you for updating the patch. > On 2025/06/14 0:24, Yugo Nagata wrote: > > case PGRES_NONFATAL_ERROR: > > case PGRES_FATAL_ERROR: > > st->estatus = getSQLErrorStatus(PQresultErrorField(res, > > PG_DIAG_SQLSTATE)); > > if (canRetryError(st->estatus)) > > { > > if (verbose_errors) > > commandError(st, PQerrorMessage(st->con)); > > goto error; > > } > > /* fall through */ > > > > default: > > /* anything else is unexpected */ > > pg_log_error("client %d script %d aborted in command %d query %d: %s", > > st->id, st->use_file, st->command, qrynum, > > PQerrorMessage(st->con)); > > goto error; > > } > > > > When an SQL error other than a serialization or deadlock error occurs, an error message is > > output via pg_log_error in this code path. However, I think this should be reported only > > when verbose_errors is set, similar to how serialization and deadlock errors are handled when > > --continue-on-error is enabled > > I think the error message logged via pg_log_error is useful when verbose_errors > is not specified, because it informs users that the client has exited. Without > it, users may not notice that something went wrong. However, if a large number of errors occur, this could result in a significant increase in stderr output during the benchmark. Users can still notice that something went wrong by checking the “number of other failures” reported after the run, and I assume that in most cases, when --continue-on-error is enabled, users aren’t particularly interested in seeing individual error messages as they happen. It’s true that seeing error messages during the benchmark might be useful in some cases, but the same could be said for serialization or deadlock errors, and that’s exactly what the --verbose-errors option is for. Here are some comments on the patch. (1) } - else if (canRetryError(st->estatus)) + else if (continue_on_error | canRetryError(st->estatus)) st->state = CSTATE_ERROR; else st->state = CSTATE_ABORTED; Due to this change, when --continue-on-error is enabled, st->state is set to CSTATE_ERROR regardless of the type of error returned by readCommandResponse. When the error is not ESTATUS_OTHER_SQL_ERROR, e.g. ESTATUS_META_COMMAND_ERROR due to a failure of \gset with query returning more the one row. Therefore, this should be like: else if ((st->estatus == ESTATUS_OTHER_SQL_ERROR && continue_on_error) || canRetryError(st->estatus)) (2) + " --continue-on-error continue processing transations after a trasaction fails\n" "trasaction" is a typo and including "transaction" twice looks a bit redundant. Instead using the word "transaction", how about: "--continue-on-error continue running after an SQL error" ? This version is shorter, avoids repetition, and describes well the actual behavior when SQL statements fail. As for the comments: (3) - * A failed transaction is defined as unsuccessfully retried transactions. + * A failed transaction is defined as unsuccessfully retried transactions + * unless continue-on-error option is specified. * It can be one of two types: * * failed (the number of failed transactions) = @@ -411,6 +412,12 @@ typedef struct StatsData * 'deadlock_failures' (they got a deadlock error and were not * successfully retried). * + * When continue-on-error option is specified, + * failed (the number of failed transactions) = + * 'serialization_failures' + 'deadlock_failures' + + * 'other_sql_failures' (they got a error when continue-on-error option + * was specified). + * To explain explicitly that there are two definitions of failed transactions depending on the situation, how about: """ A failed transaction is counted differently depending on whether the --continue-on-error option is specified. Without --continue-on-error: failed (the number of failed transactions) = 'serialization_failures' (they got a serialization error and were not successfully retried) + 'deadlock_failures' (they got a deadlock error and were not successfully retried). When --continue-on-error is specified: failed (number of failed transactions) = 'serialization_failures' + 'deadlock_failures' + 'other_sql_failures' (they got some other SQL error; the transaction was not retried and counted as failed due to --continue-on-error). """ (4) + int64 other_sql_failures; /* number of failed transactions for + * reasons other than + * serialization/deadlock failure , which + * is enabled if --continue-on-error is + * used */ Is "counted" is more proper than "enabled" here? Af for the documentations: (5) The next line reports the number of failed transactions due to - serialization or deadlock errors (see <xref linkend="failures-and-retries"/> - for more information). + serialization or deadlock errors by default (see + <xref linkend="failures-and-retries"/> for more information). Would it be more readable to simply say: "The next line reports the number of failed transactions (see ... for more information), since definition of "failed transaction" has become a bit messy? (6) connection with the database server was lost or the end of script was reached without completing the last transaction. In addition, if execution of an SQL or meta command fails for reasons other than serialization or deadlock errors, - the client is aborted. Otherwise, if an SQL command fails with serialization or - deadlock errors, the client is not aborted. In such cases, the current - transaction is rolled back, which also includes setting the client variables - as they were before the run of this transaction (it is assumed that one - transaction script contains only one transaction; see - <xref linkend="transactions-and-scripts"/> for more information). + the client is aborted by default. However, if the --continue-on-error option + is specified, the client does not abort and proceeds to the next transaction + regardless of the error. This case is reported as other failures in the output. + Otherwise, if an SQL command fails with serialization or deadlock errors, the + client is not aborted. In such cases, the current transaction is rolled back, + which also includes setting the client variables as they were before the run + of this transaction (it is assumed that one transaction script contains only + one transaction; see <xref linkend="transactions-and-scripts"/> for more information). To emphasize the default behavior, I wonder it would be better to move "by default" to the beginning of the statements; like "By default, if execution of an SQL or meta command fails for reasons other than serialization or deadlock errors, the client is aborted." How about quoting "other failures"? like: "These cases are reported as "other failures" in the output." Also, I feel the meaning of "Otherwise" has becomes somewhat unclear since the explanation of --continue-on-error was added between the sentences So, how about clarifying that "the clients are not aborted due to serializable/deadlock even without --continue-on-error". For example; "On contrast, if an SQL command fails with serialization or deadlock errors, the client is not aborted even without <option>--continue-on-error</option>. Instead, the current transaction is rolled back, which also includes setting the client variables as they were before the run of this transaction (it is assumed that one transaction script contains only one transaction; see <xref linkend="transactions-and-scripts"/> for more information)." (7) The main report contains the number of failed transactions. If the - <option>--max-tries</option> option is not equal to 1, the main report also + <option>--max-tries</option> option is not equal to 1 and + <option>--continue-on-error</option> is not specified, the main report also contains statistics related to retries: the total number of retried Is that true? The retreis statitics would be included even without --continue-on-error. Regards, Yugo Nagata -- Yugo Nagata <nagata@sraoss.co.jp>
pgsql-hackers by date: