Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors - Mailing list pgsql-hackers
From:           Marina Polyakova
Subject:        Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
Msg-id:         a6299efa310e1d106144f761259f4fd6@postgrespro.ru
In response to: Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors (Fabien COELHO <coelho@cri.ensmp.fr>)
Responses:      Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
List:           pgsql-hackers
On 09-07-2018 16:05, Fabien COELHO wrote:
> Hello Marina,

Hello, Fabien!

> Here is a review for the last part of your v9 version.

Thank you very much for this!

> Patch does not "git apply" (maybe not anymore):
>   error: patch failed: doc/src/sgml/ref/pgbench.sgml:513
>   error: doc/src/sgml/ref/pgbench.sgml: patch does not apply

Sorry, I'll send a new version soon.

> However I could get it to apply with the "patch" command.
>
> Then patch compiles, global & pgbench "make check" are ok.

:-)

> Feature
> =======
>
> The patch adds the ability to restart transactions (i.e. the full
> script) on some errors, which is a good thing as it allows to
> exercise postgres performance in more realistic scenarios.
>
> * -d/--debug: I'm not in favor of requiring a mandatory text argument
> on this option. It is not practical, the user has to remember it, and
> it is a change. I'm sceptical of the overall debug handling changes.
> Maybe we could have multiple -d which lead to higher debug levels,
> but I'm not sure that it can be made to work for this case and still
> be compatible with the previous behavior. Maybe you need a specific
> option for your purpose, eg "--debug-retry"?

As you wrote in [1], adding an additional option is also a bad idea:

> I'm sceptical of the "--debug-fails" options. ISTM that --debug is
> already there and should just be reused.

Maybe it's better to use an optional argument for compatibility
(--debug[=fails] or --debug[=NUM])? But if we use the numbers, I can
currently see only 2 levels, and there's no guarantee that they will
not change..
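Something like this minimal, self-contained sketch, using getopt_long()
which pgbench already uses (the level names DEBUG_ALL/DEBUG_FAILS are
made up for illustration):

    #include <getopt.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical debug levels, not the patch's actual names. */
    enum { DEBUG_NONE, DEBUG_FAILS, DEBUG_ALL } debug_level = DEBUG_NONE;

    static const struct option long_options[] = {
        {"debug", optional_argument, NULL, 'd'},
        {NULL, 0, NULL, 0}
    };

    int
    main(int argc, char **argv)
    {
        int c;

        while ((c = getopt_long(argc, argv, "d", long_options, NULL)) != -1)
        {
            if (c == 'd')
            {
                /* With optional_argument the value must be attached, i.e.
                 * "--debug=fails"; a bare "--debug" leaves optarg NULL,
                 * which preserves the old behavior. */
                if (optarg == NULL)
                    debug_level = DEBUG_ALL;
                else if (strcmp(optarg, "fails") == 0)
                    debug_level = DEBUG_FAILS;
                else
                {
                    fprintf(stderr, "invalid --debug argument: \"%s\"\n",
                            optarg);
                    exit(1);
                }
            }
        }
        printf("debug level: %d\n", (int) debug_level);
        return 0;
    }

One caveat of optional_argument is that "--debug fails" (with a space)
is not recognized, only "--debug=fails" is, so old invocations of a
bare --debug keep working unchanged.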
> Code
> ====
>
> * The implementation is less complex than the previous submission,
> which is a good thing. I'm not sure that all the remaining complexity
> is still fully needed.
>
> * I'm reserved about the whole ereport thing, see comments in other
> messages.

Thank you, I'll try to implement the error reporting in the way you
suggested.

> Levels ELEVEL_LOG_CLIENT_{FAIL,ABORTED} & LOG_MAIN look unclear to
> me. In particular, the "CLIENT" part is not very useful. If the
> distinction makes sense, I would have kept "LOG" for the initial one
> and added other ones for ABORT and PGBENCH, maybe.

Ok!

> * There are no comments about "retries" in the StatData, CState and
> Command structures.
>
> * Also, for StatData, I would like to understand the logic between
> cnt, skipped, retries, retried, errors, ... so clear information
> about the expected invariant, if any, would be welcome. One has to go
> into the code to understand how these fields relate to one another.
>
> <...>
>
> * How "errors" differs from "ecnt" is unclear to me.

Thank you, I'll fix this.

> * commandFailed: I think that it should be kept much simpler. In
> particular, having errors on errors does not help much: on
> ELEVEL_FATAL, it ignores the actual reported error and generates
> another error of the same level, so that the initial issue is hidden.
> Even if these are can't-happen cases, hiding the origin if it occurs
> looks unhelpful. Just print it directly, and maybe abort if you think
> that it is a can't-happen case.

Oh, thanks, my mistake(

> * copyRandomState: just use sizeof(RandomState) instead of making
> assumptions about the contents of the struct. Also, this function
> looks pretty useless, why not just do a plain assignment?
>
> * copyVariables: lacks comments to explain that the destination is
> cleaned up and so on. The cleanup phase could probably be in a
> distinct function, so that the code would be clearer. Maybe the
> function variable names are too long.

Thank you, I'll fix this.

> if (current_source->svalue)
>
> in the context of a guard for a strdup, maybe:
>
> if (current_source->svalue != NULL)

I'm sorry, I'll fix this.

> * I do not understand the comments on the CState enum: "First,
> remember the failure in CSTATE_FAILURE. Then process other commands
> of the failed transaction if any" Why would other commands be
> processed at all if the transaction is aborted? For me any error must
> lead to the rollback and possible retry of the transaction. This
> comment needs to be clarified. It should also say that on FAILURE, it
> will go either to RETRY or ABORTED. See below my comments about
> doCustom.
>
> It is unclear to me why there could be several failures within a
> transaction, as I would have thought that it would be aborted on the
> first one.
>
> * I do not understand the purpose of first_failure. The comment
> should explain why it would need to be remembered. From my point of
> view, I'm not fully convinced that it should.
>
> <...>
>
> * executeCondition: this hides client automaton state changes which
> were clearly visible beforehand in the switch, and the different
> handling of if & elif is also hidden.
>
> I'm against this unnecessary restructuring and hiding of such
> information; all state changes should be clearly seen in the state
> switch so that it is easier to understand and follow.
>
> I do not see why touching the conditional stack on internal errors
> (evaluateExpr failure) brings anything, the whole transaction will be
> aborted anyway.
>
> * doCustom changes.
>
> On CSTATE_START_COMMAND, it considers whether to retry at the end.
> For me, this cannot happen: if some command failed, then it should
> have skipped directly to the RETRY state, so that you cannot get to
> the end of the script with an error. Maybe you could assert that the
> state of the previous command is NO_FAILURE, though.
>
> On CSTATE_FAILURE, the next command is possibly started. Although
> there is some consistency with the previous point, I think that it
> totally breaks the state automaton, where now a command can start
> while the whole transaction is in a failing state anyway. There was
> no point in starting it in the first place.
>
> So, for me, the FAILURE state should record/count the failure, then
> skip to RETRY if a retry is decided, else proceed to ABORT. Nothing
> else. It is much clearer that way.
>
> Then RETRY should reinstate the global state and proceed to start the
> *first* command again.
>
> <...>
>
> It is unclear to me why backslash command errors are turned to
> FAILURE instead of ABORTED: there is no way they are going to be
> retried, so maybe they should/could skip directly to ABORTED?
>
> Function executeCondition is a bad idea, as stated above.

So do you propose to execute the command "ROLLBACK" without
calculating its latency etc. if we are in a failed transaction, and to
clear the conditional stack after each failure?

Also just to be clear: do you want to have the state CSTATE_ABORTED
for client abortion and another state for interrupting the current
transaction?
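If I read your proposal correctly, the FAILURE and RETRY cases in
doCustom()'s switch would reduce to something like this sketch (not
the current patch code; canRetry() and the retry_random_state field
are made-up names for illustration):

    case CSTATE_FAILURE:
        /* Record/count the failure, then retry or abort -- nothing else. */
        st->stats.errors++;                 /* hypothetical counter */
        st->state = canRetry(st) ? CSTATE_RETRY : CSTATE_ABORTED;
        break;

    case CSTATE_RETRY:
        /* Reinstate the state saved at the start of the transaction ... */
        st->retries++;
        st->random_state = st->retry_random_state;  /* made-up field */
        /* ... and restart from the *first* command of the script. */
        st->command = 0;
        st->state = CSTATE_START_COMMAND;
        break;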
> The current RETRY state does memory allocations to generate a message
> with buffer allocation and so on. This looks like a costly and
> useless operation. If the user required "retries", then this is
> normal behavior, the retries are counted and will be printed out in
> the final report, and there is no point in printing out every single
> one of them. Maybe you want that debugging, but then costly
> operations should be guarded.

I think we need these debugging messages because, for example, if you
use the option --latency-limit, we will never know in advance whether
the serialization/deadlock failure will be retried or not. They also
help to understand which limit of retries was violated or how close we
were to these limits during the execution of a specific transaction.
But I agree with you that they are costly and can be skipped if the
failure type is never retried. Maybe it is better to split them into
multiple error function calls?..

> * reporting
>
> The number of transactions above the latency limit report can be
> simplified. Remove the if and just use one printf with a %s for the
> optional comment. I'm not sure this optional comment is useful there.

Oh, thanks, my mistake(

> Before the patch, ISTM that all lines relied on one printf. You have
> changed to a style where a collection of printfs is used to compose a
> line. I'd suggest to keep to the previous one-printf-prints-one-line
> style, where possible.

Ok!

> You have added 20-column alignment prints. This looks like too much
> and generates much too large lines. Probably 10 (billion) would be
> enough.

I have already asked you about this in [2]:

> The variables for the numbers of failures and retries are of type
> int64 since the variable for the total number of transactions has the
> same type. That's why such a large alignment (as I understand it now,
> 20 characters are enough). Do you prefer floating alignments,
> depending on the maximum number of failures/retries for any command
> in any script?

> Some people try to parse the output, so it should be deterministic.
> I'd add the needed columns always if appropriate (i.e. under retry),
> even if none occurred.

Ok!
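For reference, a narrower fixed-width variant could look like this
sketch (it assumes PostgreSQL's INT64_MODIFIER macro from c.h; the
stats field names are illustrative):

    /* Always print the error/retry columns under --max-tries, even when
     * zero, so the output stays deterministic for parsers; a width of
     * 10 covers counts up to ~9.9 billion. */
    printf(" %10" INT64_MODIFIER "d %10" INT64_MODIFIER "d",
           stats->errors, stats->retries);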
> * processXactStats: An else is replaced by detailed stats, with the
> initial "no detailed stats" comment kept. The function is called both
> in the then & else branches. The structure does not make sense
> anymore. I'm not sure this change was needed.
>
> * getLatencyUsed: declared "double" so "return 0.0".
>
> * typo: ruin -> run; probably others, I did not check for them in
> detail.

Oh, thanks, my mistakes(

> TAP Tests
> =========
>
> On my laptop, tests last 5.5 seconds before the patch, and about 13
> seconds after. This is much too large. Pgbench TAP tests do not
> deserve to take over twice as much time as before just on this patch.
>
> One reason which explains this large time is that there is a new
> script with a newly created instance. I'd suggest to append tests to
> the existing 2 scripts, depending on whether they need a running
> instance or not.

Ok! All new tests that do not need a running instance are already
added to the file 002_pgbench_no_server.pl.

> Secondly, I think that the design of the tests is too heavy. For such
> a feature, ISTM enough to check that it works, i.e. one test for
> deadlocks (trigger one or a few deadlocks), idem for serializable,
> maybe idem for other errors if any.
>
> <...>
>
> The latency limit to 900 ms try is a bad idea because it takes a lot
> of time. I did such tests before and they were removed by Tom Lane
> because of determinism and time issues. I would comment this test out
> for now.

Ok! If it doesn't bother you - can you tell me more about the causes
of these determinism issues?.. Tests for some other failures that
cannot be retried are already added to 001_pgbench_with_server.pl.

> The challenge is to do that reliably and efficiently, i.e. so that
> the test does not rely on chance and is still quite efficient.
>
> The trick you use is to run an interactive psql in parallel to
> pgbench so as to play with concurrent locks. That is interesting, but
> deserves more comments and explanation, eg before the test functions.
>
> Maybe this could be achieved within pgbench by using some wait stuff
> in PL/pgSQL so that concurrent clients can wait for one another based
> on data in an unlogged table updated by a CALL within "embedded"
> transactions? Not sure.
>
> <...>
>
> Anyway, TAP tests should be much lighter (in total time), and if
> possible much simpler.

I'll try, thank you..

> Otherwise, maybe a (simple) pgbench-side thread barrier could help,
> but this would require more thinking.

Tests must pass if we use --disable-thread-safety..

> Documentation
> =============
>
> Not looked at in much detail for now. Just a few comments:
>
> Having the "most important settings" on lines 1-6 and 8 (i.e.
> skipping 7) looks silly. The important ones should simply be the
> first ones, and the 8th is not that important, or it is in 7th
> position.

Ok!

> I do not understand why there is so much text about the in-failed-
> sql-transaction stuff, while we are mainly interested in
> serialization & deadlock errors, and this only falls in some "other"
> category. There seem to be more details about other errors than about
> deadlock & serialization errors.
>
> The reporting should focus on what is of interest, either all errors,
> or some detailed split of these errors.
>
> <...>
>
> * "errors_in_failed_tx" is some subcounter of "errors", for a special
> case. Why it is there fails me [I finally understood, and I think it
> should be removed, see end of review]. If we wanted to distinguish,
> then we should distinguish homogeneously: maybe just count the
> different error types, eg have things like "deadlock_errors",
> "serializable_errors", "other_errors", "internal_pgbench_errors",
> which would be orthogonal to one another, and "errors" could be
> recomputed from these.

Thank you, I agree with you. Unfortunately each new error type adds 1
or 2 new columns of maximum width 20 to the per-statement report (to
report errors and possibly retries of this type in this statement),
and we already have 2 new columns for all errors and retries. So I'm
not sure that we need to add anything other than statistics about all
the errors and all the retries in general.
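To make sure I understand, the orthogonal counters you describe would
look roughly like this (a hypothetical sketch, not what the patch
currently has):

    /* Orthogonal per-type error counters: "errors" is not stored but
     * recomputed as their sum, so no subcounter such as
     * errors_in_failed_tx can overlap another. */
    typedef struct StatData
    {
        int64       cnt;                  /* completed transactions */
        int64       skipped;              /* skipped under --latency-limit */
        int64       retries;
        int64       deadlock_errors;
        int64       serialization_errors;
        int64       other_errors;         /* incl. internal pgbench errors */
    } StatData;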
> The documentation should state clearly what are the counted errors,
> and then what are their effects on the reported stats. The "Errors
> and Serialization/Deadlock Retries" section is a good start in that
> direction, but it does not talk about pgbench internal errors (eg
> "cos(true)"). I think it should be more explicit about errors.

Thank you, I'll try to improve it.

> Option --max-tries default value should be spelled out in the doc.

If you mean that it is set to 1 if neither of the options --max-tries
or --latency-limit is explicitly used, I'll fix this.

> "Client's run is aborted", do you mean "Pgbench run is aborted"?

No, other clients continue their run as usual.

> * FailureStatus states are not homogeneously named. I'd suggest to
> use *_FAILURE for all cases. The miscellaneous case should probably
> be the last. I do not understand the distinction between
> ANOTHER_FAILURE & IN_FAILED_SQL_TRANSACTION. Why should it be needed?
> [again, see end of review]
>
> <...>
>
> "If a failed transaction block does not terminate in the current
> script": this just looks like a very bad idea, and explains my
> general ranting above about this error condition. ISTM that the only
> reasonable option is that a pgbench script should be enforced as a
> transaction, or a set of transactions, but cannot be a "piece" of a
> transaction, i.e. a pgbench script with "BEGIN;" but without a
> corresponding "COMMIT" is a user error and warrants an abort, so that
> there is no need to manage these "in aborted transaction" errors
> everywhere and report about them and document them extensively.
>
> This means adding a check when a script is finished or started that
> PQtransactionStatus(const PGconn *conn) == PQTRANS_IDLE, and aborting
> with a fatal error if not. Then we can forget about this "in tx
> errors" counting, reporting and so on, and just have to document the
> restriction.

Ok!
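The check itself should be small; roughly this sketch, where
PQtransactionStatus() and PQTRANS_IDLE are the real libpq API and
st->con, st->id and CSTATE_ABORTED come from pgbench's CState:

    /* When a script ends, the connection must be out of any transaction
     * block; otherwise the script is a user error and the client aborts. */
    if (PQtransactionStatus(st->con) != PQTRANS_IDLE)
    {
        fprintf(stderr,
                "client %d aborted: end of script reached inside a transaction block\n",
                st->id);
        st->state = CSTATE_ABORTED;     /* fatal user script error */
    }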
[1] https://www.postgresql.org/message-id/alpine.DEB.2.20.1801031720270.20034%40lancre
[2] https://www.postgresql.org/message-id/e4c5e8cefa4a8e88f1273b0f1ee29e56@postgrespro.ru

--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company