Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors - Mailing list pgsql-hackers
From: Marina Polyakova
Subject: Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
Msg-id: e4c5e8cefa4a8e88f1273b0f1ee29e56@postgrespro.ru
In response to: Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors (Fabien COELHO <coelho@cri.ensmp.fr>)
List: pgsql-hackers
> Hello,

Hi! I'm very sorry that I did not answer for so long, I was very busy
with the release of Postgres Pro 10 :(

>> Here is the third version of the patch for pgbench, thanks to Fabien
>> Coelho's comments. As in the previous one, transactions with
>> serialization and deadlock failures are rolled back and retried until
>> they end successfully or their number of tries reaches the maximum.
>
> Here is some partial review.

Thank you very much for it!

> It compiles with warnings, please fix them:
>
> pgbench.c:2624:28: warning: 'failure_status' may be used
> uninitialized in this function
> pgbench.c:2697:34: warning: 'command' may be used uninitialized in
> this function

Ok!

> I do not think that the error handling feature needs preeminence in the
> final report, compared to scale, number of clients and so on. The number
> of tries should be put further on.

I added it here only because both this field and the "transaction type"
field are transaction characteristics. I have some doubts about where to
add it. On the one hand, the number of clients, the number of
transactions per client and the number of transactions actually
processed form a good logical block which I don't want to divide. On the
other hand, the number of clients and the number of transactions per
client are parameters, while the number of transactions actually
processed is one of the program's results. Where, in your opinion, would
it be better to add the maximum number of transaction tries?

> I would spell "number of tries" instead of "tries number", which seems
> to suggest that each try is attributed a number. "sql" -> "SQL".

Ok!

> For the per-statement latency final report, I do not think it is worth
> distinguishing the kind of retry at this level, because ISTM that
> serialization & deadlocks are unlikely to appear simultaneously. I
> would just report total failures and total tries on this report. We
> only have 2 errors now, but if more are added I'm pretty sure that we
> would not want to have more columns...
Thanks, I agree with you.

> Moreover the 25 characters alignment is ugly, better use a much
> smaller alignment.

The variables for the numbers of failures and retries are of type int64
since the variable for the total number of transactions has the same
type. That's why the alignment is so large (as I understand it now, 20
characters would be enough). Would you prefer floating alignments,
depending on the maximum number of failures/retries for any command in
any script?

> I'm okay with having details shown in the "log to file" group report.

I think that the output format of the retries statistics should be the
same everywhere, so I would just like to output the total number of
retries here.

> The documentation does not seem consistent. It discusses "the very last
> fields" and seems to suggest that there are two, but the example trace
> below just adds one field.

I'm sorry, I do not understand what you are talking about. I used the
commands and the files from the end of your message ("psql <
dl_init.sql" and "pgbench -f dl_trans.sql -c 8 -T 10 -P 1"), and I got
this output from pgbench:

starting vacuum...ERROR:  relation "pgbench_branches" does not exist
(ignoring this error and continuing anyway)
ERROR:  relation "pgbench_tellers" does not exist
(ignoring this error and continuing anyway)
ERROR:  relation "pgbench_history" does not exist
(ignoring this error and continuing anyway)
end.
progress: 1.0 s, 14.0 tps, lat 9.094 ms stddev 5.304
progress: 2.0 s, 25.0 tps, lat 284.934 ms stddev 450.692, 1 failed
progress: 3.0 s, 21.0 tps, lat 337.942 ms stddev 473.210, 1 failed
progress: 4.0 s, 11.0 tps, lat 459.041 ms stddev 499.908, 2 failed
progress: 5.0 s, 28.0 tps, lat 220.219 ms stddev 411.390, 2 failed
progress: 6.0 s, 5.0 tps, lat 402.695 ms stddev 492.526, 2 failed
progress: 7.0 s, 24.0 tps, lat 343.249 ms stddev 626.181, 2 failed
progress: 8.0 s, 14.0 tps, lat 505.396 ms stddev 501.836, 1 failed
progress: 9.0 s, 40.0 tps, lat 180.080 ms stddev 381.335, 1 failed
progress: 10.0 s, 1.0 tps, lat 0.000 ms stddev 0.000, 1 failed
transaction type: dl_trans.sql
transaction maximum tries number: 1
scaling factor: 1
query mode: simple
number of clients: 8
number of threads: 1
duration: 10 s
number of transactions actually processed: 191
number of failures: 14 (7.330 %)
latency average = 356.701 ms
latency stddev = 564.942 ms
tps = 18.735807 (including connections establishing)
tps = 18.744898 (excluding connections establishing)

As I understand it, in the documentation "the very last fields" refer
to the aggregation logging, which is not used here. So what's the
problem?

> If you want a paragraph you should add <para>, skipping a line does not
> work (around "All values are computed for ...").

Sorry, thanks =[

> I do not understand the second note of the --max-tries documentation.
> It seems to suggest that some script may not end their own
> transaction... which should be an error in my opinion? Some
> explanations would be welcome.

As you told me here [1], "I disagree about exit in ParseScript if the
transaction block is not completed <...> and would break an existing
feature.". Maybe it would be better to say this:

In pgbench you can use scripts in which the transaction blocks do not
end. Be careful in this case because transactions that span over more
than one script are not rolled back and will not be retried in case of
an error.
In such cases, the script in which the error occurred is reported as
failed.

?

> I'm not sure that "Retries" deserves a type of its own for two
> counters.

Ok!

> The "retries" in RetriesState may be redundant with these.

The "retries" in RetriesState have a different goal: they sum up not
all the retries during the execution of the current script but the
retries for the current transaction.

> The failures are counted on simple counters while retries have a type,
> this is not consistent. I suggest to just use simple counters
> everywhere.

Ok!

> I'm ok with having the detail report tell about failures & retries only
> when some occurred.

Ok!

> typo: sucessufully -> successfully

Thanks! =[

> If a native English speaker could provide an opinion on that, and more
> generally review the whole documentation, it would be great.

I agree with you))

> I think that the rand functions should really take a random_state
> pointer argument, not a Thread or Client.

Thanks, I agree.

> I'm at odds that FailureStatus does not have a clean NO_FAILURE state,
> and that it is merged with misc failures. :)

It is funny, but for the code it really did not matter)

> I'm not sure that initRetries, mergeRetries, getAllRetries really
> deserve a function.

Ok!

> I do not think that there should be two accum functions. Just extend
> the existing one; adding zero to zero is not a problem.

Ok!

> I guess that in the end pgbench & psql variables will have to be merged
> if the pgbench expression engine is to be used by psql as well, but
> this is not linked to this patch.

Ok!

> The tap tests seem over-complicated and heavy, with two pgbench runs in
> parallel... I'm not sure we really want all that complexity for this
> somehow small feature. Moreover pgbench can run several scripts; I'm
> not sure why two pgbench would need to be invoked. Could something much
> simpler and lighter be proposed instead to test the feature?
Firstly, two pgbench runs need to be invoked because we don't know
which of them will get the deadlock failure. Secondly, I tried much
simpler tests, but all of them failed sometimes although everything was
ok:

- tests in which pgbench runs 5 clients and 10 transactions per client,
  expecting a serialization/deadlock failure on some client (sometimes
  there are no failures when they are expected);
- tests in which pgbench runs 30 clients and 400 transactions per
  client, expecting a serialization/deadlock failure on some client
  (sometimes there are no failures when they are expected);
- tests in which a psql session starts concurrently and sleep commands
  are used to make it wait for pgbench for 10 seconds (sometimes it
  does not work).

Only advisory locks help me avoid such errors in the tests :(

> The added code does not conform to Pg C style. For instance, the if
> brace should be aligned to the if. Please conform to the project style.

I'm sorry, thanks =[

> The is_transaction_block_end seems simplistic. ISTM that it would not
> work with compound commands. It should be clearly documented somewhere.

Thanks, I'll fix it.

> Also find attached two scripts I used for some testing:
>
> psql < dl_init.sql
> pgbench -f dl_trans.sql -c 8 -T 10 -P 1

[1] https://www.postgresql.org/message-id/alpine.DEB.2.20.1707121142300.12795%40lancre

-- 
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company