Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors - Mailing list pgsql-hackers
From: Marina Polyakova
Subject: Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
Date:
Msg-id: c262e889315625e0fc0d77ca78fe2eac@postgrespro.ru
In response to: Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors (Fabien COELHO <coelho@cri.ensmp.fr>)
Responses: Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
List: pgsql-hackers
On 11-09-2018 18:29, Fabien COELHO wrote:
> Hello Marina,
>
>> Hmm, but we can say the same for serialization or deadlock errors
>> that were not retried (the client test code itself could not run
>> correctly or the SQL sent was somehow wrong, which is also the
>> client's fault), can't we?
>
> I think not.
>
> If a client asks for something "legal", but some other client in
> parallel happens to make an incompatible change which results in a
> serialization or deadlock error, the clients are not responsible for
> the raised errors; they just happen to ask for something incompatible
> at the same time. So there is no user error per se, but the server is
> reporting its (temporary) inability to process what was asked for.
> For these errors, retrying is fine. If the client were alone, there
> would be no such errors: you cannot deadlock with yourself. This is
> really an isolation issue linked to parallel execution.

You can get other errors that cannot happen for only one client if you
use shell commands in meta commands:

starting vacuum...end.
transaction type: pgbench_meta_concurrent_error.sql
scaling factor: 1
query mode: simple
number of clients: 2
number of threads: 1
number of transactions per client: 10
number of transactions actually processed: 20/20
maximum number of tries: 1
latency average = 6.953 ms
tps = 287.630161 (including connections establishing)
tps = 303.232242 (excluding connections establishing)
statement latencies in milliseconds and failures:
         1.636           0  BEGIN;
         1.497           0  \setshell var mkdir my_directory && echo 1
         0.007           0  \sleep 1 us
         1.465           0  \setshell var rmdir my_directory && echo 1
         1.622           0  END;

starting vacuum...end.
mkdir: cannot create directory ‘my_directory’: File exists
mkdir: could not read result of shell command
client 1 got an error in command 1 (setshell) of script 0; execution
of meta-command failed
transaction type: pgbench_meta_concurrent_error.sql
scaling factor: 1
query mode: simple
number of clients: 2
number of threads: 1
number of transactions per client: 10
number of transactions actually processed: 19/20
number of failures: 1 (5.000%)
number of meta-command failures: 1 (5.000%)
maximum number of tries: 1
latency average = 11.782 ms (including failures)
tps = 161.269033 (including connections establishing)
tps = 167.733278 (excluding connections establishing)
statement latencies in milliseconds and failures:
         2.731           0  BEGIN;
         2.909           1  \setshell var mkdir my_directory && echo 1
         0.231           0  \sleep 1 us
         2.366           0  \setshell var rmdir my_directory && echo 1
         2.664           0  END;
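(The benchmark script itself is not shown above, but it can be read
off the per-statement listing; pgbench_meta_concurrent_error.sql
presumably looks something like this:)

-- two concurrent clients race to create and remove the same
-- directory; a \setshell fails when its mkdir or rmdir exits
-- non-zero, because the command then produces no output to read
BEGIN;
\setshell var mkdir my_directory && echo 1
\sleep 1 us
\setshell var rmdir my_directory && echo 1
END;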
Or if you use untrusted procedural languages in SQL expressions (see
the used file in the attachments):

starting vacuum...ERROR: relation "pgbench_branches" does not exist
(ignoring this error and continuing anyway)
ERROR: relation "pgbench_tellers" does not exist
(ignoring this error and continuing anyway)
ERROR: relation "pgbench_history" does not exist
(ignoring this error and continuing anyway)
end.
client 1 got an error in command 0 (SQL) of script 0; ERROR: could not
create the directory "my_directory": File exists at line 3.
CONTEXT: PL/Perl anonymous code block
client 1 got an error in command 0 (SQL) of script 0; ERROR: could not
create the directory "my_directory": File exists at line 3.
CONTEXT: PL/Perl anonymous code block
transaction type: pgbench_concurrent_error.sql
scaling factor: 1
query mode: simple
number of clients: 2
number of threads: 1
number of transactions per client: 10
number of transactions actually processed: 18/20
number of failures: 2 (10.000%)
number of serialization failures: 0 (0.000%)
number of deadlock failures: 0 (0.000%)
number of other SQL failures: 2 (10.000%)
maximum number of tries: 1
latency average = 3.282 ms (including failures)
tps = 548.437196 (including connections establishing)
tps = 637.662753 (excluding connections establishing)
statement latencies in milliseconds and failures:
         1.566           2  DO $$

starting vacuum...ERROR: relation "pgbench_branches" does not exist
(ignoring this error and continuing anyway)
ERROR: relation "pgbench_tellers" does not exist
(ignoring this error and continuing anyway)
ERROR: relation "pgbench_history" does not exist
(ignoring this error and continuing anyway)
end.
transaction type: pgbench_concurrent_error.sql
scaling factor: 1
query mode: simple
number of clients: 2
number of threads: 1
number of transactions per client: 10
number of transactions actually processed: 20/20
maximum number of tries: 1
latency average = 2.760 ms
tps = 724.746078 (including connections establishing)
tps = 853.131985 (excluding connections establishing)
statement latencies in milliseconds and failures:
         1.893           0  DO $$
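(The attached pgbench_concurrent_error.sql is not reproduced inline;
judging from the error messages it is presumably something along these
lines - a sketch only, with the directory name taken from the output
above. An untrusted language such as plperlu is required, since a
trusted one cannot touch the filesystem:)

DO $$
	# create and remove a directory; two concurrent clients can
	# collide on the mkdir or the rmdir
	mkdir("my_directory") or die "could not create the directory \"my_directory\": $!";
	rmdir("my_directory") or die "could not remove the directory \"my_directory\": $!";
$$ LANGUAGE plperlu;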
Or if you try to create a function and perhaps replace an existing
one:

starting vacuum...end.
client 0 got an error in command 0 (SQL) of script 0; ERROR: duplicate
key value violates unique constraint "pg_proc_proname_args_nsp_index"
DETAIL: Key (proname, proargtypes, pronamespace)=(my_function, , 2200)
already exists.
client 0 got an error in command 0 (SQL) of script 0; ERROR: tuple
concurrently updated
client 1 got an error in command 0 (SQL) of script 0; ERROR: tuple
concurrently updated
client 1 got an error in command 0 (SQL) of script 0; ERROR: tuple
concurrently updated
client 1 got an error in command 0 (SQL) of script 0; ERROR: tuple
concurrently updated
client 1 got an error in command 0 (SQL) of script 0; ERROR: tuple
concurrently updated
client 0 got an error in command 0 (SQL) of script 0; ERROR: tuple
concurrently updated
client 1 got an error in command 0 (SQL) of script 0; ERROR: tuple
concurrently updated
client 1 got an error in command 0 (SQL) of script 0; ERROR: tuple
concurrently updated
client 0 got an error in command 0 (SQL) of script 0; ERROR: tuple
concurrently updated
transaction type: pgbench_create_function.sql
scaling factor: 1
query mode: simple
number of clients: 2
number of threads: 1
number of transactions per client: 10
number of transactions actually processed: 10/20
number of failures: 10 (50.000%)
number of serialization failures: 0 (0.000%)
number of deadlock failures: 0 (0.000%)
number of other SQL failures: 10 (50.000%)
maximum number of tries: 1
latency average = 82.881 ms (including failures)
tps = 12.065492 (including connections establishing)
tps = 12.092216 (excluding connections establishing)
statement latencies in milliseconds and failures:
        82.549          10  CREATE OR REPLACE FUNCTION my_function() RETURNS integer AS 'select 1;' LANGUAGE SQL;

>> Why not handle client errors that can occur (but they may also not
>> occur) the same way? (For example, always abort the client, or
>> conversely never abort in these cases.) Here's an example of such an
>> error:
>>
>> client 5 got an error in command 1 (SQL) of script 0; ERROR: division
>> by zero
>
> This is an interesting case. For me we must stop the script because
> the client is asking for something "stupid", and retrying the same
> won't change the outcome, the division will still be by zero. It is
> the client's responsibility not to ask for something stupid; the
> bench script is buggy, it should not submit illegal SQL queries. This
> is quite different from submitting something legal which happens to
> fail.
> ...

>>> I'm not sure that having "--debug" implying this option is useful:
>>> as there are two distinct options, the user may be allowed to
>>> trigger one or the other as they wish?
>>
>> I'm not sure that the main debugging output will give a good clue of
>> what's happened without full messages about errors, retries and
>> failures...
>
> I'm arguing more about letting the user decide what they want.

>> These lines are quite long - do you suggest to wrap them this way?
>
> Sure, if it is too long, then wrap.

Ok!

>>> Function getTransactionStatus name does not seem to correspond
>>> fully to what the function does. There is a passthru case which
>>> should be either avoided or clearly commented.
>>
>> I don't quite understand you - do you mean that in fact this
>> function finds out whether we are in a (failed) transaction block or
>> not? Or do you mean that the case of PQTRANS_INTRANS is also ok?...
>
> The former: although the function is named "getTransactionStatus", it
> does not really return the "status" of the transaction (aka
> PQstatus()?).

Thank you, I'll think about how to improve it. Perhaps the name
checkTransactionStatus would be better...

>>> I'd insist in a comment that "cnt" does not include "skipped"
>>> transactions (anymore).
>>
>> If you mean CState.cnt, I'm not sure if this is practically useful
>> because the code uses only the sum of all client transactions
>> including skipped and failed... Maybe we can rename this field to
>> nxacts or total_cnt?
>
> I'm fine with renaming the field if it makes things clearer. They are
> all counters, so naming them "cnt" or "total_cnt" does not help much.
> Maybe "succeeded" or "success" to show what is really counted?

Perhaps renaming StatsData.cnt is better than just adding a comment to
this field. But IMO we have the same problem ("They are all counters,
so naming them "cnt" or "total_cnt" does not help much.") for
CState.cnt, which cannot be named in the same way because it also
includes skipped and failed transactions.

--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company