Re: BUG #15727: PANIC: cannot abort transaction 295144144, it wasalready committed - Mailing list pgsql-bugs
From | r.zharkov@postgrespro.ru |
---|---|
Subject | Re: BUG #15727: PANIC: cannot abort transaction 295144144, it wasalready committed |
Date | |
Msg-id | b5d89910ec01a657530d1a49ff0e05dc@postgrespro.ru Whole thread Raw |
In response to | Re: BUG #15727: PANIC: cannot abort transaction 295144144, it wasalready committed (Andres Freund <andres@anarazel.de>) |
Responses |
Re: BUG #15727: PANIC: cannot abort transaction 295144144, it wasalready committed
Re: BUG #15727: PANIC: cannot abort transaction 295144144, it was already committed |
List | pgsql-bugs |
On 2019-04-02 23:13, Andres Freund wrote: > Hi, > > On 2019-04-02 12:51:08 -0300, Alvaro Herrera wrote: >> On 2019-Apr-02, Tom Lane wrote: >> >> > Andres Freund <andres@anarazel.de> writes: >> > >> 2019-04-01 15:27:38.829 +07 [7524] STATEMENT: UPDATE pgbench_accounts SET >> > >> abalance = 1 WHERE aid = 1; >> > >> 2019-04-01 15:27:38.829 +07 [7524] PANIC: cannot abort transaction >> > >> 400048276, it was already committed >> > >> > > But that's probably a separate issue. >> > >> > What that seems to indicate is that the "unexpected table_lock_tuple >> > status" error was thrown during commit, which seems pretty odd. >> >> AFAICS this error can only come from ExecDelete(), because the value 1 >> is TM_Invisible and the other callsites where the "unexpected >> table_lock_tuple" error appears use different wording for that one. > > Hm? Why couldn't it be the ExecUpdate() case? > > >> Maybe it's the result of a deferred constraint being checked at that >> time ... maybe it's trying to honor an "on cascade delete" setting for >> an FK, and the affected tuple has already been updated or deleted? > > Then it ought to get TM_Deleted, no? We ought to wait till that > transaction commits, and then roll back. So there's something odd > happening here. I suspect there has to be some additional log output > or > such to explain this. > > Greetings, > > Andres Freund I made the new core file: [New LWP 5967] [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". Core was generated by `postgres: zharkov test 127.0.0.1(35888) UPDATE '. Program terminated with signal SIGABRT, Aborted. #0 0x00007fe194769428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54 54 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory. (gdb) bt #0 0x00007fe194769428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54 #1 0x00007fe19476b02a in __GI_abort () at abort.c:89 #2 0x0000000000a9c0ac in errfinish (dummy=0) at elog.c:552 #3 0x0000000000a9e653 in elog_finish (elevel=22, fmt=0xb695a0 "cannot abort transaction %u, it was already committed") at elog.c:1365 #4 0x0000000000562c87 in RecordTransactionAbort (isSubXact=false) at xact.c:1644 #5 0x0000000000563c93 in AbortTransaction () at xact.c:2670 #6 0x0000000000564516 in AbortCurrentTransaction () at xact.c:3191 #7 0x00000000009145b9 in PostgresMain (argc=1, argv=0x255d970, dbname=0x255d950 "test", username=0x252dd88 "zharkov") at postgres.c:4033 #8 0x000000000086406c in BackendRun (port=0x2556020) at postmaster.c:4399 #9 0x00000000008637cd in BackendStartup (port=0x2556020) at postmaster.c:4090 #10 0x000000000085facf in ServerLoop () at postmaster.c:1703 #11 0x000000000085f2dd in PostmasterMain (argc=3, argv=0x252bcf0) at postmaster.c:1376 #12 0x000000000077aeba in main (argc=3, argv=0x252bcf0) at main.c:228 Log: 2019-04-02 21:51:19.639 +07 [5967] STATEMENT: UPDATE pgbench_accounts SET abalance = 1 WHERE aid = 1; 2019-04-02 21:51:19.639 +07 [5967] PANIC: cannot abort transaction 200011850, it was already committed 2019-04-02 21:53:57.924 +07 [2473] LOG: server process (PID 5967) was terminated by signal 6: Aborted 2019-04-02 21:53:57.924 +07 [2473] DETAIL: Failed process was running: UPDATE pgbench_accounts SET abalance = 1 WHERE aid = 1; 2019-04-02 21:53:57.924 +07 [2473] LOG: terminating any other active server processes We run pgbench with two custom scripts. First: \set i random_zipfian(1, 100000 * :scale, 1.01) SELECT abalance FROM pgbench_accounts WHERE aid = :i; Second: \set i random_zipfian(1, 100000 * :scale, 1.01) UPDATE pgbench_accounts SET abalance = 1 WHERE aid = :i; pgbench connects to the database through the connection pooller called Odyssey. We use pool of 16 or 32 backends. We run four series of pgbench tests with different connections count ( 100, 150, 200, ... , 1000 ) Three series with pool size 16 runs normally. But fourth series fails with 200-300 connections. It took about five hours to reproduce the error this time. Here is the little piece of the pgbench log: Bench start Tue Apr 2 14:43:12 UTC 2019 Timestamp 1554216192 transaction type: multiple scripts scaling factor: 1 query mode: simple number of clients: 200 number of threads: 32 duration: 300 s number of transactions actually processed: 8062981 latency average = 7.442 ms tps = 26875.375100 (including connections establishing) tps = 26875.694564 (excluding connections establishing) SQL script 1: ./sql/ycsb_read_zipf.sql - weight: 1 (targets 50.0% of total) - 4019635 transactions (49.9% of total, tps = 13398.171023) - latency average = 6.955 ms - latency stddev = 8.366 ms SQL script 2: ./sql/ycsb_update_zipf.sql - weight: 1 (targets 50.0% of total) - 4021508 transactions (49.9% of total, tps = 13404.414071) - latency average = 7.909 ms - latency stddev = 8.478 ms Bench end Tue Apr 2 14:48:12 UTC 2019 Timestamp 1554216492 Bench start Tue Apr 2 14:48:12 UTC 2019 Timestamp 1554216492 client 220 script 1 aborted in command 1 query 0: PANIC: cannot abort transaction 200011850, it was already committed WARNING: terminating connection because of crash of another server process -- regards, Roman Zharkov
pgsql-bugs by date: