Thread: BUG #18658: Assert in SerialAdd() due to race condition
The following bug has been logged on the website: Bug reference: 18658 Logged by: Andrew Bille Email address: andrewbille@gmail.com PostgreSQL version: 17.0 Operating system: Ubuntu 20.04 Description: Hello! The following script against a server (master c259b1578) compiled with -DTEST_SUMMARIZE_SERIAL (without the define it's harder to reproduce...): echo "CREATE TABLE t(f1 int);" | psql test for i in seq 100; do echo " SET default_transaction_isolation to 'SERIALIZABLE'; SELECT 'INSERT INTO t VALUES(42);' FROM generate_series(1, 100) g \gexec " | psql -q test & echo " SET default_transaction_isolation to 'SERIALIZABLE'; SELECT 'INSERT INTO t VALUES(42);' FROM generate_series(1, 100) g \gexec " | psql -q test & wait grep 'TRAP:' server.log && break; done fails for me with the following stack trace: #5 0x000055e6aaf3fef5 in ExceptionalCondition (conditionName=conditionName@entry=0x55e6ab1232d1 "TransactionIdIsValid(tailXid)", fileName=fileName@entry=0x55e6ab1231d9 "predicate.c", lineNumber=lineNumber@entry=886) at assert.c:66 #6 0x000055e6aadcde64 in SerialAdd (xid=976743, minConflictCommitSeqNo=18446744073709551615) at predicate.c:886 #7 0x000055e6aadd0877 in SummarizeOldestCommittedSxact () at predicate.c:1521 #8 0x000055e6aadd094f in GetSerializableTransactionSnapshotInt (snapshot=snapshot@entry=0x55e6ab32f3e0 <CurrentSnapshotData>, sourcevxid=sourcevxid@entry=0x0, sourcepid=sourcepid@entry=-1) at predicate.c:1789 #9 0x000055e6aadd2a51 in GetSerializableTransactionSnapshot (snapshot=snapshot@entry=0x55e6ab32f3e0 <CurrentSnapshotData>) at predicate.c:1693 #10 0x000055e6aaf868c4 in GetTransactionSnapshot () at snapmgr.c:257 #11 0x000055e6aade2d9a in exec_simple_query (query_string=query_string@entry=0x55e6ab433fe0 "SELECT 'INSERT INTO t VALUES(42);' FROM generate_series(1, 100) g\n ") at postgres.c:1173 #12 0x000055e6aade516f in PostgresMain (dbname=<optimized out>, username=<optimized out>) at postgres.c:4793 #13 0x000055e6aadde301 in BackendMain (startup_data=<optimized out>, startup_data_len=<optimized out>) at backend_startup.c:107 #14 0x000055e6aad2cbf8 in postmaster_child_launch (child_type=child_type@entry=B_BACKEND, startup_data=startup_data@entry=0x7ffc1ca820e4 "", startup_data_len=startup_data_len@entry=4, client_sock=client_sock@entry=0x7ffc1ca82120) at launch_backend.c:274 #15 0x000055e6aad30e72 in BackendStartup (client_sock=client_sock@entry=0x7ffc1ca82120) at postmaster.c:3420 #16 0x000055e6aad31092 in ServerLoop () at postmaster.c:1653 #17 0x000055e6aad324e6 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x55e6ab42e390) at postmaster.c:1351 #18 0x000055e6aac48bd7 in main (argc=3, argv=0x55e6ab42e390) at main.c:197 This is also reproduced on dafaa3efb Implement genuine serializable isolation level. ... Thanks, Andrew
Hello again!
Using a build with no asserts enabled and without that define, I can also see the following errors using the script with increased concurrency (Initially we got a report from our client, who encountered these errors in production):
2024-10-17 04:58:29.774 UTC|test|regression|67109975.ed069|PANIC: could not access status of transaction 18204
2024-10-17 04:58:29.774 UTC|test|regression|67109975.ed069|DETAIL: Could not open file "pg_serial/0000": No such file or directory.
I changed ERROR to PANIC to get a core dump.
echo "CREATE TABLE t(f1 int);" | psql
res=0
for i in `seq 200`; do
echo "I $i"
for l in `seq 80`; do
echo "
SET default_transaction_isolation to 'SERIALIZABLE';
SELECT 'INSERT INTO t VALUES(42);' FROM generate_series(1, 100) g
\gexec
" | psql -q &
done
wait
grep 'PANIC: could not access status' server.log && { res=1; break; }
grep 'TRAP: ' server.log && { res=1; break; }
done
#5 0x00005f824729a10a in errfinish (filename=0x5f824732df83 "slru.c", lineno=1060, funcname=0x5f824732e220 <__func__.3> "SlruReportIOError") at elog.c:599
#6 0x00005f8246c16c26 in SlruReportIOError (ctl=0x5f8247686c80 <SerialSlruCtlData>, pageno=17, xid=18204) at slru.c:1060
#7 0x00005f8246c15cea in SimpleLruReadPage (ctl=0x5f8247686c80 <SerialSlruCtlData>, pageno=17, write_ok=true, xid=18204) at slru.c:579
#8 0x00005f82470988c2 in SerialAdd (xid=18204, minConflictCommitSeqNo=18446744073709551615) at predicate.c:928
#9 0x00005f8247099854 in SummarizeOldestCommittedSxact () at predicate.c:1520
#10 0x00005f8247099ddd in GetSerializableTransactionSnapshotInt (snapshot=0x5f8247691740 <CurrentSnapshotData>, sourcevxid=0x0, sourcepid=-1) at predicate.c:1798
#11 0x00005f8247099c72 in GetSerializableTransactionSnapshot (snapshot=0x5f8247691740 <CurrentSnapshotData>) at predicate.c:1692
#12 0x00005f82472ee3b1 in GetTransactionSnapshot () at snapmgr.c:257
#13 0x00005f82470adfe8 in exec_simple_query (query_string=0x5f8248e06a28 "INSERT INTO t VALUES(42);") at postgres.c:1173
#14 0x00005f82470b3453 in PostgresMain (dbname=0x5f8248e030f0 "regression", username=0x5f8248e3fe08 "test") at postgres.c:4793
#15 0x00005f82470aa5ac in BackendMain (startup_data=0x7fffdf93a054 "", startup_data_len=4) at backend_startup.c:107
#16 0x00005f8246fd6a6a in postmaster_child_launch (child_type=B_BACKEND, startup_data=0x7fffdf93a054 "", startup_data_len=4, client_sock=0x7fffdf93a0a0) at launch_backend.c:274
#17 0x00005f8246fdc4ed in BackendStartup (client_sock=0x7fffdf93a0a0) at postmaster.c:3420
#18 0x00005f8246fd9dd0 in ServerLoop () at postmaster.c:1653
#19 0x00005f8246fd96d0 in PostmasterMain (argc=1, argv=0x5f8248e01090) at postmaster.c:1351
#20 0x00005f8246e9729e in main (argc=1, argv=0x5f8248e01090) at main.c:197
Assert would not be triggered in this case, because of:
(gdb) fr 8
#8 0x00005f82470988c2 in SerialAdd (xid=18204, minConflictCommitSeqNo=18446744073709551615) at predicate.c:928
928 slotno = SimpleLruReadPage(SerialSlruCtl, targetPage, true, xid);
(gdb) p *serialControl
$1 = {headPage = 18, headXid = 18654, tailXid = 18557}
I'm not sure if this is the same problem.
Thanks to Alexandr Lakhin for help with reproducing the issue!
Using a build with no asserts enabled and without that define, I can also see the following errors using the script with increased concurrency (Initially we got a report from our client, who encountered these errors in production):
2024-10-17 04:58:29.774 UTC|test|regression|67109975.ed069|PANIC: could not access status of transaction 18204
2024-10-17 04:58:29.774 UTC|test|regression|67109975.ed069|DETAIL: Could not open file "pg_serial/0000": No such file or directory.
I changed ERROR to PANIC to get a core dump.
echo "CREATE TABLE t(f1 int);" | psql
res=0
for i in `seq 200`; do
echo "I $i"
for l in `seq 80`; do
echo "
SET default_transaction_isolation to 'SERIALIZABLE';
SELECT 'INSERT INTO t VALUES(42);' FROM generate_series(1, 100) g
\gexec
" | psql -q &
done
wait
grep 'PANIC: could not access status' server.log && { res=1; break; }
grep 'TRAP: ' server.log && { res=1; break; }
done
#5 0x00005f824729a10a in errfinish (filename=0x5f824732df83 "slru.c", lineno=1060, funcname=0x5f824732e220 <__func__.3> "SlruReportIOError") at elog.c:599
#6 0x00005f8246c16c26 in SlruReportIOError (ctl=0x5f8247686c80 <SerialSlruCtlData>, pageno=17, xid=18204) at slru.c:1060
#7 0x00005f8246c15cea in SimpleLruReadPage (ctl=0x5f8247686c80 <SerialSlruCtlData>, pageno=17, write_ok=true, xid=18204) at slru.c:579
#8 0x00005f82470988c2 in SerialAdd (xid=18204, minConflictCommitSeqNo=18446744073709551615) at predicate.c:928
#9 0x00005f8247099854 in SummarizeOldestCommittedSxact () at predicate.c:1520
#10 0x00005f8247099ddd in GetSerializableTransactionSnapshotInt (snapshot=0x5f8247691740 <CurrentSnapshotData>, sourcevxid=0x0, sourcepid=-1) at predicate.c:1798
#11 0x00005f8247099c72 in GetSerializableTransactionSnapshot (snapshot=0x5f8247691740 <CurrentSnapshotData>) at predicate.c:1692
#12 0x00005f82472ee3b1 in GetTransactionSnapshot () at snapmgr.c:257
#13 0x00005f82470adfe8 in exec_simple_query (query_string=0x5f8248e06a28 "INSERT INTO t VALUES(42);") at postgres.c:1173
#14 0x00005f82470b3453 in PostgresMain (dbname=0x5f8248e030f0 "regression", username=0x5f8248e3fe08 "test") at postgres.c:4793
#15 0x00005f82470aa5ac in BackendMain (startup_data=0x7fffdf93a054 "", startup_data_len=4) at backend_startup.c:107
#16 0x00005f8246fd6a6a in postmaster_child_launch (child_type=B_BACKEND, startup_data=0x7fffdf93a054 "", startup_data_len=4, client_sock=0x7fffdf93a0a0) at launch_backend.c:274
#17 0x00005f8246fdc4ed in BackendStartup (client_sock=0x7fffdf93a0a0) at postmaster.c:3420
#18 0x00005f8246fd9dd0 in ServerLoop () at postmaster.c:1653
#19 0x00005f8246fd96d0 in PostmasterMain (argc=1, argv=0x5f8248e01090) at postmaster.c:1351
#20 0x00005f8246e9729e in main (argc=1, argv=0x5f8248e01090) at main.c:197
Assert would not be triggered in this case, because of:
(gdb) fr 8
#8 0x00005f82470988c2 in SerialAdd (xid=18204, minConflictCommitSeqNo=18446744073709551615) at predicate.c:928
928 slotno = SimpleLruReadPage(SerialSlruCtl, targetPage, true, xid);
(gdb) p *serialControl
$1 = {headPage = 18, headXid = 18654, tailXid = 18557}
I'm not sure if this is the same problem.
Thanks to Alexandr Lakhin for help with reproducing the issue!
On Wed, Oct 16, 2024 at 10:33 PM PG Bug reporting form <noreply@postgresql.org> wrote:
The following bug has been logged on the website:
Bug reference: 18658
Logged by: Andrew Bille
Email address: andrewbille@gmail.com
PostgreSQL version: 17.0
Operating system: Ubuntu 20.04
Description:
Hello!
The following script against a server (master c259b1578) compiled with
-DTEST_SUMMARIZE_SERIAL (without the define it's harder to reproduce...):
echo "CREATE TABLE t(f1 int);" | psql test
for i in seq 100; do
echo "
SET default_transaction_isolation to 'SERIALIZABLE';
SELECT 'INSERT INTO t VALUES(42);' FROM generate_series(1, 100) g
\gexec
" | psql -q test &
echo "
SET default_transaction_isolation to 'SERIALIZABLE';
SELECT 'INSERT INTO t VALUES(42);' FROM generate_series(1, 100) g
\gexec
" | psql -q test &
wait
grep 'TRAP:' server.log && break;
done
fails for me with the following stack trace:
#5 0x000055e6aaf3fef5 in ExceptionalCondition
(conditionName=conditionName@entry=0x55e6ab1232d1
"TransactionIdIsValid(tailXid)", fileName=fileName@entry=0x55e6ab1231d9
"predicate.c", lineNumber=lineNumber@entry=886) at assert.c:66
#6 0x000055e6aadcde64 in SerialAdd (xid=976743,
minConflictCommitSeqNo=18446744073709551615) at predicate.c:886
#7 0x000055e6aadd0877 in SummarizeOldestCommittedSxact () at
predicate.c:1521
#8 0x000055e6aadd094f in GetSerializableTransactionSnapshotInt
(snapshot=snapshot@entry=0x55e6ab32f3e0 <CurrentSnapshotData>,
sourcevxid=sourcevxid@entry=0x0, sourcepid=sourcepid@entry=-1) at
predicate.c:1789
#9 0x000055e6aadd2a51 in GetSerializableTransactionSnapshot
(snapshot=snapshot@entry=0x55e6ab32f3e0 <CurrentSnapshotData>) at
predicate.c:1693
#10 0x000055e6aaf868c4 in GetTransactionSnapshot () at snapmgr.c:257
#11 0x000055e6aade2d9a in exec_simple_query
(query_string=query_string@entry=0x55e6ab433fe0 "SELECT 'INSERT INTO t
VALUES(42);' FROM generate_series(1, 100) g\n ") at postgres.c:1173
#12 0x000055e6aade516f in PostgresMain (dbname=<optimized out>,
username=<optimized out>) at postgres.c:4793
#13 0x000055e6aadde301 in BackendMain (startup_data=<optimized out>,
startup_data_len=<optimized out>) at backend_startup.c:107
#14 0x000055e6aad2cbf8 in postmaster_child_launch
(child_type=child_type@entry=B_BACKEND,
startup_data=startup_data@entry=0x7ffc1ca820e4 "",
startup_data_len=startup_data_len@entry=4,
client_sock=client_sock@entry=0x7ffc1ca82120)
at launch_backend.c:274
#15 0x000055e6aad30e72 in BackendStartup
(client_sock=client_sock@entry=0x7ffc1ca82120) at postmaster.c:3420
#16 0x000055e6aad31092 in ServerLoop () at postmaster.c:1653
#17 0x000055e6aad324e6 in PostmasterMain (argc=argc@entry=3,
argv=argv@entry=0x55e6ab42e390) at postmaster.c:1351
#18 0x000055e6aac48bd7 in main (argc=3, argv=0x55e6ab42e390) at main.c:197
This is also reproduced on dafaa3efb
Implement genuine serializable isolation level.
...
Thanks, Andrew
On 19/10/2024 12:00, Alexander Lakhin wrote: > Your proposed fix (adjusted for REL_16_STABLE) eliminates the issue for me. > Thank you! Committed and backpatched to all supported branches. Thanks for the report and the testing! -- Heikki Linnakangas Neon (https://neon.tech)