Thread: BUG #18158: Assert in pgstat_report_stat() fails when a backend shutting down with stats pending
BUG #18158: Assert in pgstat_report_stat() fails when a backend shutting down with stats pending
From
PG Bug reporting form
Date:
The following bug has been logged on the website: Bug reference: 18158 Logged by: Alexander Lakhin Email address: exclusion@gmail.com PostgreSQL version: 16.0 Operating system: Ubuntu 22.04 Description: When running check-world in a loop on a rather slow device, I observed an interesting case. All tests passed successfully, but in the system journal I found a message about postgres crash during the run. I reproduced the issue and found in src/test/recovery/tmp_check/log/ 012_subtransactions_primary.log: TRAP: failed Assert("!pgStatLocal.shmem->is_shutdown"), File: "pgstat.c", Line: 616, PID: 23770 My analysis shows that this Assert failed in a backend (walsender, in my case) when it has pending stats entries on a server shutdown. In absence of such entries pgstat_report_stats() exits before the Assert: /* Don't expend a clock check if nothing to do */ if (dlist_is_empty(&pgStatPending) && !have_iostats && !have_slrustats && !pgstat_have_pending_wal()) { Assert(pending_since == 0); return 0; } /* * There should never be stats to report once stats are shut down. Can't * assert that before the checks above, as there is an unconditional * pgstat_report_stat() call in pgstat_shutdown_hook() - which at least * the process that ran pgstat_before_server_shutdown() will still call. */ Assert(!pgStatLocal.shmem->is_shutdown); With the following modification in pgstat_flush_pending_entries(): +if (nowait && (rand() % 10 == 0)) + did_flush = false; +else +{ did_flush = kind_info->flush_pending_cb(entry_ref, nowait); +} the issue reproduced easily: make -s check -C src/test/recovery/ PROVE_TESTS="t/012_subtransactions.pl" grep TRAP -r src/test/recovery/tmp_check/log # +++ tap check in src/test/recovery +++ t/012_subtransactions.pl .. ok All tests successful. Files=1, Tests=12, 3 wallclock secs ( 0.01 usr 0.00 sys + 0.19 cusr 0.27 csys = 0.47 CPU) Result: PASS src/test/recovery/tmp_check/log/012_subtransactions_primary.log: TRAP: failed Assert("!pgStatLocal.shmem->is_shutdown"), File: "pgstat.c", Line: 616, PID: 2410126 The stack trace from the log: postgres: primary: walsender law [local] streaming 0/3019C78(ExceptionalCondition+0xbb)[0x5616c7b58b89] postgres: primary: walsender law [local] streaming 0/3019C78(pgstat_report_stat+0xef)[0x5616c7991ff1] postgres: primary: walsender law [local] streaming 0/3019C78(+0x6a9e57)[0x5616c7991e57] postgres: primary: walsender law [local] streaming 0/3019C78(shmem_exit+0xa6)[0x5616c792d852] postgres: primary: walsender law [local] streaming 0/3019C78(+0x6456f6)[0x5616c792d6f6] postgres: primary: walsender law [local] streaming 0/3019C78(proc_exit+0x19)[0x5616c792d63a] postgres: primary: walsender law [local] streaming 0/3019C78(+0x607cec)[0x5616c78efcec] postgres: primary: walsender law [local] streaming 0/3019C78(+0x606e47)[0x5616c78eee47] postgres: primary: walsender law [local] streaming 0/3019C78(+0x6043c8)[0x5616c78ec3c8] postgres: primary: walsender law [local] streaming 0/3019C78(exec_replication_command+0x4cc)[0x5616c78edfa5] postgres: primary: walsender law [local] streaming 0/3019C78(PostgresMain+0x7df)[0x5616c796d54b] ...