Thread: BUG #18359: autovacuum stopped vacuuming user tables, focusing on just pg_auth_members repeatedly
BUG #18359: autovacuum stopped vacuuming user tables, focusing on just pg_auth_members repeatedly
From
PG Bug reporting form
Date:
The following bug has been logged on the website: Bug reference: 18359 Logged by: Dmitry Astapov Email address: dastapov@gmail.com PostgreSQL version: 13.6 Operating system: Rocky Linux Description: Sorry, I don't have a way to reproduce the problem at will. I am also reporting this for version 13.6, but I checked src/backend/access/heap/vacuumlazy.c from 13.14, and it is unchanged between those versions, so I think my bugreport will apply to 13.14 as well. Symptoms: - Database users complain that "queries that took seconds now take hours". Investigation quickly leads to the conclusion that all the queries are mis-planned due to the stale statistics. - select max(last_autovacuum), max(last_autoanalyze) from pg_stat_user_tables shows that last autovacuum/autonalyze was more than two days ago, and no autovacuum workers are running, according to pg_stat_activity / pg_stat_progress_vacuum - normally, we can see at least a couple at any point in time. Log_autovacuum_min_duration was set to default -1, so we don't have logs from the beginning of the incident. However, once we determined that autovacuums are not running, we set it to 0, at which point we saw this message repeated in the logs approximately every 10 seconds: 2024-02-20 08:08:49.993 EST,,,758746,,65d4a461.b93da,3,,2024-02-20 08:08:49 EST,331/9765639,0,LOG,00000,"automatic vacuum to prevent wraparound of table ""postgres.pg_catalog.pg_auth_members"": index scans: 0 pages: 0 removed, 100 remain, 0 skipped due to pins, 62 skipped frozen tuples: 0 removed, 18359 remain, 0 are dead but not yet removable, oldest xmin: 1429906812 buffer usage: 11 hits, 0 misses, 0 dirtied avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s WAL usage: 0 records, 0 full page images, 0 bytes",,,,,,,,,"","autovacuum worker" We have about 15 minutes worth of these log lines, but all the numbers (pages, tuples, scans, skipped, frozen, ...) in the message were exactly the same every time (as verified by "sort -u"), and there were no other "autovacuum worker" messages in the log. It was imperative to restore the normal operation of the database, and on a whim, we tried "vacuum full pg_auth_members", which executed immediately. Right after that we saw autovacuum workers scheduled and running as usual, and soon after everything returned back to normal. This old thread https://www.postgresql.org/message-id/flat/CAE39h23X%3D09_PBvyvC3yy1MBVu6pkLuDj9KpaQUzoSmi_2%2BiKw%40mail.gmail.com#900740c5db62e5be2afa51da23a6a926 seems very similar to what we have experienced. I've checked through the changelogs of versions 14, 15, 16, and nothing jumps out at me as a fix for this specific problem.