[HACKERS] Too many autovacuum workers spawned during forced auto-vacuum - Mailing list pgsql-hackers
From | Amit Khandekar |
---|---|
Subject | [HACKERS] Too many autovacuum workers spawned during forced auto-vacuum |
Date | |
Msg-id | CAJ3gD9eWejf72HKquKSzax0r+epS=nAbQKNnykkMA0E8c+rMDg@mail.gmail.com Whole thread Raw |
Responses |
Re: [HACKERS] Too many autovacuum workers spawned during forcedauto-vacuum
|
List | pgsql-hackers |
In a server where autovacuum is disabled and its databases reach autovacuum_freeze_max_age limit, an autovacuum is forced to prevent xid wraparound issues. At this stage, when the server is loaded with a lot of DML operations, an exceedingly high number of autovacuum workers keep on getting spawned, and these do not do anything, and then quit. The issue is : Sometimes an auto-vaccuum worker A1 finds it has no tables to vacuum because of another worker(s) B already concurrently vacuuming the tables. Then this worker A1 calls vac_update_datfrozenxid() in the end, which effectively requests for new auto-vacuum (vac_update_datfrozenxid() -> vac_truncate_clog() -> SetTransactionIdLimit -> Send PMSIGNAL_START_AUTOVAC_LAUNCHER signal). *Immediately* a new auto-vacuum is spawned since it is an xid-wraparound vacuum. This new autovacuum chooses the same earlier database for the new worker A2, because the datfrozenxid's aren't yet updated due to worker B still not done yet. Worker A2 finds that the same worker(s) B are still vacuuming the same tables. A2 again calls vac_update_datfrozenxid(), which leads to spawning another auto-vacuum worker A3 , and the cycle keeps on repeating until B finishes vacuuming and updates datfrozenid. Steps to reproduce : 1. In a fresh cluster, create tables using attached create.sql.gz 2. Insert some data : INSERT into pgbench_history select generate_series(1, 5402107, 1); UPDATE pgbench_history set bid = 1, aid = 11100, delta = 500, mtime = '2017/1/1', filler = '19:00' ; 3. Set these GUCs in postgresql.conf and restart the server : autovacuum = off autovacuum_freeze_max_age = 100000 # Make auto-vacuum start as early as possible. 4. Run : "pgbench -n -c 5 -t 2000000" and watch the log file. 5. After the age(datfrozenxid) of the databases crosses autovacuum_freeze_max_age value, after 1-2 minutes more, a number of messages will be seen, such as this : 2017-01-13 14:50:12.304 IST [111811] LOG: autovacuum launcher started 2017-01-13 14:50:12.346 IST [111816] LOG: autovacuum launcher started 2017-01-13 14:50:12.825 IST [111818] LOG: autovacuum launcher started I see around 70 messages per second. === Fix === For fixing the issue, one approach I thought was to make do_start_worker() choose a different database after determining that the earlier database is still being scanned by one of the workers in AutoVacuumShmem->av_runningWorkers. The current logic is that if the databases need xid-wraparound prevention, then we pick the one with oldest datfrozenxid. So, for the fix, just choose the db with the second oldest datfrozenxid if the oldest one is still being processed. But this does not solve the problem. Once the other databases are done, or if there's a single database, the workers will again go in a loop with the same database. Instead, the attached patch (prevent_useless_vacuums.patch) prevents the repeated cycle by noting that there's no point in doing whatever vac_update_datfrozenxid() does, if we didn't find anything to vacuum and there's already another worker vacuuming the same database. Note that it uses wi_tableoid field to check concurrency. It does not use wi_dboid field to check for already-processing worker, because using this field might cause each of the workers to think that there is some other worker vacuuming, and eventually no one vacuums. We have to be certain that the other worker has already taken a table to vacuum. -- Thanks, -Amit Khandekar EnterpriseDB Corporation The Postgres Database Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
pgsql-hackers by date: