Re: POC: Parallel processing of indexes in autovacuum - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: POC: Parallel processing of indexes in autovacuum |
Date | |
Msg-id | CAD21AoAxTkpkLtJDgrH9dXg_h+yzOZpOZj3B-4FjW1Mr4qEdbQ@mail.gmail.com |
In response to | POC: Parallel processing of indexes in autovacuum (Maxim Orlov <orlovmg@gmail.com>) |
List | pgsql-hackers |
On Thu, May 22, 2025 at 10:48 AM Sami Imseih <samimseih@gmail.com> wrote:
>
> I started looking at the patch but I have some high level thoughts I would
> like to share before looking further.
>
> > > I find that the name "autovacuum_reserved_workers_num" is generic. It
> > > would be better to have a more specific name for parallel vacuum such
> > > as autovacuum_max_parallel_workers. This parameter is related to
> > > neither autovacuum_worker_slots nor autovacuum_max_workers, which
> > > seems fine to me. Also, max_parallel_maintenance_workers doesn't
> > > affect this parameter.
> > > .......
> > > I've also considered some alternative names. If we were to use
> > > parallel_maintenance_workers, it sounds like it controls the parallel
> > > degree for all operations using max_parallel_maintenance_workers,
> > > including CREATE INDEX. Similarly, vacuum_parallel_workers could be
> > > interpreted as affecting both autovacuum and manual VACUUM commands,
> > > suggesting that when users run "VACUUM (PARALLEL) t", the system would
> > > use their specified value for the parallel degree. I prefer
> > > autovacuum_parallel_workers or vacuum_parallel_workers.
> > >
> >
> > This was my headache when I created names for variables. Autovacuum
> > initially implies parallelism, because we have several parallel a/v
> > workers. So I think that parameter like
> > `autovacuum_max_parallel_workers` will confuse somebody.
> > If we want to have a more specific name, I would prefer
> > `max_parallel_index_autovacuum_workers`.
>
> I don't think we should have a separate pool of parallel workers for those
> that are used to support parallel autovacuum. At the end of the day, these
> are parallel workers and they should be capped by max_parallel_workers. I think
> it will be confusing if we claim these are parallel workers, but they
> are coming from a different pool.

I agree that parallel vacuum workers used during autovacuum should be
capped by max_parallel_workers.

>
> I envision we have another GUC such as "max_parallel_autovacuum_workers"
> (which I think is a better name) that matches the behavior of
> "max_parallel_maintenance_worker". Meaning that the autovacuum workers
> still maintain their existing behavior ( launching a worker per table ),
> and if they do need to vacuum in parallel, they can draw from a pool of
> parallel workers.
>
> With the above said, I therefore think the reloption should actually be a number
> of parallel workers rather than a boolean. Let's take an example of a
> user that has 3 tables they wish to (auto)vacuum can process in parallel,
> and if available they wish each of these tables could be autovacuumed
> with 4 parallel workers. However, as to not overload the system, they
> cap the 'max_parallel_maintenance_worker' to something like 8. If it
> so happens that all 3 tables are auto-vacuumed at the same time, there
> may not be enough parallel workers, so one table will be a loser and be
> vacuumed in serial.

+1 for making the reloption a number of parallel workers, leaving
aside the name competition.

> That is acceptable, and a/v logging
> ( and perhaps other stat views ) should display this behavior: workers
> planned vs workers launched.

Agreed. The workers planned vs. launched counts are currently reported
only with the VERBOSE option, so we need to change that so autovacuum can
at least log them.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
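
For readers following the 3-table example quoted above, here is a rough toy model of the shared-pool idea. It is only a sketch and not PostgreSQL code: the function `allocate_workers`, the `VacuumPlan` record, and the first-come-first-served policy are all assumptions made for illustration, and the pool size of 8 stands in for the cap mentioned in the example. How the server would actually arbitrate between concurrent autovacuum workers is exactly what is still under discussion in this thread.

```python
from dataclasses import dataclass


@dataclass
class VacuumPlan:
    table: str
    planned: int   # workers requested, e.g. via a hypothetical per-table reloption
    launched: int  # workers actually obtained from the shared pool


def allocate_workers(requests, pool_size):
    """Hand out parallel workers in arrival order from one shared pool.

    Assumption: a simple greedy policy. A table that arrives when the
    pool is empty gets zero workers and falls back to serial vacuum.
    """
    free = pool_size
    plans = []
    for table, wanted in requests:
        granted = min(wanted, free)
        free -= granted
        plans.append(VacuumPlan(table, wanted, granted))
    return plans


# Three tables each asking for 4 workers, with the pool capped at 8.
requests = [("t1", 4), ("t2", 4), ("t3", 4)]
plans = allocate_workers(requests, pool_size=8)

for plan in plans:
    mode = "parallel" if plan.launched > 0 else "serial"
    print(f"{plan.table}: planned={plan.planned} launched={plan.launched} ({mode})")
```

Under these assumptions the first two tables get their 4 workers each and the third is vacuumed serially, which is the "planned vs. launched" gap that the autovacuum log would need to report.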