Re: [HACKERS] Block level parallel vacuum - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: [HACKERS] Block level parallel vacuum |
Msg-id | CAA4eK1Jbc_jx725=h+W5-+ToirCBP2hpWG9fAsRMDqG+E9ORcA@mail.gmail.com |
In response to | Re: [HACKERS] Block level parallel vacuum (Amit Kapila <amit.kapila16@gmail.com>) |
Responses | Re: [HACKERS] Block level parallel vacuum |
| Re: [HACKERS] Block level parallel vacuum |
| Re: [HACKERS] Block level parallel vacuum |
List | pgsql-hackers |
On Wed, Dec 18, 2019 at 12:04 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Dec 18, 2019 at 11:46 AM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > On Wed, 18 Dec 2019 at 15:03, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > I was analyzing your changes related to ReinitializeParallelDSM() and
> > > it seems like we might launch more workers than needed for the
> > > bulkdelete phase. While creating a parallel context, we used the
> > > maximum of "workers required for bulkdelete phase" and "workers
> > > required for cleanup", but now if the number of workers required in
> > > the bulkdelete phase is less than in the cleanup phase (as mentioned
> > > by you in one example), then we would launch more workers than
> > > necessary for the bulkdelete phase.
> >
> > Good catch. Currently, when creating a parallel context, the number of
> > workers passed to CreateParallelContext() is set not only to
> > pcxt->nworkers but also to pcxt->nworkers_to_launch. We would need to
> > specify the number of workers to actually launch after creating the
> > parallel context, or while creating it. Or I think we could call
> > ReinitializeParallelDSM() even the first time we run index vacuum.
> >
>
> How about just having a ReinitializeParallelWorkers function which, as of
> now, can be called only via vacuum, even for the first time before the
> launch of workers?
>

See in the attached what I have in mind.
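To illustrate the shape of the idea, a rough sketch (just a sketch; the exact
name and placement are open, and the attached diff is what actually matters):

void
ReinitializeParallelWorkers(ParallelContext *pcxt, int nworkers_to_launch)
{
    /*
     * The caller can only ask to launch fewer workers than the parallel
     * context was created with, never more.
     */
    Assert(pcxt->nworkers >= nworkers_to_launch);

    pcxt->nworkers_to_launch = nworkers_to_launch;
}

The vacuum code would call this just before launching workers for each index
vacuum/cleanup cycle, including the first one, so that the number of workers
we launch matches the per-phase computation rather than pcxt->nworkers.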
A few other comments:

1.
+ shared->disable_delay = (params->options & VACOPT_FAST);

This should be part of the third patch.

2.
+lazy_parallel_vacuum_indexes(Relation *Irel, IndexBulkDeleteResult **stats,
+                             LVRelStats *vacrelstats, LVParallelState *lps,
+                             int nindexes)
{
..
..
+    /* Cap by the worker we computed at the beginning of parallel lazy vacuum */
+    nworkers = Min(nworkers, lps->pcxt->nworkers);
..
}

This should be an Assert. In no case can the computed workers exceed what we
have in the context.

3.
+ if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0) ||
+     ((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0))
+     nindexes_parallel_cleanup++;

I think the second condition should be VACUUM_OPTION_PARALLEL_COND_CLEANUP.

I have fixed the above comments and some given by me earlier [1] in the
attached patch. The attached patch is a diff on top of
v36-0002-Add-parallel-option-to-VACUUM-command.

A few other comments which I have not fixed:

4.
+ if (Irel[i]->rd_indam->amusemaintenanceworkmem)
+     nindexes_mwm++;
+
+ /* Skip indexes that don't participate parallel index vacuum */
+ if (vacoptions == VACUUM_OPTION_NO_PARALLEL ||
+     RelationGetNumberOfBlocks(Irel[i]) < min_parallel_index_scan_size)
+     continue;

Won't we need to count the indexes that use maintenance_work_mem only for
indexes that can participate in a parallel vacuum? If so, the order of the
above checks needs to be reversed.

5.
 /*
+ * Remember indexes that can participate parallel index vacuum and use
+ * it for index statistics initialization on DSM because the index
+ * size can get bigger during vacuum.
+ */
+ can_parallel_vacuum[i] = true;

I am not able to understand the second part of the comment ("because the
index size can get bigger during vacuum."). What is its relevance?

6.
+/*
+ * Vacuum or cleanup indexes that can be processed by only the leader process
+ * because these indexes don't support parallel operation at that phase.
+ * Therefore this function must be called by the leader process.
+ */
+static void
+vacuum_indexes_leader(Relation *Irel, int nindexes, IndexBulkDeleteResult **stats,
+                      LVRelStats *vacrelstats, LVParallelState *lps)
{
..

Why have you changed the order of the nindexes parameter? I think in the
previous patch it was the last parameter, and that seems to be a better place
for it. Also, I think after the latest modifications, you can remove the
second sentence in the above comment ("Therefore this function must be called
by the leader process.").

7.
+ for (i = 0; i < nindexes; i++)
+ {
+     bool leader_only = (get_indstats(lps->lvshared, i) == NULL ||
+                         skip_parallel_vacuum_index(Irel[i], lps->lvshared));
+
+     /* Skip the indexes that can be processed by parallel workers */
+     if (!leader_only)
+         continue;

It is better to name this variable skip_index or something like that.

[1] - https://www.postgresql.org/message-id/CAA4eK1%2BKBAt1JS%2BasDd7K9C10OtBiyuUC75y8LR6QVnD2wrsMw%40mail.gmail.com

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com