Re: POC: Parallel processing of indexes in autovacuum
| From | Masahiko Sawada |
|---|---|
| Subject | Re: POC: Parallel processing of indexes in autovacuum |
| Date | |
| Msg-id | CAD21AoDz-1Zf9DOJJrdcB2=eNA4UdywthkowNp_dHmOGC-yV_g@mail.gmail.com |
| In response to | Re: POC: Parallel processing of indexes in autovacuum (Daniil Davydov <3danissimo@gmail.com>) |
| List | pgsql-hackers |
On Fri, Jan 16, 2026 at 6:11 AM Daniil Davydov <3danissimo@gmail.com> wrote:
>
> Hi,
>
> On Thu, Jan 15, 2026 at 9:13 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> >
> > ---
> > + /*
> > + * Cap the number of free workers by new parameter's value, if needed.
> > + */
> > + AutoVacuumShmem->av_freeParallelWorkers =
> > + Min(AutoVacuumShmem->av_freeParallelWorkers,
> > + autovacuum_max_parallel_workers);
> > +
> > + if (autovacuum_max_parallel_workers > prev_max_parallel_workers)
> > + {
> > + /*
> > + * If user wants to increase number of parallel autovacuum workers, we
> > + * must increase number of free workers.
> > + */
> > + AutoVacuumShmem->av_freeParallelWorkers +=
> > + (autovacuum_max_parallel_workers - prev_max_parallel_workers);
> > + }
> >
> > Suppose the previous autovacuum_max_parallel_workers is 5 and 2
> > workers are reserved (i.e., there are 3 free parallel workers). If
> > autovacuum_max_parallel_workers changes to 2, the new
> > AutoVacuumShmem->av_freeParallelWorkers would be 2 based on the above
> > code, but I believe that the new number of free workers should be 0,
> > as 2 workers are already running. What do you think? I guess
> > we can calculate the new number of free workers by:
> >
> > Max((autovacuum_max_parallel_workers - prev_max_parallel_workers) +
> >     AutoVacuumShmem->av_freeParallelWorkers, 0)
> >
>
> If av_max_parallel_workers is changed to 2, then we not only set
> freeParallelWorkers to 2 but also set maxParallelWorkers to 2.
> Thus, when the two previously reserved workers are released, the av
> leader will encounter this code:
>
> /*
> * If the maximum number of parallel workers was reduced during execution,
> * we must cap available workers number by its new value.
> */
> AutoVacuumShmem->av_freeParallelWorkers =
> Min(AutoVacuumShmem->av_freeParallelWorkers + nworkers,
> AutoVacuumShmem->av_maxParallelWorkers);
>
> I.e., freeParallelWorkers will be left as 2.
>
> The formula you suggested is also correct, but if you have no objections,
> I would prefer not to change the existing logic. It seems more reliable to
> me when the av leader can explicitly handle such a situation.
Looking at AutoVacuumReserveParallelWorkers(), it seems that we don't
check av_maxParallelWorkers there. Is it possible that two more
workers would be reserved even while the existing 2 workers are
running?
/* Provide as many workers as we can. */
*nworkers = Min(AutoVacuumShmem->av_freeParallelWorkers, *nworkers);
AutoVacuumShmem->av_freeParallelWorkers -= *nworkers;
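For illustration, here is a rough sketch of a reservation that also
respects the cap (the av_nworkers_reserved field and the surrounding
locking are my assumptions, not the patch's actual code):

    /*
     * Hypothetical sketch: reserve at most *nworkers parallel workers,
     * never exceeding either the free-worker count or the configured
     * maximum.  Assumes the caller holds the lock on AutoVacuumShmem.
     */
    int navail = Min(AutoVacuumShmem->av_freeParallelWorkers,
                     AutoVacuumShmem->av_maxParallelWorkers -
                     AutoVacuumShmem->av_nworkers_reserved);

    *nworkers = Min(Max(navail, 0), *nworkers);
    AutoVacuumShmem->av_freeParallelWorkers -= *nworkers;
    AutoVacuumShmem->av_nworkers_reserved += *nworkers;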
Some review comments on v19-0001 patch:
+ /* Release all the reserved parallel workers for autovacuum */
+ if (AmAutoVacuumWorkerProcess() && pvs->pcxt->nworkers_launched > 0)
+ AutoVacuumReleaseParallelWorkers(pvs->pcxt->nworkers_launched);
Since we want to release all reserved workers here, I think it's
clearer if we use AutoVacuumReleaseAllParallelWorkers() and add
Assert(av_nworkers_reserved == 0) at the end of
AutoVacuumReleaseAllParallelWorkers(). This way, we can ensure that
all workers are released, and it makes the code more readable. What do
you think?
I've attached a patch proposing this change (please find
v19-0001_masahiko.patch).
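Roughly, the shape I have in mind is (a sketch under assumed names,
not the attached patch verbatim):

    /*
     * Sketch: release every parallel worker this autovacuum worker has
     * reserved, and verify that nothing remains reserved afterwards.
     */
    void
    AutoVacuumReleaseAllParallelWorkers(void)
    {
        AutoVacuumReleaseParallelWorkers(av_nworkers_reserved);
        Assert(av_nworkers_reserved == 0);
    }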
---
+#autovacuum_max_parallel_workers = 2 # disabled by default and limited by
+ # max_worker_processes
It's odd to me that the comment says it's disabled by default while
being set to 2. I think we can rewrite the comment to:
+#autovacuum_max_parallel_workers = 2 # limited by max_worker_processes
BTW it seems to me that this GUC should be capped by
max_parallel_workers instead of max_worker_processes, no?
---
+ long_desc => 'This parameter is capped by "max_worker_processes"
(not by "autovacuum_max_workers"!).',
I'm concerned that this kind of description might not be appropriate
for long_desc. Looking at the long_desc contents of other GUC
parameters, we describe the details of the parameter (e.g., "0 means
xxx" or a detailed explanation of the effect). We can probably remove
this line.
>
> > ---
> > I've attached a patch proposing some minor changes.
> >
>
> Thanks! I agree with all fixes except a single one:
> - * NOTE: We will try to provide as many workers as requested, even if caller
> - * will occupy all available workers.
>
> I think that this is a pretty important point. I'll leave this NOTE in the
> v19 patch set. Do you mind?
No, I agree with that.
>
> >
> > + /*
> > + * Number of planned and actually launched parallel workers for all index
> > + * scans, or NULL
> > + */
> > + PVWorkersUsage *workers_usage;
> >
> > I think that LVRelState can have PVWorkersUsage instead of a pointer to it.
> >
>
> Previously I used the NULL value of this pointer as a flag that we don't
> need to log workers usage. Now I'll add a boolean flag for this purpose
> (IIUC, the "nplanned > 0" condition is not enough to determine whether we
> should log workers usage, because VACUUM PARALLEL can be called without
> VERBOSE).
Can't we simply not report the worker usage if nplanned is 0?
>
> > ---
> > + /*
> > + * Allocate space for workers usage statistics. Thus, we explicitly
> > + * make clear that such statistics must be accumulated. For now, this
> > + * is used only by autovacuum leader worker, because it must log it in
> > + * the end of table processing.
> > + */
> > + vacrel->workers_usage = AmAutoVacuumWorkerProcess() ?
> > + (PVWorkersUsage *) palloc0(sizeof(PVWorkersUsage)) :
> > + NULL;
> >
> > I think we can report the worker statistics even in VACUUM VERBOSE
> > logs. Currently VACUUM VERBOSE reports the worker usage just during
> > index vacuuming but it would make sense to report the overall
> > statistics in vacuum logs. It would help make VACUUM VERBOSE logs and
> > autovacuum logs consistent.
> >
>
> Agree.
>
> > But we don't need to report the worker usage if we didn't use the
> > parallel vacuum (i.e., if nplanned == 0).
> >
>
> As I wrote above - we don't need to log workers usage if the VERBOSE option
> is not specified (even if nplanned > 0). Am I missing something?
No. My point is that even when the VERBOSE option is specified, we can
skip reporting the worker usage if the parallel vacuum is not even
planned. That is, I think we can do something like:

    if (vacrel->workers_usage.nplanned > 0)
        appendStringInfo(&buf,
                         _("parallel index vacuum/cleanup: %d workers were planned and %d workers were launched in total\n"),
                         vacrel->workers_usage.nplanned,
                         vacrel->workers_usage.nlaunched);
>
> > ---
> > + /* Remember these values, if we asked to. */
> > + if (wusage != NULL)
> > + {
> > + wusage->nlaunched += pvs->pcxt->nworkers_launched;
> > + wusage->nplanned += nworkers;
> > + }
> >
> > This code runs after the attempt to reserve parallel workers.
> > Consequently, if we fail to reserve any workers due to
> > autovacuum_max_parallel_workers, we report the status as if parallel
> > vacuum wasn't planned at all. I think knowing the number of workers
> > that were planned but not reserved would provide valuable insight for
> > users tuning autovacuum_max_parallel_workers.
> >
>
> 100% agree.
Thank you for updating the patch. I think that we need an explanation
of what nlaunched and nplanned actually mean in the PVWorkersUsage
definition:
+typedef struct PVWorkersUsage
+{
+ int nlaunched;
+ int nplanned;
+} PVWorkersUsage;
I'm concerned that readers might be confused, since nplanned is not
the number of parallel workers we actually planned to launch.
Or it might make sense to track these three values: planned, reserved,
and launched. For example, suppose max_worker_processes = 10 and
autovacuum_max_parallel_workers = 5. If two autovacuum workers try to
reserve 3 workers each, one worker can reserve and launch 3 and the
other worker can reserve and launch 2. The autovacuum logs would be
"3 planned and 3 launched" and "3 planned and 2 launched". Users can
deal with the shortage of parallel workers by increasing
autovacuum_max_parallel_workers.

On the other hand, if some bgworkers are being used by other
components (e.g., parallel queries, logical replication, etc.) and
there are only 2 free bgworkers, one autovacuum worker can reserve 3
but launch only 2, and the other worker can reserve 2 but cannot
launch any workers. The autovacuum logs would be "3 planned and 2
launched" and "3 planned and 0 launched". Here, the logs suggest that
increasing autovacuum_max_parallel_workers would resolve the shortage
of parallel workers, but users would actually have to increase
max_worker_processes instead. If we can report the worker usage like
"3 planned, 3 reserved, and 2 launched" and "3 planned, 2 reserved,
and 0 launched", users would realize the need to increase
max_worker_processes. Of course, the "xxx reserved" information would
not be necessary for VACUUM VERBOSE logs.
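To sketch what that could look like (the field comments are my reading
of the intended semantics, not text from the patch):

    typedef struct PVWorkersUsage
    {
        int     nplanned;   /* workers the parallel vacuum plan asked for */
        int     nreserved;  /* reserved under autovacuum_max_parallel_workers */
        int     nlaunched;  /* bgworkers actually launched */
    } PVWorkersUsage;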
>
> >
> > +typedef enum AVLeaderFaulureType
> > +{
> > + FAIL_NONE,
> > + FAIL_ERROR,
> > + FAIL_FATAL,
> > +} AVLeaderFaulureType;
> >
> > I'm concerned that it somewhat overlaps with what injection points
> > do, as we can set 'error' in injection_points_attach(). For the
> > FATAL error, we can terminate the autovacuum worker by using
> > pg_terminate_backend() while it keeps waiting due to
> > injection_points_attach() with action='wait'.
> >
>
> Oh, I didn't know about the possibility of testing FATAL errors with
> pg_terminate_backend. After reading your message I found this pattern
> in signal_autovacuum.pl. This is beautiful.
> Thank you, I'll rework these tests.
+1
>
> > ---
> > It would be great if we could test the av_freeParallelWorkers
> > adjustment when max_parallel_maintenance_workers changes.
> >
>
> You mean "when autovacuum_max_parallel_workers changes"?
> I'll add a test for it.
Yes, thanks!
>
> > ---
> > + if (!AmAutoVacuumWorkerProcess())
> > + {
> > + /*
> > + * If we are autovacuum parallel worker, check whether cost-based
> > + * parameters had changed in leader worker.
> > + * If so, vacuum_cost_delay and vacuum_cost_limit will be set to the
> > + * values which leader worker is operating on.
> > + *
> > + * Do it before checking VacuumCostActive, because its value might be
> > + * changed after leader's parameters consumption.
> > + */
> > + parallel_vacuum_fix_cost_based_params();
> > + }
> >
> > We need to add checks to prevent the normal backend running the VACUUM
> > command from calling parallel_vacuum_fix_cost_based_params().
> >
>
> We already have such a check inside the "fix_cost_based" function:
> /* Check whether we are running parallel autovacuum */
> if (pv_shared_cost_params == NULL)
> return false;
>
> We also have this comment:
> * If we are autovacuum parallel worker, check whether cost-based
> * parameters had changed in leader worker.
>
> As an alternative, I'll add a comment explicitly saying that the process
> will immediately return if it is not a parallel autovacuum participant.
Why don't we add IsInParallelMode() or IsParallelWorker() check before
calling parallel_vacuum_update_shared_delay_params()?
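Something like this at the call site (a sketch; where exactly the
check lives is an assumption on my part):

    /* Only parallel workers need to pick up the leader's parameters. */
    if (IsParallelWorker())
        parallel_vacuum_update_shared_delay_params();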
Some review comments on v19-0003 patch:
+bool
+parallel_vacuum_update_shared_delay_params(void)
+{
+ /* Check whether we are running parallel autovacuum */
+ if (pv_shared_cost_params == NULL)
+ return false;
+
+ Assert(IsParallelWorker() && !AmAutoVacuumWorkerProcess());
This code looks a bit odd to me on two points:
1. A process can never be both a parallel worker and an autovacuum worker.
2. If pv_shared_cost_params == NULL, even autovacuum workers and
non-parallel workers can call this function, but that seems to be an
unexpected function call given the subsequent assertion. If we want to
have an assertion to ensure that a function is called only by
processes we expect or allow, I think we should add it to the
beginning of the function. How about rewriting these parts to:
Assert(IsParallelWorker());
/* Check whether we are running parallel autovacuum */
if (pv_shared_cost_params == NULL)
return false;
---
+ * Note, that this function has no effect if we are non-autovacuum
+ * parallel worker.
+ */
I don't think this kind of comment should be noted here, since if we
change the parallel_vacuum_update_shared_delay_params() behavior in
the future, such comments would easily get out of sync.
>
> > + Assert(IsParallelWorker() && !AmAutoVacuumWorkerProcess());
> > +
> > + SpinLockAcquire(&pv_shared_cost_params->spinlock);
> > +
> > + vacuum_cost_delay = pv_shared_cost_params->cost_delay;
> > + vacuum_cost_limit = pv_shared_cost_params->cost_limit;
> > +
> > + SpinLockRelease(&pv_shared_cost_params->spinlock);
> >
> > IIUC autovacuum parallel workers seem to update their
> > vacuum_cost_{delay|limit} at every vacuum_delay_point(), which seems
> > not good. Can we somehow avoid unnecessary updates?
>
> More precisely, a parallel worker *reads* the leader's parameters at every
> delay point. Obviously, this does not mean that the parameters will
> necessarily be updated.
>
> But I don't see anything wrong with this logic. We simply fetch the most
> up-to-date parameters from the leader every time. Of course we could
> introduce some signaling mechanism, but it would have the same effect as
> the current code.
Although the parameter propagation itself is working correctly, the
current implementation seems suboptimal performance-wise. Acquiring an
additional spinlock and updating the local variables for every block
seems too costly to me. IIUC we would end up incurring these costs
even when vacuum delays are disabled. I think we need to find a better
way.
For example, we can have a generation counter for these parameters.
That is, the leader increments the generation (stored in
PVSharedCostParams) whenever it updates them after reloading the
configuration file, and each worker maintains the generation of the
parameters it is currently using. If the worker's generation < the
global generation, it updates its parameters along with its
generation. I think we can implement the generation using
pg_atomic_uint32, making the check for parameter updates lock-free.
There might be better ideas, though.
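A minimal sketch of that scheme (struct layout, function names, and
initialization are my assumptions, not the patch's code):

    #include "port/atomics.h"
    #include "storage/spin.h"

    typedef struct PVSharedCostParams
    {
        slock_t          spinlock;      /* protects cost_delay/cost_limit */
        pg_atomic_uint32 generation;    /* bumped by the leader on update */
        double           cost_delay;
        int              cost_limit;
    } PVSharedCostParams;

    static uint32 pv_local_generation = 0;  /* worker-local copy */

    /* Leader: publish new parameters, then bump the generation. */
    static void
    pv_update_params(PVSharedCostParams *p, double delay, int limit)
    {
        SpinLockAcquire(&p->spinlock);
        p->cost_delay = delay;
        p->cost_limit = limit;
        SpinLockRelease(&p->spinlock);
        pg_atomic_fetch_add_u32(&p->generation, 1);
    }

    /* Worker: lock-free staleness check; lock only when stale. */
    static void
    pv_maybe_refresh_params(PVSharedCostParams *p)
    {
        uint32  gen = pg_atomic_read_u32(&p->generation);

        if (gen == pv_local_generation)
            return;             /* fast path: nothing changed */

        SpinLockAcquire(&p->spinlock);
        vacuum_cost_delay = p->cost_delay;
        vacuum_cost_limit = p->cost_limit;
        SpinLockRelease(&p->spinlock);
        pv_local_generation = gen;
    }

If the worker reads a generation that is being bumped concurrently, it
simply refreshes again at the next delay point, so the scheme stays
correct without any lock on the fast path.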
I'll review the patches for regression tests and the documentation.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com