Thread: Summarizing indexes allowing single-phase VACUUM?

Summarizing indexes allowing single-phase VACUUM?

From
Matthias van de Meent
Date:
Hi,

With PG16, we got Index AM -level indications for summarizing indexes,
improving HOT applicability.

Given that these indexes explicitly never store TIDs and thus don't
really need the cleanup part of vacuum's index cleanup, we could well
decide to not count those indexes to the indexes we need to clean up
in the index scan phase of vacuum, thus allowing tables with only
summarizing indexes to use the single-scan option of vacuum.

As a summarizing index may still want to do some housekeeping during
vacuum, it'd still need to get its cleanup functions called, but it'd
at least save the io and WAL of the cleanup scan.

Any comments/suggestions? POC patch attached.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

Attachment

Re: Summarizing indexes allowing single-phase VACUUM?

From
Masahiko Sawada
Date:
On Wed, Apr 9, 2025 at 2:47 PM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
>
> Hi,
>
> With PG16, we got Index AM -level indications for summarizing indexes,
> improving HOT applicability.
>
> Given that these indexes explicitly never store TIDs and thus don't
> really need the cleanup part of vacuum's index cleanup, we could well
> decide to not count those indexes to the indexes we need to clean up
> in the index scan phase of vacuum, thus allowing tables with only
> summarizing indexes to use the single-scan option of vacuum.
>
> As a summarizing index may still want to do some housekeeping during
> vacuum, it'd still need to get its cleanup functions called, but it'd
> at least save the io and WAL of the cleanup scan.
>
> Any comments/suggestions? POC patch attached.

I like this idea.

Here are some comments on the PoC patch:

-       if (vacrel->nindexes == 0
+       if (vacrel->indallsummarizing

I'm not a fan of this change as it isn't obvious from the name
indallsummarizing that the table doesn't have any indexes. I think we
can keep nindexes==0 or rename indallsummarizing to something like
use_onepass_strategy if we want to use merge them.

---
@@ -2536,8 +2544,12 @@ lazy_vacuum(LVRelState *vacrel)
        /*
         * We successfully completed a round of index vacuuming.  Do related
         * heap vacuuming now.
+        *
+        * If all valid indexes are summarizing, then the TIDs have already
+        * been reclaimed, requiring us to skip that last phase.
         */
-       lazy_vacuum_heap_rel(vacrel);
+       if (!vacrel->indallsummarizing)
+           lazy_vacuum_heap_rel(vacrel);

IIUC if indallsummarizing is true we should never reach lazy_vacuum().

---
@@ -241,8 +242,9 @@ static void parallel_vacuum_error_callback(void *arg);
  */
 ParallelVacuumState *
 parallel_vacuum_init(Relation rel, Relation *indrels, int nindexes,
-                    int nrequested_workers, int vac_work_mem,
-                    int elevel, BufferAccessStrategy bstrategy)
+                    bool indallsummarizing, int nrequested_workers,
+                    int vac_work_mem, int elevel,
+                    BufferAccessStrategy bstrategy)
 {
    ParallelVacuumState *pvs;
    ParallelContext *pcxt;
@@ -282,6 +284,7 @@ parallel_vacuum_init(Relation rel, Relation
*indrels, int nindexes,
    pvs = (ParallelVacuumState *) palloc0(sizeof(ParallelVacuumState));
    pvs->indrels = indrels;
    pvs->nindexes = nindexes;
+   pvs->indallsummarizing = nindexes;
    pvs->will_parallel_vacuum = will_parallel_vacuum;
    pvs->bstrategy = bstrategy;
    pvs->heaprel = rel;

We should set pvs->indallsummarizing to indallsummarizing, not
nindexes, but ISTM we don't use pvs->indallsummazing anywhere in
vacuumparallel.c. Do we need to pass indallsummarizing to
parallel_vacuum_init() in the first place?

Regards,


--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com