Re: another autovacuum scheduling thread - Mailing list pgsql-hackers

From Jeremy Schneider
Subject Re: another autovacuum scheduling thread
Date
Msg-id 20251010145959.414a2c27@ardentperf.com
In response to Re: another autovacuum scheduling thread  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Fri, 10 Oct 2025 16:24:51 -0400
Robert Haas <robertmhaas@gmail.com> wrote:

> I don't think we
> need something dramatically awesome to make a change to the status
> quo, but if it's extremely easy to think up simple scenarios in which
> a given idea will fail spectacularly, I'd be inclined to suspect that
> there will be a lot of real-world spectacular failures.

What does a real-world spectacular failure look like?

"If those 3 autovac workers had processed tables in a different order
everything would have been peachy"

But if autovac is going to get jammed up long enough for the system to
reach wraparound, does it matter whether it did a one-time pass over a
bunch of small tables before it got jammed?

One particular table always scoring high shouldn't block autovac from
reaching other tables, because a worker doesn't start a new iteration
until it has gone all the way through the list from its current
iteration, right? And since one iteration of autovac needs to process
everything in the list, it should take about the same overall time
regardless of order.

The spectacular failures I've seen with autovac usually come down to
things like too much sleeping (cost_delay) or too few workers, where
better ordering would be nice but probably wouldn't have fixed the real
problems that led to the spectacular failures.

From Robert's 2024 pgConf.dev talk:
1. slow - forward progress not fast enough
2. stuck - no forward progress
3. spinning - not accomplishing anything
4. skipped - thinks vacuuming isn't needed
5. starvation - can't keep up

I don't think any of these are really addressed by simply changing
table order.

From Robert's 2022 email to hackers:
> A few people have proposed scoring systems, which I think is closer
> to the right idea, because our basic goal is to start vacuuming any
> given table soon enough that we finish vacuuming it before some
> catastrophe strikes.
...
> If table A will cause wraparound in 2 hours and take 2 hours to
> vacuum, and table B will cause wraparound in 1 hour and take 10
> minutes to vacuum, table A is more urgent even though the catastrophe
> is further out.
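The quoted example can be sketched numerically. This is just an illustration of the idea, not anything from a patch: rank tables by their "slack" - time remaining before wraparound minus estimated time to vacuum - so the table that must be started soonest goes first, even if its raw deadline is further out. All names here are invented for the sketch.

```python
def slack(hours_to_wraparound, hours_to_vacuum):
    """Hours to spare if we start vacuuming this table right now."""
    return hours_to_wraparound - hours_to_vacuum

tables = [
    ("A", 2.0, 2.0),      # wraparound in 2h, takes 2h to vacuum
    ("B", 1.0, 10 / 60),  # wraparound in 1h, takes 10 minutes
]

# Most urgent first: least slack at the front of the queue.
order = sorted(tables, key=lambda t: slack(t[1], t[2]))
print([name for name, *_ in order])  # A has zero slack, so it sorts first
```

Table A has zero hours of slack while table B has about 50 minutes, so A sorts first - matching the quoted reasoning even though B's deadline is nearer.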

Robert, it sounds to me like the main use case you're focused on here
is where wraparound is basically imminent - we are already screwed - and
our very last hope is that a last-ditch autovac can finish just in time.

Failsafe and dynamic cost updates were huge advancements. Do we allow
dynamic adjustment to worker count yet?

I hope y'all just pick something and commit it without getting too lost
in the details. Honestly, of the possible improvements around autovac,
this is the lowest priority on my list of hopes and dreams as a user for
wraparound prevention :) because if ordering ever matters to me for
avoiding wraparound, I was screwed long before we got to this point, and
it is not going to fix my underlying problems.

-Jeremy


