Re: another autovacuum scheduling thread - Mailing list pgsql-hackers

From Nathan Bossart
Subject Re: another autovacuum scheduling thread
Date
Msg-id aOlC4aDoQcgW8ZpC@nathan
In response to Re: another autovacuum scheduling thread  (Nathan Bossart <nathandbossart@gmail.com>)
Responses Re: another autovacuum scheduling thread
List pgsql-hackers
On Thu, Oct 09, 2025 at 11:13:48AM -0500, Nathan Bossart wrote:
> On Thu, Oct 09, 2025 at 04:13:23PM +1300, David Rowley wrote:
>> I think the best way to understand it is if you look at
>> relation_needs_vacanalyze() and see how it calculates boolean values
>> for boolean output params. So, instead of calculating just a boolean
>> value it instead calculates a float4 where < 1.0 means don't do the
>> operation and anything >= 1.0 means do the operation. For example,
>> let's say a table has 600 dead rows and the scale factor and threshold
>> settings mean that autovacuum will trigger at 200 (3 times more dead
>> tuples than the trigger point). That would result in the value of 3.0
>> (600 / 200).  The priority for relfrozenxid portion is basically
>> age(relfrozenxid) / autovacuum_freeze_max_age (plus need to account
>> for mxid by doing the same for that and taking the maximum of each
>> value).  For each of those component "scores", the priority for
>> autovacuum would be the maximum of each of those.
>> 
>> Effectively, it's a method of aligning the different units of measure,
>> transactions or tuples into a single value which is calculated based
>> on the very same values that we use today to trigger autovacuums.
> 
> I like the idea of a "score" approach, but I'm worried that we'll never
> come to an agreement on the formula to use.  Perhaps we'd have more luck
> getting consensus on a multifaceted strategy if we kept it brutally simple.
> IMHO it's worth a try...

Here's a prototype of a "score" approach.  Two notes:

* I've given special priority to anti-wraparound vacuums.  I think this is
important to avoid focusing too much on bloat when wraparound is imminent.
In any case, we need a separate wraparound score in case autovacuum is
disabled.

* I didn't include the analyze threshold in the score because it doesn't
apply to TOAST tables, and therefore would artificially lower their
priority.  Perhaps there is another way to deal with this.

This is very much just a prototype of the basic idea.  As-is, I think it'll
favor processing tables with lots of bloat unless we're in an
anti-wraparound scenario.  Maybe that's okay.  I'm not sure how scientific
we want to be about all of this, but I do intend to try some long-running
tests.
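
For readers following along without the patch, the ordering described above
can be sketched roughly as follows.  This is not the attached prototype; it
is a minimal illustration that assumes the score is the maximum of the
component ratios David described, and that tables past a wraparound limit
sort ahead of everything else.  The struct and function names
(TableScoreInput, TableScore, compute_score, compare_scores) are
hypothetical, and the sketch assumes the threshold and both max-age
settings are nonzero.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-table inputs; none of these names come from the patch. */
typedef struct TableScoreInput
{
    double      dead_tuples;          /* estimated dead tuples */
    double      vacuum_threshold;     /* dead tuples that trigger autovacuum */
    uint32_t    xid_age;              /* age(relfrozenxid) */
    uint32_t    mxid_age;             /* mxid_age(relminmxid) */
    uint32_t    freeze_max_age;       /* autovacuum_freeze_max_age */
    uint32_t    mxid_freeze_max_age;  /* autovacuum_multixact_freeze_max_age */
} TableScoreInput;

typedef struct TableScore
{
    bool        wraparound;   /* true once either age reaches its limit */
    double      score;        /* >= 1.0 means the table needs vacuuming */
} TableScore;

static double
max_double(double a, double b)
{
    return (a > b) ? a : b;
}

/* Assumes vacuum_threshold and both max-age settings are nonzero. */
static TableScore
compute_score(const TableScoreInput *in)
{
    TableScore  result;
    double      xid_score = (double) in->xid_age / (double) in->freeze_max_age;
    double      mxid_score = (double) in->mxid_age / (double) in->mxid_freeze_max_age;
    double      wrap_score = max_double(xid_score, mxid_score);
    double      dead_score = in->dead_tuples / in->vacuum_threshold;

    /* The wraparound score is tracked separately from the bloat score. */
    result.wraparound = (wrap_score >= 1.0);
    result.score = max_double(wrap_score, dead_score);
    return result;
}

/* qsort comparator: wraparound tables first, then highest score first. */
static int
compare_scores(const void *a, const void *b)
{
    const TableScore *sa = a;
    const TableScore *sb = b;

    if (sa->wraparound != sb->wraparound)
        return sa->wraparound ? -1 : 1;
    if (sa->score != sb->score)
        return (sa->score > sb->score) ? -1 : 1;
    return 0;
}
```

With David's example (600 dead tuples against a trigger point of 200), the
sketch yields a score of 3.0.  A table whose age(relfrozenxid) has passed
autovacuum_freeze_max_age would still sort ahead of it, whatever its bloat
score.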

-- 
nathan

Attachment
