Re: Make autovacuum sort tables in descending order of xid_age - Mailing list pgsql-hackers
From | Mark Dilger |
---|---|
Subject | Re: Make autovacuum sort tables in descending order of xid_age |
Date | |
Msg-id | 2ad2a9fa-32eb-29aa-07ee-f0fe75ad4db5@gmail.com |
In response to | Re: Make autovacuum sort tables in descending order of xid_age (David Fetter <david@fetter.org>) |
Responses |
Re: Make autovacuum sort tables in descending order of xid_age
Re: Make autovacuum sort tables in descending order of xid_age Re: Make autovacuum sort tables in descending order of xid_age |
List | pgsql-hackers |
On 12/12/19 11:26 AM, David Fetter wrote:
> On Thu, Dec 12, 2019 at 08:02:25AM -0800, Mark Dilger wrote:
>> On 11/30/19 2:23 PM, David Fetter wrote:
>>> On Sat, Nov 30, 2019 at 10:04:07AM -0800, Mark Dilger wrote:
>>>> On 11/29/19 2:21 PM, David Fetter wrote:
>>>>> On Fri, Nov 29, 2019 at 07:01:39PM +0100, David Fetter wrote:
>>>>>> Folks,
>>>>>>
>>>>>> Per a suggestion Christophe made, please find attached a patch to
>>>>>> $Subject:
>>>>>>
>>>>>> Apart from carefully fudging with pg_resetwal, and short of running in
>>>>>> production for a few weeks, what would be some good ways to test this?
>>>>>
>>>>> Per discussion on IRC with Sehrope Sarkuni, please find attached a
>>>>> patch with one fewer bug, this one in the repalloc() calls.
>>>>
>>>> Hello David,
>>>>
>>>> Here are my initial thoughts.
>>>>
>>>> Although you appear to be tackling the problem of vacuuming tables
>>>> with older Xids first *per database*,
>>>
>>> Yes, that's what's come up for me in production, but lately,
>>> production has consisted of a single active DB maxing out hardware. I
>>> can see how in other situations--multi-tenant, especially--it would
>>> make more sense to sort the DBs first.
>>
>> I notice you don't address that in your latest patch. Do you have
>> any thoughts on whether that needs to be handled in this patch?
>
> My thought is that it doesn't.

I can live with that for now. I'd like the design to be compatible with
revisiting that in a subsequent patch.

>>>> I have not tested this change, but I may do so later today or perhaps
>>>> on Monday.
>>
>> The code compiles cleanly and passes all regression tests, but I don't
>> think those tests really cover what you are changing. Have you been
>> using any test framework for this?
>
> I don't have one :/

We need to get that fixed.

>> I wonder if you might add information about table size, table changes,
>> and bloat to your RelFrozenXidAge struct and modify rfxa_comparator to
>> use a heuristic to cost the (age, size, bloat, changed) grouping and
>> sort on that cost, such that really large bloated tables with old xids
>> might get vacuumed before smaller, less bloated tables that have
>> even older xids. Sorting the tables based purely on xid_age seems to
>> ignore other factors that are worth considering. I do not have a
>> formula for how those four factors should be weighted in the heuristic,
>> but you are implicitly assigning three of them a weight of zero in
>> your current patch.
>
> I think it's vastly premature to come up with complex sorting systems
> right now. Just sorting in descending order of age should either have
> or not have positive effects.

I hear what you are saying, but I'm going to argue the other side.

   Let C = 1.00000002065
   Let x = xid_age for a table
   Let v = clamp(n_dead_tuples / reltuples*2) to max 0.5
   Let a = clamp(changes_since_analyze / reltuples) to max 0.5
   Let score = C**x + v + a

With x = 1 million   => C**x = 1.02
     x = 200 million => C**x = 62.2
     x = 2**32       => C**x = FLT_MAX - delta

The maximum contribution to the score that n_dead_tuples and
changes_since_analyze can make is 1.0. Once the xid age reaches one
million, it will start to be the dominant factor. By the time it
reaches the default value of 200 million for freeze_max_age it is
far and away the dominant factor, and the xid age of one table vs.
another never overflows FLT_MAX given that 2**32 is the largest
xid age your current system can store in the uint32 you are using.

The computed score is a 32 bit float, which takes no more memory
to store than the xid_age field you are storing. So storing the
score rather than the xid age is memory-wise equivalent to your
patch. I doubt the computation time for the exponential is relevant
compared to the n*log(n) average sorting time of the quicksort.
It is even less relevant compared to the time it takes to vacuum the
tables. I doubt my proposal has a measurable run-time impact.

On the upside, if you have a database with autovacuum configured
aggressively, you can get the tables with the most need vacuumed
first, with need computed relative to vac_scale_factor and
anl_scale_factor, which helps for a different use case than yours.
The xid age problem might not exist for databases where autovacuum
has enough resources to never fall behind. Those databases will
have other priorities for where autovacuum spends its time.

I'm imagining coming back with two patches later, one that does
something more about choosing which database to vacuum first, and
another that recomputes which table to vacuum next when a worker
finishes vacuuming a table. These combined could help keep tables
that are sensitive to statistics changes vacuumed more frequently
than others.

>> relation_needs_vacanalyze currently checks the reltuples, n_dead_tuples
>> and changes_since_analyze along with vac_scale_factor and
>> anl_scale_factor for the relation, but only returns booleans dovacuum,
>> doanalyze, and wraparound.
>
> Yeah, I looked at that. It's for a vastly different purpose, namely
> deciding what's an emergency and what's probably not, but needs
> attention anyhow. My goal was something a little finer-grained and, I
> hope, a little easier to establish the (lack of) benefits because only
> one thing is getting changed.

That's all I'll say for now. Hopefully other members of the community
will weigh in.

--
Mark Dilger