Re: Defining (and possibly skipping) useless VACUUM operations - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: Defining (and possibly skipping) useless VACUUM operations |
Date | |
Msg-id | CAH2-Wzm1OznBMe=TNibzdD1MQWRM1pBb_KMrXYkRhvQu+dPErA@mail.gmail.com |
In response to | Re: Defining (and possibly skipping) useless VACUUM operations (Robert Haas <robertmhaas@gmail.com>) |
Responses | Re: Defining (and possibly skipping) useless VACUUM operations |
List | pgsql-hackers |
On Tue, Dec 14, 2021 at 6:05 AM Robert Haas <robertmhaas@gmail.com> wrote:
> I think this is a reasonable line of thinking, but I think it's a
> little imprecise. In general, we could be vacuuming a relation to
> advance relfrozenxid, but we could also be vacuuming a relation to
> advance relminmxid, or we could be vacuuming a relation to fight
> bloat, or set pages all-visible. It is possible that there's no hope
> of advancing relfrozenxid but that we can still accomplish one of the
> other goals. In that case, the vacuuming is not useless. I think the
> place to put logic around this would be in the triggering logic for
> autovacuum. If we're going to force a relation to be vacuumed because
> of (M)XID wraparound danger, we could first check whether there seems
> to be any hope of advancing relfrozenxid(minmxid). If not, we discount
> that as a trigger for vacuum, but may still decide to vacuum if some
> other trigger warrants it. In most cases, if there's no hope of
> advancing relfrozenxid, there won't be any bloat to remove either, but
> aborted transactions are a counterexample. And the XID and MXID
> horizons can advance at completely different rates.

I think that you'd agree that the arguments in favor of skipping are strongest for an aggressive anti-wraparound autovacuum (as opposed to any other kind of aggressive VACUUM, including aggressive autovacuum). Aside from the big benefit I pointed out already (avoiding blocking useful anti-wraparound vacuums that start a little later by not starting a conflicting useless anti-wraparound vacuum now), there is also more certainty about downsides. We can know the following things for sure:

* We only launch an (aggressive) anti-wraparound autovacuum because we need to advance relfrozenxid. In other words, if we didn't need to advance relfrozenxid then (for better or worse) we definitely wouldn't be launching anything.
* Our would-be OldestXmin exactly matches the preexisting pg_class.relfrozenxid (and pg_class.relminmxid). And so it follows that we're definitely not going to be able to do the thing that is ostensibly the whole point of anti-wraparound vacuum (advance relfrozenxid/relminmxid).

> One reason I haven't pursued this kind of optimization is that it
> doesn't really feel like it's fixing the whole problem. It would be a
> little bit sad if we did a perfect job preventing useless vacuuming
> but still allowed almost-useless vacuuming. Suppose we have a 1TB
> relation and we trigger autovacuum. It cleans up a few things but
> relfrozenxid is still old. On the next pass, we see that the
> system-wide xmin has not advanced, so we don't trigger autovacuum
> again. Then on the pass after that we see that the system-wide xmin
> has advanced by 1. Shall we trigger an autovacuum of the whole
> relation now, to be able to do relfrozenxid++? Seems dubious.

I can see what you mean, but just fixing the most extreme case can be a useful goal. It's often enough to stop the system from going into a tailspin, which is the real underlying goal here. Things that approach the most extreme case (but don't quite hit it) don't have that quality.

An anti-wraparound vacuum is supposed to be a mechanism that the system escalates to when nothing else triggers an autovacuum worker to run (which is aggressive but not anti-wraparound). That's not really true in practice, of course; anti-wraparound av often becomes a routine thing. But I think that it's a good ideal to strive for -- it should be rare.

The draft patch series now adds opportunistic freezing -- I should be able to post a new version in a few days' time, once I've tied up some loose ends.
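
To make the "definitely useless" condition from the two bullet points above concrete, here is a minimal sketch in Python. It is not code from the patch series. The `RelCutoffs` type, the function name, and the `oldest_mxact` parameter (standing in for the MultiXact counterpart of OldestXmin) are all hypothetical names chosen for illustration. The check uses only equality, so it sidesteps XID wraparound arithmetic entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelCutoffs:
    """Hypothetical snapshot of a table's pg_class.relfrozenxid/relminmxid."""
    relfrozenxid: int
    relminmxid: int


def antiwraparound_vacuum_is_useless(rel: RelCutoffs,
                                     oldest_xmin: int,
                                     oldest_mxact: int) -> bool:
    """Return True only when neither cutoff could possibly advance.

    Assumption: the vacuum under consideration was triggered purely by
    wraparound danger, so advancing relfrozenxid/relminmxid is its only
    purpose. If either horizon has moved past the stored value, the
    vacuum can still accomplish something and is not useless.
    """
    xid_stuck = oldest_xmin == rel.relfrozenxid
    mxid_stuck = oldest_mxact == rel.relminmxid
    return xid_stuck and mxid_stuck
```

Both cutoffs are checked because the XID and MXID horizons can advance at completely different rates, as noted in the quoted text: a vacuum that cannot advance relfrozenxid may still be able to advance relminmxid.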

My testing shows an interesting effect, when opportunistic freezing is applied on top of the relfrozenxid thing: every autovacuum manages to advance relfrozenxid, and so we'll never have to run an aggressive autovacuum (much less an aggressive anti-wraparound autovacuum) in practice. And so (for example) when autovacuum runs against the pgbench_history table, it always sets its relfrozenxid to a value very close to the OldestXmin -- usually the exact OldestXmin.

Opportunistic freezing makes us avoid setting the all-visible bit for a heap page without also setting the all-frozen bit -- when we're about to do that, we go freeze the heap tuples and then set the entire page all-frozen (so we freeze anything <= OldestXmin, not <= FreezeLimit). We also freeze based on this more aggressive <= OldestXmin cutoff when pruning had to delete some tuples.

The patch still needs more polishing, but I think that we can make anti-wraparound vacuums truly exceptional with this design -- which would make autovacuum a lot easier to deal with operationally. This seems like a feasible goal for Postgres 15, even (though still quite ambitious).

The opportunistic freezing stuff isn't free (the WAL records aren't tiny), but it's still not all that expensive. Plus I think that the cost can be further reduced, with a little more work.

> Part of the problem here, for both vacuuming-for-bloat and
> vacuuming-for-relfrozenxid-advancement, we would really like to know
> the distribution of old XIDs in the table.

What I see with the draft patch series is that the oldest XID just isn't that old anymore, consistently -- we literally never fail to advance relfrozenxid, in any autovacuum, for any table. And the value that we end up with is consistently quite recent. This is something that I see both with BenchmarkSQL, and pgbench. There is a kind of virtuous circle, which prevents us from ever getting anywhere near having any table age in the tens of millions of XIDs.
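
The opportunistic freezing rule described earlier in this message (freeze up to OldestXmin rather than FreezeLimit when the page is about to be set all-visible, or when pruning deleted tuples) can be sketched roughly as follows. This is an illustration under simplifying assumptions, not the patch itself: the function and parameter names are hypothetical, XIDs are treated as plain integers with no wraparound handling, and only tuple xmin values are considered.

```python
def freeze_cutoff(*, oldest_xmin: int, freeze_limit: int,
                  page_will_be_all_visible: bool,
                  pruning_deleted_tuples: bool) -> int:
    """Pick the per-page freeze cutoff (hypothetical helper).

    Use the more aggressive OldestXmin cutoff when the page is about to
    be marked all-visible (so it can be marked all-frozen too), or when
    pruning had to delete some tuples. Otherwise fall back to the
    conventional FreezeLimit.
    """
    if page_will_be_all_visible or pruning_deleted_tuples:
        return oldest_xmin
    return freeze_limit


def xids_to_freeze(tuple_xmins: list[int], cutoff: int) -> list[int]:
    """Return the xmin values at or before the cutoff.

    Simplification: a plain integer comparison stands in for the
    wraparound-aware XID comparison a real implementation would need.
    """
    return [xid for xid in tuple_xmins if xid <= cutoff]
```

With `oldest_xmin=900` and `freeze_limit=400`, a page that is about to become all-visible uses 900 as its cutoff, so tuples with xmin 350 and 700 are both frozen. The same page under the conventional rule would freeze only the tuple with xmin 350.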

I guess that that makes avoiding useless vacuuming seem like less of a priority. ISTM that it should be something that is squarely aimed at keeping things stable in truly pathological cases.

> So I'm not certain of the way forward here. Just because we can't
> prevent almost-useless vacuuming is not a sufficient reason to
> continue allowing entirely-useless vacuuming that we can prevent. And
> it seems like we need a bunch of new bookkeeping to do any better than
> that, which seems like a lot of work. So maybe it's the most practical
> path forward for the time being, but it feels like more of a
> special-purpose kludge than a truly high-quality solution.

I'm sure that either one of us will be able to poke holes in any definition of "useless" that is continuous (rather than discrete) -- which, on reflection, pretty much means any definition that is concerned with bloat. I think that you're right about that: the question there must be "why are we even launching these bloat-orientated autovacuums that actually find no bloat?".

--
Peter Geoghegan