Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations |
Date | |
Msg-id | CAH2-WzkN4r0aK4O7HBWT2tX9kamZWYRJe=gxPRWHMJONbuxyOQ@mail.gmail.com |
In response to | Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations (Masahiko Sawada <sawada.mshk@gmail.com>) |
Responses | Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations |
List | pgsql-hackers |
On Mon, Dec 20, 2021 at 8:29 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > Can we fully get rid of vacuum_freeze_table_age?
>
> Does it mean that a vacuum always is an aggressive vacuum?

No. Just somewhat more like one. Still no waiting for cleanup locks,
though. Also, autovacuum is still cancelable (that's technically from
anti-wraparound VACUUM, but you know what I mean). And there shouldn't
be a noticeable difference in terms of how many blocks can be skipped
using the VM.

> If opportunistic freezing works well on all tables, we might no longer
> need vacuum_freeze_table_age. But I’m not sure that’s true since the
> cost of freezing tuples is not 0.

That's true, of course, but right now the only goal of opportunistic
freezing is to advance relfrozenxid in every VACUUM. It needs to be
shown to be worth it, of course. But let's assume that it is worth it,
for a moment (perhaps only because we optimize freezing itself in
passing) -- then there is little use for vacuum_freeze_table_age, that
I can see.

> I think that that's true for (mostly) static tables. But regarding
> constantly-updated tables, since autovacuum runs based on the number
> of garbage tuples (or inserted tuples) and how old the relfrozenxid is
> if an autovacuum could not advance the relfrozenxid because it could
> not get a cleanup lock on the page that has the single oldest XID,
> it's likely that when autovacuum runs next time it will have to
> process other pages too since the page will get dirty enough.

I'm not arguing that the age of the single oldest XID is *totally*
irrelevant. Just that it's typically much less important than the
total amount of work we'd have to do (freezing) to be able to advance
relfrozenxid.

In any case, the extreme case where we just cannot get a cleanup lock
on one particular page with an old XID is probably very rare.

> It might be a good idea that we remember pages where we could not get
> a cleanup lock somewhere and revisit them after index cleanup.
> While revisiting the pages, we don’t prune the page but only freeze tuples.

Maybe, but I think that it would make more sense to not use
FreezeLimit for that at all. In an aggressive VACUUM (where we might
actually have to wait for a cleanup lock), why should we wait once the
age is over vacuum_freeze_min_age (usually 50 million XIDs)? The
official answer is "because we need to advance relfrozenxid". But why
not accept a much older relfrozenxid that is still sufficiently
young/safe, in order to avoid waiting for a cleanup lock?

In other words, what if our approach of "being diligent about
advancing relfrozenxid" makes the relfrozenxid problem worse, not
better? The problem with "being diligent" is that it is defined by
FreezeLimit (which is more or less the same thing as
vacuum_freeze_min_age), which is supposed to be about which tuples we
will freeze. That's a very different thing to how old relfrozenxid
should be or can be (after an aggressive VACUUM finishes).

> > On the other hand, the risk may be far greater if we have *many*
> > tuples that are still unfrozen, whose XIDs are only "middle aged"
> > right now. The idea behind vacuum_freeze_min_age seems to be to be
> > lazy about work (tuple freezing) in the hope that we'll never need to
> > do it, but that seems obsolete now. (It probably made a little more
> > sense before the visibility map.)
>
> Why is it obsolete now? I guess that it's still valid depending on the
> cases, for example, heavily updated tables.

Because after the 9.6 freezemap work we'll often set the all-visible
bit in the VM, but not the all-frozen bit (unless we have the
opportunistic freezing patch applied, which specifically avoids that).
When that happens, affected heap pages will still have
older-than-vacuum_freeze_min_age-XIDs after VACUUM runs, until we get
to an aggressive VACUUM. There could be many VACUUMs before the
aggressive VACUUM.

This "freezing cliff" seems like it might be a big problem, in general.
That's what I'm trying to address here.

Either way, the system doesn't really respect vacuum_freeze_min_age in
the way that it did before 9.6 -- which is what I meant by "obsolete".

> I don’t have an objection to increasing autovacuum_freeze_max_age for
> now. One of my concerns with anti-wraparound vacuums is that too many
> tables (or several large tables) will reach autovacuum_freeze_max_age
> at once, using up autovacuum slots and preventing autovacuums from
> being launched on tables that are heavily being updated.

I think that the patch helps with that, actually -- there tends to be
"natural variation" in the relfrozenxid age of each table, which comes
from per-table workload characteristics.

> Given these
> works, expanding the gap between vacuum_freeze_table_age and
> autovacuum_freeze_max_age would have better chances for the tables to
> advance its relfrozenxid by an aggressive vacuum instead of an
> anti-wraparound-aggressive vacuum. 400 million seems to be a good
> start.

The idea behind getting rid of vacuum_freeze_table_age (not to be
confused with the other idea about getting rid of
vacuum_freeze_min_age) is this: with the patch series, we only tend to
get an anti-wraparound VACUUM in extreme and relatively rare cases.
For example, we will get aggressive anti-wraparound VACUUMs on tables
that *never* grow, but constantly get HOT updates (e.g. the
pgbench_accounts table with heap fill factor reduced to 90). We won't
really be able to use the VM when this happens, either.

With tables like this -- tables that still get aggressive VACUUMs --
maybe the patch doesn't make a huge difference. But that's truly the
extreme case -- that is true only because there is already zero chance
of there being a non-aggressive VACUUM. We'll get aggressive
anti-wraparound VACUUMs every time we reach
autovacuum_freeze_max_age, again and again -- no change, really.
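
For readers less familiar with how the two settings interact today (without the patch), here is a rough sketch of the decision. It is a simplified model, not the vacuumlazy.c logic: the names `FreezeSettings`, `effective_table_age` and `classify_vacuum` are made up for illustration, the defaults are the stock 150 million / 200 million values, and the exact boundary comparisons in PostgreSQL differ slightly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FreezeSettings:
    # Stock PostgreSQL defaults; both are measured in XIDs.
    vacuum_freeze_table_age: int = 150_000_000
    autovacuum_freeze_max_age: int = 200_000_000


def effective_table_age(s: FreezeSettings) -> int:
    # VACUUM silently caps vacuum_freeze_table_age at 95% of
    # autovacuum_freeze_max_age, so that a regular VACUUM can be
    # escalated before an anti-wraparound autovacuum is forced.
    return min(s.vacuum_freeze_table_age, s.autovacuum_freeze_max_age * 95 // 100)


def classify_vacuum(relfrozenxid_age: int, s: FreezeSettings = FreezeSettings()) -> str:
    """Hypothetical helper: which kind of VACUUM a table of this age gets."""
    if relfrozenxid_age > s.autovacuum_freeze_max_age:
        # Forced by autovacuum; such a VACUUM is also aggressive.
        return "anti-wraparound"
    if relfrozenxid_age > effective_table_age(s):
        # A regular VACUUM is "escalated" to an aggressive one.
        return "aggressive"
    return "regular"


if __name__ == "__main__":
    for age in (100_000_000, 160_000_000, 210_000_000):
        print(age, classify_vacuum(age))
```

The window between the two thresholds is the "gap" discussed above: widening it (for example by raising autovacuum_freeze_max_age) gives a table more chances to be handled by an escalated regular VACUUM before autovacuum has to force one.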
But since it's only these extreme cases that continue to get
aggressive VACUUMs, why do we still need vacuum_freeze_table_age? It
helps right now (without the patch) by "escalating" a regular VACUUM
to an aggressive one. But the cases that we still expect an aggressive
VACUUM (with the patch) are the cases where there is zero chance of
that happening. Almost by definition.

> Given the opportunistic freezing, that's true but I'm concerned
> whether opportunistic freezing always works well on all tables since
> freezing tuples is not 0 cost.

That is the big question for this patch.

--
Peter Geoghegan
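
The "freezing cliff" mentioned earlier in this message can be illustrated with a toy model. This is only a sketch under strong assumptions: every non-aggressive VACUUM is assumed to mark a fixed number of new pages all-visible, later non-aggressive VACUUMs skip those pages via the VM, and freezing driven by vacuum_freeze_min_age is ignored entirely. The function `pages_frozen_per_vacuum` and its parameters are hypothetical.

```python
def pages_frozen_per_vacuum(pages_per_vacuum: int,
                            regular_vacuums: int,
                            freeze_when_all_visible: bool) -> list[int]:
    """Return pages frozen by each VACUUM; the last entry is the aggressive one."""
    all_visible_not_frozen = 0  # pages skipped via the VM but still holding XIDs
    frozen = []
    for _ in range(regular_vacuums):
        if freeze_when_all_visible:
            # Opportunistic freezing: pages are set all-frozen right away.
            frozen.append(pages_per_vacuum)
        else:
            # Lazy freezing: pages are only set all-visible, so the
            # unfrozen backlog grows until an aggressive VACUUM runs.
            all_visible_not_frozen += pages_per_vacuum
            frozen.append(0)
    # The aggressive VACUUM must visit and freeze the whole backlog at once.
    frozen.append(all_visible_not_frozen)
    return frozen


if __name__ == "__main__":
    print("lazy:         ", pages_frozen_per_vacuum(1000, 5, False))
    print("opportunistic:", pages_frozen_per_vacuum(1000, 5, True))
```

In this model the total number of pages frozen is the same either way; what changes is whether the work is spread across every VACUUM or lands all at once on the aggressive one. Whether spreading it out is a net win on real tables is exactly the open cost question raised above.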