Re: Improving the "Routine Vacuuming" docs - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: Improving the "Routine Vacuuming" docs |
Date | |
Msg-id | CAH2-Wzk_fRWVcgN0KkwSKSJDKsz0po=s4e__dac=PhHNP+jaUg@mail.gmail.com |
In response to | Re: Improving the "Routine Vacuuming" docs (Robert Haas <robertmhaas@gmail.com>) |
Responses | Re: Improving the "Routine Vacuuming" docs |
List | pgsql-hackers |
On Wed, Apr 13, 2022 at 8:40 AM Robert Haas <robertmhaas@gmail.com> wrote:
> > Something along the lines of the following seems more useful: "A tuple
> > whose xmin is frozen (and xmax is unset) is considered visible to
> > every possible MVCC snapshot. In other words, the transaction that
> > inserted the tuple is treated as if it ran and committed at some point
> > that is now *infinitely* far in the past."
>
> I agree with this idea.

Cool. Maybe I should write a doc patch just for this part, then.

What do you think of the idea of relating freezing to the removal of tuples by VACUUM at this point? This would be a basis for explaining how freezing and tuple removal are constrained by the same cutoff. A very old snapshot can hold up cleanup, but it can also hold up freezing to the same degree (it's just not as obvious because we are less eager about freezing by default).

> > The alarming language isn't proportionate to the true danger
> > (something I complained about in a dedicated thread last year [1]).
>
> I mostly agree with this, but not entirely. The section needs some
> rephrasing, but xidStopLimit doesn't apply in single-user mode, and
> relfrozenxid and datfrozenxid values can and do get corrupted. So it's
> not a purely academic concern.

I accept that the distinction you want to make is valid. More on that below.

> > * XID space isn't really a precious resource -- it isn't even a
> > resource at all IMV.
>
> I disagree with this. Usable XID space is definitely a resource, and
> if you're in the situation where you care deeply about this section of
> the documentation, it's probably one in short supply. Being careful
> not to expend too many XIDs while fixing the problems that have caused
> you to be short of safe XIDs is *definitely* a real thing.

I may have gone too far with this metaphor. My point was mostly that XID space has a highly unpredictable cost (paid in freezing).
Perhaps we can agree on some (or even all) of the following specific points:

* We shouldn't mention "4 billion XIDs" at all.

* We should say that the issue is one of distances between unfrozen XIDs. The maximum distance that can ever be allowed to emerge between any two unfrozen XIDs in a cluster is about 2 billion XIDs.

* We don't need to say anything about how XIDs are compared, normal vs permanent XIDs, etc.

* The system takes drastic action to prevent this implementation restriction from becoming a problem, starting with anti-wraparound autovacuums. Then there's the failsafe. Finally, there's the xidStopLimit mechanism, our last line of defense.

> I think it is wrong to conflate wraparound with xidStopLimit.
> xidStopLimit is the final defense against an actual wraparound, and
> like I say, an actual wraparound is quite possible if you put the
> system in single user mode and then do something like this:

I forgot to emphasize one aspect of the problem that seems quite important: the document itself seems to conflate the xidStopLimit mechanism with true wraparound. At least I thought so. Last year's thread on this subject ('What is "wraparound failure", really?') was mostly about that confusion. I personally found that very confusing, and I doubt that I'm the only one.

There is no good reason to use single user mode anymore (a related problem with the docs is that we still haven't made that point). And the pg_upgrade bug that led to invalid relfrozenxid values was flagrantly just a bug (we added a WARNING for this recently, in commit e83ebfe6).

So while I accept that the distinction you're making here is valid, maybe we can fix the single user mode doc bug too, removing the need to discuss "true wraparound" as a general phenomenon. You shouldn't ever see it in practice anymore. If you do then either you've done something that "invalidated the warranty", or you've run into a legitimate bug.

--
Peter Geoghegan
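
To illustrate the distance-based framing in the points above, here is a minimal Python sketch. It is not PostgreSQL's actual code: the function names `xid_age` and `defense_stage` are hypothetical, the thresholds are assumed values (200 million and 1.6 billion match what I believe are the defaults for `autovacuum_freeze_max_age` and `vacuum_failsafe_age`, and the 3 million margin before the hard limit is likewise an assumption), and it ignores details such as the reserved permanent XIDs.

```python
XID_MODULUS = 2**32        # XIDs are 32-bit counters that wrap around
MAX_DISTANCE = 2**31 - 1   # "about 2 billion": largest allowed distance

# Assumed thresholds, for illustration only (not read from a real server).
ANTIWRAP_AUTOVACUUM_AGE = 200_000_000   # assumed autovacuum_freeze_max_age
FAILSAFE_AGE = 1_600_000_000            # assumed vacuum_failsafe_age
STOP_MARGIN = 3_000_000                 # assumed margin kept by xidStopLimit


def xid_age(oldest_unfrozen_xid: int, next_xid: int) -> int:
    """Distance from the oldest unfrozen XID to the next XID to be assigned.

    Modular subtraction keeps the result correct even after the 32-bit
    counter has wrapped past zero.
    """
    return (next_xid - oldest_unfrozen_xid) % XID_MODULUS


def defense_stage(oldest_unfrozen_xid: int, next_xid: int) -> str:
    """Name the strongest defense that this (hypothetical) model would trigger."""
    age = xid_age(oldest_unfrozen_xid, next_xid)
    if age >= MAX_DISTANCE - STOP_MARGIN:
        return "xidStopLimit"
    if age >= FAILSAFE_AGE:
        return "failsafe"
    if age >= ANTIWRAP_AUTOVACUUM_AGE:
        return "anti-wraparound autovacuum"
    return "normal"


if __name__ == "__main__":
    # The counter has wrapped, but only the distance matters.
    print(xid_age(4_294_967_000, 500))
    print(defense_stage(100, 100 + 250_000_000))
```

The sketch never mentions "4 billion" as a budget. Everything is expressed as a distance from the oldest unfrozen XID, which is the quantity each of the three defenses reacts to.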