Re: Improving the "Routine Vacuuming" docs - Mailing list pgsql-hackers

From	Peter Geoghegan
Subject	Re: Improving the "Routine Vacuuming" docs
Date	April 13, 2022 16:34:22
Msg-id	CAH2-Wzk_fRWVcgN0KkwSKSJDKsz0po=s4e__dac=PhHNP+jaUg@mail.gmail.com
In response to	Re: Improving the "Routine Vacuuming" docs (Robert Haas <robertmhaas@gmail.com>)
Responses	Re: Improving the "Routine Vacuuming" docs
List	pgsql-hackers

On Wed, Apr 13, 2022 at 8:40 AM Robert Haas <robertmhaas@gmail.com> wrote:
> > Something along the lines of the following seems more useful: "A tuple
> > whose xmin is frozen (and xmax is unset) is considered visible to
> > every possible MVCC snapshot. In other words, the transaction that
> > inserted the tuple is treated as if it ran and committed at some point
> > that is now *infinitely* far in the past."
>
> I agree with this idea.

Cool. Maybe I should write a doc patch just for this part, then.

What do you think of the idea of relating freezing to removing tuples
by VACUUM at this point? This would be a basis for explaining how
freezing and tuple removal are constrained by the same cutoff. A very
old snapshot can hold up cleanup, but it can also hold up freezing to
the same degree (it's just not as obvious because we are less eager
about freezing by default).
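As a rough illustration of the shared cutoff, here is a simplified model in Python. It is only a sketch: it uses plain integers and ignores XID wraparound, the function and variable names are hypothetical, and clamping the freeze cutoff to the removal cutoff is my simplification of what VACUUM does. The 50 million figure is the default for vacuum_freeze_min_age.

```python
FREEZE_MIN_AGE = 50_000_000  # default vacuum_freeze_min_age


def vacuum_cutoffs(next_xid, oldest_snapshot_xmin, freeze_min_age=FREEZE_MIN_AGE):
    """Return (removal_cutoff, freeze_cutoff) for one hypothetical VACUUM.

    Simplified model: XIDs are plain integers and never wrap around.
    """
    # Dead tuples can only be removed if no snapshot could still see them,
    # so the oldest snapshot's xmin bounds tuple removal.
    removal_cutoff = oldest_snapshot_xmin
    # Freezing normally trails next_xid by freeze_min_age, but it can
    # never get ahead of the removal cutoff.
    freeze_cutoff = min(next_xid - freeze_min_age, removal_cutoff)
    return removal_cutoff, freeze_cutoff


# No old snapshot: freezing is limited only by freeze_min_age.
print(vacuum_cutoffs(1_000_000_000, 999_999_000))
# A very old snapshot: removal and freezing are held back to the same point.
print(vacuum_cutoffs(1_000_000_000, 100_000_000))
```

In the second call both cutoffs collapse to the old snapshot's xmin, which is the "held up to the same degree" case.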

> > The alarming language isn't proportionate to the true danger
> > (something I complained about in a dedicated thread last year [1]).
>
> I mostly agree with this, but not entirely. The section needs some
> rephrasing, but xidStopLimit doesn't apply in single-user mode, and
> relfrozenxid and datfrozenxid values can and do get corrupted. So it's
> not a purely academic concern.

I accept that the distinction you want to make is valid. More on that below.

> > * XID space isn't really a precious resource -- it isn't even a
> > resource at all IMV.
>
> I disagree with this. Usable XID space is definitely a resource, and
> if you're in the situation where you care deeply about this section of
> the documentation, it's probably one in short supply. Being careful
> not to expend too many XIDs while fixing the problems that have caused
> you to be short of safe XIDs is *definitely* a real thing.

I may have gone too far with this metaphor. My point was mostly that
XID space has a highly unpredictable cost (paid in freezing).

Perhaps we can agree on some (or even all) of the following specific points:

* We shouldn't mention "4 billion XIDs" at all.

* We should say that the issue is one of distances between
unfrozen XIDs. The maximum distance that can ever be allowed to emerge
between any two unfrozen XIDs in a cluster is about 2 billion XIDs.

* We don't need to say anything about how XIDs are compared, normal vs
permanent XIDs, etc.

* The system takes drastic action to prevent this implementation
restriction from becoming a problem, starting with anti-wraparound
autovacuums. Then there's the failsafe. Finally, there's the
xidStopLimit mechanism, our last line of defense.
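To make the escalation in the last point concrete, here is a rough sketch in Python. The function and constant names are hypothetical. The first two thresholds are the defaults for autovacuum_freeze_max_age and vacuum_failsafe_age; the 3 million XID margin before the hard stop is an assumption on my part and may differ between releases. The "age" argument means the distance between the oldest unfrozen XID and the next XID to be assigned.

```python
AUTOVACUUM_FREEZE_MAX_AGE = 200_000_000  # default autovacuum_freeze_max_age
VACUUM_FAILSAFE_AGE = 1_600_000_000      # default vacuum_failsafe_age
MAX_XID_DISTANCE = 2**31                 # "about 2 billion XIDs"
STOP_MARGIN = 3_000_000                  # assumed safety margin for xidStopLimit


def defense_stage(age):
    """Name the most drastic defense that applies at a given XID age."""
    if age >= MAX_XID_DISTANCE - STOP_MARGIN:
        return "xidStopLimit"                # new XID assignment is refused
    if age >= VACUUM_FAILSAFE_AGE:
        return "failsafe"                    # VACUUM skips nonessential work
    if age >= AUTOVACUUM_FREEZE_MAX_AGE:
        return "anti-wraparound autovacuum"  # forced autovacuum of the table
    return "normal"


for age in (1_000_000, 250_000_000, 1_700_000_000, 2**31 - 1_000_000):
    print(age, defense_stage(age))
```

The gaps between the stages are large, which is why it takes a long period of neglect (or a bug) to get anywhere near the last one.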

> I think it is wrong to conflate wraparound with xidStopLimit.
> xidStopLimit is the final defense against an actual wraparound, and
> like I say, an actual wraparound is quite possible if you put the
> system in single user mode and then do something like this:

I forgot to emphasize one aspect of the problem that seems quite
important: the document itself seems to conflate the xidStopLimit
mechanism with true wraparound. At least I thought so. Last year's
thread on this subject ('What is "wraparound failure", really?') was
mostly about that confusion. I personally found that very confusing,
and I doubt that I'm the only one.

There is no good reason to use single user mode anymore (a related
problem with the docs is that we still haven't made that point). And
the pg_upgrade bug that led to invalid relfrozenxid values was
flagrantly just a bug (a WARNING for this was added recently, in commit
e83ebfe6). So while I accept that the distinction you're making here
is valid, maybe we can fix the single user mode doc bug too, removing
the need to discuss "true wraparound" as a general phenomenon. You
shouldn't ever see it in practice anymore. If you do then either
you've done something that "invalidated the warranty", or you've run
into a legitimate bug.

-- 
Peter Geoghegan
