Re: relfrozenxid may disagree with row XIDs after 1ccc1e05ae - Mailing list pgsql-bugs
From | Andres Freund
Subject | Re: relfrozenxid may disagree with row XIDs after 1ccc1e05ae
Date |
Msg-id | 20240415185233.65x57tbdybj4kqbg@awork3.anarazel.de
In response to | Re: relfrozenxid may disagree with row XIDs after 1ccc1e05ae (Robert Haas <robertmhaas@gmail.com>)
Responses | Re: relfrozenxid may disagree with row XIDs after 1ccc1e05ae
List | pgsql-bugs
Hi,

On 2024-03-29 12:05:31 -0400, Robert Haas wrote:
> But what exactly is the fix here? I now understand why there was
> discussion of preventing maybe_needed from going backward, but I'm not
> sure that's quite the right idea. I think the problem isn't exactly
> that the maybe_needed value that is updated by GlobalVisUpdateApply()
> can go backwards, but rather that the SOMETHING_oldest_removable value
> on which it depends can go backwards. AFAICS, the problem isn't so
> much that the GlobalVisTest itself can retreat to an older value, but
> that it can retreat compared to the OldestXmin obtained from calling
> GetOldestNonRemovableTransactionId(), which depends on the
> SOMETHING_oldest_removable threshold but *not* on maybe_needed.

I agree that in theory SOMETHING_oldest_removable should never go
backwards. But in practice it's decidedly nontrivial to prevent that.
There's a reason we've had a version of this comment:

 * Note: despite the above, it's possible for the calculated values to move
 * backwards on repeated calls.

for a long time.

We really ought to ensure that it doesn't go backwards even across
sessions, because one session vacuuming a relation with one horizon, and
then a subsequent session vacuuming the same relation with an earlier
horizon, seems like a recipe for bugs. But because the horizon depends on
the database of the affected object, we can't just stash a value for each
of the possible horizons in shared memory. We'd need a database-oid keyed
hash table or such.

OTOH, for hot_standby_feedback without a slot, we kinda *do* want the
horizons to go backwards, so that a brief failure of the replication
connection is less likely to cause ongoing queries to fail over the next
while.

> Perhaps, for some reason I'm not grokking at the moment, preventing
> maybe_needed from retreating would be good enough to prevent trouble
> in practice, but it doesn't appear to me to be the most principled
> solution as of this writing.

If we prevent maybe_needed from retreating below the OldestXmin value
used for cutoff, then pruning should never be less aggressive than what's
needed by lazy_scan_prune(). So lazy_scan_prune() shouldn't be able to
encounter tuples that weren't considered removable in heap_page_prune()
in < 17, nor would heap_page_prune_and_freeze() have that issue.

I think one fairly easy fix for this would be to pass OldestXmin to
heap_page_prune[_and_freeze](), and have heap_prune_satisfies_vacuum()
first check OldestXmin; only if that considers the tuple still running
would it use GlobalVisTestIsRemovableXid(). That continues to allow us to
prune more aggressively, without any potential risk of the removability
check being too conservative.

> I kind of feel like the code is just going at this in a way that's
> completely backwards. It feels like what we should be doing is looking
> at the actual states of the pages post-pruning, and specifically what
> the oldest (M)XIDs that are really present there are.

Isn't that pretty much what the code is doing these days? In some cases
we do additionally enforce a specific boundary, to be able to advance
relfrozenxid, but we advance it more aggressively if there aren't any
xids preventing that (provided we scan all the necessary pages, etc.).

Greetings,

Andres Freund
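[A minimal sketch of the check ordering proposed above, not the actual
patch; the signatures are simplified and hypothetical (the real
heap_prune_satisfies_vacuum() takes the tuple, buffer, and prune state,
and returns an HTSV status rather than a bool):

    #include <stdbool.h>
    #include <stdint.h>

    typedef uint32_t TransactionId;
    typedef struct GlobalVisState GlobalVisState;   /* opaque, as in PostgreSQL */

    /* Provided by PostgreSQL; declared here so the sketch is self-contained. */
    extern bool GlobalVisTestIsRemovableXid(GlobalVisState *vistest,
                                            TransactionId xid);

    /*
     * Simplified circular xid comparison; PostgreSQL's real
     * TransactionIdPrecedes() additionally special-cases permanent xids.
     */
    static bool
    xid_precedes(TransactionId a, TransactionId b)
    {
        return (int32_t) (a - b) < 0;
    }

    /*
     * Hypothetical, simplified stand-in for heap_prune_satisfies_vacuum():
     * decide whether a tuple whose deleting xid is "dead_after" may be
     * removed, checking vacuum's OldestXmin cutoff before the GlobalVisTest.
     */
    static bool
    tuple_removable(TransactionId dead_after,
                    TransactionId OldestXmin,   /* vacuum's cutoff horizon */
                    GlobalVisState *vistest)
    {
        /*
         * If the deleting xid precedes OldestXmin, lazy_scan_prune() will
         * treat the tuple as removable, so pruning must agree and remove it.
         */
        if (xid_precedes(dead_after, OldestXmin))
            return true;

        /*
         * Otherwise consult the GlobalVisTest, whose horizon may be newer
         * and thus allow removing more. Because OldestXmin was honoured
         * first, a GlobalVisTest horizon that retreated behind OldestXmin
         * can no longer make pruning less aggressive than
         * lazy_scan_prune() expects.
         */
        return GlobalVisTestIsRemovableXid(vistest, dead_after);
    }

The design point is that OldestXmin acts as a floor: the GlobalVisTest
can only ever make pruning more aggressive than what lazy_scan_prune()
requires, never less.]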