Re: BUG #17257: (auto)vacuum hangs within lazy_scan_prune() - Mailing list pgsql-bugs

From Andres Freund
Subject Re: BUG #17257: (auto)vacuum hangs within lazy_scan_prune()
Msg-id 20240415173913.4zyyrwaftujxthf2@awork3.anarazel.de
In response to Re: BUG #17257: (auto)vacuum hangs within lazy_scan_prune()  (Noah Misch <noah@leadboat.com>)
Responses Re: BUG #17257: (auto)vacuum hangs within lazy_scan_prune()
List pgsql-bugs
Hi,

I've tried a couple of times to catch up with this thread, but always kinda
felt I must be missing something. It might be that this is one part of the
confusion:

On 2024-01-06 12:24:13 -0800, Noah Misch wrote:
> Fair enough.  While I agree there's a decent chance back-patching would be
> okay, I think there's also a decent chance that 1ccc1e05ae creates the problem
> Matthias theorized.  Something like: we update relfrozenxid based on
> OldestXmin, even though GlobalVisState caused us to retain a tuple older than
> OldestXmin.  Then relfrozenxid disagrees with table contents.

Looking at the state as of 1ccc1e05ae, I don't see how. In lazy_scan_prune(),
if heap_page_prune() spuriously didn't prune a tuple because the horizon went
backwards, we'd encounter the tuple in the loop below and call
heap_prepare_freeze_tuple(), which would error out with one of

    /*
     * Process xmin, while keeping track of whether it's already frozen, or
     * will become frozen iff our freeze plan is executed by caller (could be
     * neither).
     */
    xid = HeapTupleHeaderGetXmin(tuple);
    if (!TransactionIdIsNormal(xid))
        xmin_already_frozen = true;
    else
    {
        if (TransactionIdPrecedes(xid, cutoffs->relfrozenxid))
            ereport(ERROR,
                    (errcode(ERRCODE_DATA_CORRUPTED),
                     errmsg_internal("found xmin %u from before relfrozenxid %u",
                                     xid, cutoffs->relfrozenxid)));

or
        if (TransactionIdPrecedes(update_xact, cutoffs->relfrozenxid))
            ereport(ERROR,
                    (errcode(ERRCODE_DATA_CORRUPTED),
                     errmsg_internal("multixact %u contains update XID %u from before relfrozenxid %u",
                                     multi, update_xact,
                                     cutoffs->relfrozenxid)));
or
        /* Raw xmax is normal XID */
        if (TransactionIdPrecedes(xid, cutoffs->relfrozenxid))
            ereport(ERROR,
                    (errcode(ERRCODE_DATA_CORRUPTED),
                     errmsg_internal("found xmax %u from before relfrozenxid %u",
                                     xid, cutoffs->relfrozenxid)));


I'm not saying that spuriously erroring out would be ok. But I guess I just
don't understand the data corruption theory in this subthread, because we'd
error out if we encountered a tuple that should have been frozen but wasn't?
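
For readers who want to see the comparison those checks rely on, here is a
minimal standalone sketch, not PostgreSQL source. It assumes 32-bit XIDs
compared modulo 2^32 and treats XIDs below 3 as special (not "normal"). The
names xid_is_normal, xid_precedes, check_xmin and the XminCheck enum are
hypothetical and only mirror the shape of the xmin check quoted above, with a
return code standing in for the ereport(ERROR).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for PostgreSQL's TransactionId (a 32-bit counter). */
typedef uint32_t TransactionId;

/* Assumption: XIDs below this value are special (invalid, bootstrap, frozen). */
#define FIRST_NORMAL_XID ((TransactionId) 3)

static bool
xid_is_normal(TransactionId xid)
{
    return xid >= FIRST_NORMAL_XID;
}

/*
 * Wraparound-aware "a is older than b": for two normal XIDs, compare the
 * signed 32-bit difference; special XIDs compare as plain integers.
 */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
    if (!xid_is_normal(a) || !xid_is_normal(b))
        return a < b;
    return (int32_t) (a - b) < 0;
}

typedef enum
{
    XMIN_ALREADY_FROZEN,        /* xmin is not a normal XID */
    XMIN_OK,                    /* xmin is at or after relfrozenxid */
    XMIN_BEFORE_RELFROZENXID    /* the quoted code would ereport(ERROR) here */
} XminCheck;

/* Mirrors the shape of the quoted xmin check, returning a code instead of erroring. */
static XminCheck
check_xmin(TransactionId xmin, TransactionId relfrozenxid)
{
    if (!xid_is_normal(xmin))
        return XMIN_ALREADY_FROZEN;
    if (xid_precedes(xmin, relfrozenxid))
        return XMIN_BEFORE_RELFROZENXID;
    return XMIN_OK;
}
```

In this sketch, a tuple whose xmin is 99 checked against a relfrozenxid of 100
yields XMIN_BEFORE_RELFROZENXID, which corresponds to the "found xmin %u from
before relfrozenxid %u" error path rather than to a silent freeze or skip.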

Greetings,

Andres Freund


