Re: relfrozenxid may disagree with row XIDs after 1ccc1e05ae - Mailing list pgsql-bugs

From Melanie Plageman
Subject Re: relfrozenxid may disagree with row XIDs after 1ccc1e05ae
Date
Msg-id CAAKRu_Z50WSPWLYg-2NC4TDBSyTLMRL_jG=K+txByTAeu5nNXA@mail.gmail.com
In response to Re: relfrozenxid may disagree with row XIDs after 1ccc1e05ae  (Melanie Plageman <melanieplageman@gmail.com>)
List pgsql-bugs
On Thu, Jun 20, 2024 at 11:49 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:
>
> On Tue, Jun 18, 2024 at 6:51 PM Melanie Plageman
> <melanieplageman@gmail.com> wrote:
> >
> > Finally, upthread there is discussion of how we could end up doing a
> > catalog lookup after vacuum_get_cutoffs() and before the tuple
> > visibility check on 16. Assuming this is true, we would want to
> > backport the fix to 16 as well. I could use some help getting a repro
> > (using btree index deletion for example) of the infinite loop on 16.
>
> So, I ended up working on a new repro that works by forcing a round of
> index vacuuming after the standby reconnects and before pruning a dead
> tuple whose xmax is older than OldestXmin.
>
> At the end of the round of index vacuuming, _bt_pendingfsm_finalize()
> calls GetOldestNonRemovableTransactionId(), thereby updating the
> backend's GlobalVisState and moving maybe_needed backwards.
>
> Then vacuum's first pass will continue with pruning and find our later
> inserted and updated tuple HEAPTUPLE_RECENTLY_DEAD when compared to
> maybe_needed but HEAPTUPLE_DEAD when compared to OldestXmin.
>
> I make sure that the standby reconnects between vacuum_get_cutoffs()
> (vacuum_set_xid_limits() on 14/15) and pruning because I have a cursor
> on the page keeping VACUUM FREEZE from getting a cleanup lock.
>
> See the repros for step-by-step explanations of how it works.
>
> With this, I can repro the infinite loop on 14-16.
>
> Backporting 1ccc1e05ae fixes 16 but, with the new repro, 14 and 15
> error out with "cannot freeze committed xmax". I'm going to
> investigate further why this is happening. It definitely makes me
> wonder about the fix.

It turns out 16 was also erroring out with "cannot freeze committed
xmax" (i.e., backporting 1ccc1e05ae did not fix anything), but I didn't
notice because the Perl TAP test still passed. I also discovered that
we can hit this error on master, so I started a separate thread about
that [1].
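
To make the disagreement described in the quoted text concrete, here is
a toy Python model of it. It is only a sketch and not PostgreSQL code:
the function name and the XID values are made up for illustration, XIDs
are treated as plain integers (no wraparound), and the deleting
transaction is assumed to have committed.

```python
from enum import Enum


class TupleStatus(Enum):
    RECENTLY_DEAD = "HEAPTUPLE_RECENTLY_DEAD"
    DEAD = "HEAPTUPLE_DEAD"


def classify_deleted_tuple(xmax: int, horizon: int) -> TupleStatus:
    """Classify a tuple whose deleter (xmax) committed.

    Hypothetical helper: the tuple counts as DEAD only if xmax is
    older than the given horizon; otherwise it is RECENTLY_DEAD.
    """
    if xmax < horizon:
        return TupleStatus.DEAD
    return TupleStatus.RECENTLY_DEAD


# Illustrative values only, not taken from the repro.
oldest_xmin = 1000   # computed once, at vacuum_get_cutoffs() time
maybe_needed = 900   # GlobalVisState horizon after it moved backwards
tuple_xmax = 950     # older than OldestXmin, newer than maybe_needed

by_oldest_xmin = classify_deleted_tuple(tuple_xmax, oldest_xmin)
by_maybe_needed = classify_deleted_tuple(tuple_xmax, maybe_needed)

print(by_oldest_xmin.value)   # HEAPTUPLE_DEAD
print(by_maybe_needed.value)  # HEAPTUPLE_RECENTLY_DEAD
```

With these made-up numbers, the same tuple is removable according to
OldestXmin but not according to maybe_needed, which is the mismatch the
repro provokes.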

- Melanie

[1] https://www.postgresql.org/message-id/CAAKRu_bDD7oq9ZwB2OJqub5BovMG6UjEYsoK2LVttadjEqyRGg%40mail.gmail.com


