From f6489bbdfd8af4bcab9076300291a2182abbb6aa Mon Sep 17 00:00:00 2001 From: Peter Geoghegan Date: Sun, 12 Jun 2022 15:46:08 -0700 Subject: [PATCH v12 1/4] Add page-level freezing to VACUUM. Teach VACUUM to decide on whether or not to trigger freezing at the level of whole heap pages, not individual tuple fields. OldestXmin is now treated as the cutoff for freezing eligibility in all cases, while FreezeLimit is used to trigger freezing at the level of each page (we now freeze all eligible XIDs on a page when freezing is triggered for the page). Making the choice to freeze work at the page level tends to result in VACUUM writing less WAL in the long term. This is especially likely to work out due to complementary effects with the freeze plan WAL deduplication optimization added by commit 9e540599. Also teach VACUUM to trigger page-level freezing whenever it detects that heap pruning generated an FPI as torn page protection. We'll have already written a large amount of WAL just to do that much, so it's very likely a good idea to get freezing out of the way for the page early. This only happens in cases where it will directly lead to marking the page all-frozen in the visibility map. In most cases "freezing a page" removes all XIDs < OldestXmin, and all MXIDs < OldestMxact. It doesn't quite work that way in certain rare cases involving MultiXacts, though. It is convenient to define "freeze the page" in a way that gives FreezeMultiXactId the leeway to put off the work of processing an individual tuple's xmax whenever it happens to be a MultiXactId that would require an expensive second pass to process aggressively (allocating a new Multi is especially worth avoiding here). FreezeMultiXactId effectively makes a decision on how to proceed with processing at the level of each individual xmax field. Its no-op multi processing "freezes" an xmax in the event of an expensive-to-process xmax on a page when (for whatever reason) page-level freezing triggers. 
If, on the other hand, freezing is not triggered for the page, then page-level no-op processing takes care of the multi for us instead. Either way, the remaining Multi will ratchet back VACUUM's relfrozenxid and/or relminmxid trackers as required, and we won't need an expensive second pass over the multi (unless we really have no choice, for example during a VACUUM FREEZE, where FreezeLimit always matches OldestXmin). Later work will add an eager freezing strategy to VACUUM (and reframe the behavior established by this commit as lazy freezing, even though it's not quite as lazy as the historical tuple-based approach to freezing). Making freezing work at the page level is not just an optimization; it's also a useful basis for modelling costs at the whole table level, since it makes the visibility map a more reliable indicator of how far behind (or ahead) we are on freezing at the level of the whole table. Later work that adds eager and lazy scanning strategies will build on that, ultimately allowing VACUUM to advance relfrozenxid far more frequently. Author: Peter Geoghegan Reviewed-By: Jeff Davis Reviewed-By: Andres Freund Discussion: https://postgr.es/m/CAH2-WzkFok_6EAHuK39GaW4FjEFQsY=3J0AAd6FXk93u-Xq3Fg@mail.gmail.com --- src/include/access/heapam.h | 92 +++++- src/backend/access/heap/heapam.c | 455 ++++++++++++++------------- src/backend/access/heap/vacuumlazy.c | 169 ++++++---- doc/src/sgml/config.sgml | 11 +- 4 files changed, 444 insertions(+), 283 deletions(-) diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h index 53eb01176..83b52e2a7 100644 --- a/src/include/access/heapam.h +++ b/src/include/access/heapam.h @@ -113,6 +113,83 @@ typedef struct HeapTupleFreeze OffsetNumber offset; } HeapTupleFreeze; +/* + * State used by VACUUM to track the details of freezing all eligible tuples + * on a given heap page. + * + * VACUUM prepares freeze plans for each page via heap_prepare_freeze_tuple + * calls (every tuple with storage gets its own call). 
This page-level freeze + * state is updated across each call, which ultimately determines whether or + * not freezing the page is required. (VACUUM freezes the page via a call to + * heap_freeze_execute_prepared, which freezes using prepared freeze plans.) + * + * Aside from the basic question of whether or not freezing will go ahead, the + * state also tracks the oldest extant XID/MXID in the table as a whole, for + * the purposes of advancing relfrozenxid/relminmxid values in pg_class later + * on. Each heap_prepare_freeze_tuple call pushes NewRelfrozenXid and/or + * NewRelminMxid back as required to avoid unsafe final pg_class values. Any + * and all unfrozen XIDs or MXIDs that remain after VACUUM finishes _must_ + * have values >= the final relfrozenxid/relminmxid values in pg_class. This + * includes XIDs that remain as MultiXact members from any tuple's xmax. + * + * When the 'freeze_required' flag isn't set after all tuples are examined, the + * final choice on freezing is made by vacuumlazy.c. It can decide to trigger + * freezing based on whatever criteria it deems appropriate. However, it is + * recommended that vacuumlazy.c avoid early freezing of a page when it cannot + * then be marked all-frozen in the visibility map. + */ +typedef struct HeapPageFreeze +{ + /* Is heap_prepare_freeze_tuple caller required to freeze page? */ + bool freeze_required; + + /* + * "Freeze" NewRelfrozenXid/NewRelminMxid trackers. + * + * Trackers used when heap_freeze_execute_prepared freezes the page, and + * when page is "nominally frozen", which happens with pages where every + * call to heap_prepare_freeze_tuple produced no usable freeze plan. + * + * "Nominal freezing" enables vacuumlazy.c's approach of setting a page + * all-frozen in the visibility map when every tuple's 'totally_frozen' + * result is true. 
That always works in the same way, independent of the + * need to freeze tuples, and without complicating the general rule around + * 'totally_frozen' results (which is that 'totally_frozen' results are + * only to be trusted with a page that goes on to be frozen by caller). + * + * When we freeze a page, we generally freeze all XIDs < OldestXmin, only + * leaving behind XIDs that are ineligible for freezing, if any. And so + * you might wonder why these trackers are necessary at all; why should + * _any_ page that VACUUM freezes _ever_ be left with XIDs/MXIDs that + * ratchet back the top-level NewRelfrozenXid/NewRelminMxid trackers? + * + * It is useful to use a definition of "freeze the page" that does not + * overspecify how MultiXacts are affected. heap_prepare_freeze_tuple + * generally prefers to remove Multis eagerly, but lazy processing is used + * in cases where laziness allows VACUUM to avoid allocating a new Multi. + * The "freeze the page" trackers enable this flexibility. + */ + TransactionId FreezePageRelfrozenXid; + MultiXactId FreezePageRelminMxid; + + /* + * "No freeze" NewRelfrozenXid/NewRelminMxid trackers. + * + * These trackers are maintained in the same way as the trackers used when + * VACUUM scans a page that isn't cleanup locked. Both code paths are + * based on the same general idea (do less work for this page during the + * ongoing VACUUM, at the cost of having to accept older final values). + * + * When vacuumlazy.c caller decides to do "no freeze" processing, it must + * not go on to set the page all-frozen (setting the page all-visible + * could still be okay). heap_prepare_freeze_tuple's 'totally_frozen' + * results can only be trusted on a page that is frozen afterwards. 
+ */ + TransactionId NoFreezePageRelfrozenXid; + MultiXactId NoFreezePageRelminMxid; + +} HeapPageFreeze; + /* ---------------- * function prototypes for heap access method * @@ -180,19 +257,18 @@ extern TM_Result heap_lock_tuple(Relation relation, HeapTuple tuple, extern void heap_inplace_update(Relation relation, HeapTuple tuple); extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple, const struct VacuumCutoffs *cutoffs, - HeapTupleFreeze *frz, bool *totally_frozen, - TransactionId *relfrozenxid_out, - MultiXactId *relminmxid_out); + HeapPageFreeze *pagefrz, + HeapTupleFreeze *frz, bool *totally_frozen); extern void heap_freeze_execute_prepared(Relation rel, Buffer buffer, - TransactionId FreezeLimit, + TransactionId snapshotConflictHorizon, HeapTupleFreeze *tuples, int ntuples); extern bool heap_freeze_tuple(HeapTupleHeader tuple, TransactionId relfrozenxid, TransactionId relminmxid, TransactionId FreezeLimit, TransactionId MultiXactCutoff); -extern bool heap_tuple_would_freeze(HeapTupleHeader tuple, - const struct VacuumCutoffs *cutoffs, - TransactionId *relfrozenxid_out, - MultiXactId *relminmxid_out); +extern bool heap_tuple_should_freeze(HeapTupleHeader tuple, + const struct VacuumCutoffs *cutoffs, + TransactionId *NoFreezePageRelfrozenXid, + MultiXactId *NoFreezePageRelminMxid); extern bool heap_tuple_needs_eventual_freeze(HeapTupleHeader tuple); extern void simple_heap_insert(Relation relation, HeapTuple tup); diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c index 86a88de85..71dfe5933 100644 --- a/src/backend/access/heap/heapam.c +++ b/src/backend/access/heap/heapam.c @@ -6098,9 +6098,7 @@ heap_inplace_update(Relation relation, HeapTuple tuple) * MultiXactId. * * "flags" is an output value; it's used to tell caller what to do on return. - * - * "mxid_oldest_xid_out" is an output value; it's used to track the oldest - * extant Xid within any Multixact that will remain after freezing executes. 
+ "pagefrz" is an input/output value, used to manage page-level freezing. * * Possible values that we can set in "flags": * FRM_NOOP @@ -6115,15 +6113,32 @@ heap_inplace_update(Relation relation, HeapTuple tuple) * The return value is a new MultiXactId to set as new Xmax. * (caller must obtain proper infomask bits using GetMultiXactIdHintBits) * - * "mxid_oldest_xid_out" is only set when "flags" contains either FRM_NOOP or - * FRM_RETURN_IS_MULTI, since we only leave behind a MultiXactId for these. + * Caller delegates control of page freezing to us. In practice we always + * force freezing of caller's page unless FRM_NOOP processing is indicated. + * We help caller ensure that XIDs < FreezeLimit and MXIDs < MultiXactCutoff + * can never be left behind. We freely choose when and how to process each + * Multi, without ever violating the cutoff postconditions for freezing. * - * NB: Creates a _new_ MultiXactId when FRM_RETURN_IS_MULTI is set in "flags". + * It's useful to remove Multis on a proactive timeline (relative to freezing + * XIDs) to keep MultiXact member SLRU buffer misses to a minimum. It can also + * be cheaper for us in the short run, since eager processing lets us avoid + * those SLRU buffer misses ourselves. + * + * NB: Creates a _new_ MultiXactId when FRM_RETURN_IS_MULTI is set, though only + * when the FreezeLimit and/or MultiXactCutoff cutoffs leave us with no choice. + * This can usually be put off, which is often enough to avoid it altogether. + * + * NB: Caller must maintain "no freeze" NewRelfrozenXid/NewRelminMxid trackers + * using heap_tuple_should_freeze when we haven't forced page-level freezing. + * + * NB: Caller should avoid needlessly calling heap_tuple_should_freeze when we + * have already forced page-level freezing, since that might incur the same + * SLRU buffer misses that we specifically intended to avoid by freezing. 
*/ static TransactionId FreezeMultiXactId(MultiXactId multi, uint16 t_infomask, const struct VacuumCutoffs *cutoffs, uint16 *flags, - TransactionId *mxid_oldest_xid_out) + HeapPageFreeze *pagefrz) { TransactionId newxmax = InvalidTransactionId; MultiXactMember *members; @@ -6134,7 +6149,7 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask, bool has_lockers; TransactionId update_xid; bool update_committed; - TransactionId temp_xid_out; + TransactionId FreezePageRelfrozenXid; *flags = 0; @@ -6144,8 +6159,8 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask, if (!MultiXactIdIsValid(multi) || HEAP_LOCKED_UPGRADED(t_infomask)) { - /* Ensure infomask bits are appropriately set/reset */ *flags |= FRM_INVALIDATE_XMAX; + pagefrz->freeze_required = true; return InvalidTransactionId; } else if (MultiXactIdPrecedes(multi, cutoffs->relminmxid)) @@ -6153,7 +6168,7 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask, (errcode(ERRCODE_DATA_CORRUPTED), errmsg_internal("found multixact %u from before relminmxid %u", multi, cutoffs->relminmxid))); - else if (MultiXactIdPrecedes(multi, cutoffs->MultiXactCutoff)) + else if (MultiXactIdPrecedes(multi, cutoffs->OldestMxact)) { /* * This old multi cannot possibly have members still running, but @@ -6166,50 +6181,45 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask, ereport(ERROR, (errcode(ERRCODE_DATA_CORRUPTED), errmsg_internal("multixact %u from before cutoff %u found to be still running", - multi, cutoffs->MultiXactCutoff))); + multi, cutoffs->OldestMxact))); if (HEAP_XMAX_IS_LOCKED_ONLY(t_infomask)) { *flags |= FRM_INVALIDATE_XMAX; + pagefrz->freeze_required = true; + return InvalidTransactionId; + } + + /* replace multi with single XID for its updater */ + newxmax = MultiXactIdGetUpdateXid(multi, t_infomask); + + if (TransactionIdPrecedes(newxmax, cutoffs->relfrozenxid)) + ereport(ERROR, + (errcode(ERRCODE_DATA_CORRUPTED), + errmsg_internal("multixact %u contains update xid %u from before relfrozenxid %u", 
+ multi, newxmax, cutoffs->relfrozenxid))); + else if (TransactionIdPrecedes(newxmax, cutoffs->OldestXmin)) + { + /* + * Updater XID has to have aborted (otherwise the tuple would have + * been pruned away instead, since updater XID is < OldestXmin). + * Just remove xmax. + */ + if (TransactionIdDidCommit(newxmax)) + ereport(ERROR, + (errcode(ERRCODE_DATA_CORRUPTED), + errmsg_internal("multixact %u contains uncommitted update xid %u", + multi, newxmax))); + *flags |= FRM_INVALIDATE_XMAX; newxmax = InvalidTransactionId; } else { - /* replace multi with single XID for its updater */ - newxmax = MultiXactIdGetUpdateXid(multi, t_infomask); - - /* wasn't only a lock, xid needs to be valid */ - Assert(TransactionIdIsValid(newxmax)); - - if (TransactionIdPrecedes(newxmax, cutoffs->relfrozenxid)) - ereport(ERROR, - (errcode(ERRCODE_DATA_CORRUPTED), - errmsg_internal("found update xid %u from before relfrozenxid %u", - newxmax, cutoffs->relfrozenxid))); - - /* - * If the new xmax xid is older than OldestXmin, it has to have - * aborted, otherwise the tuple would have been pruned away - */ - if (TransactionIdPrecedes(newxmax, cutoffs->OldestXmin)) - { - if (TransactionIdDidCommit(newxmax)) - ereport(ERROR, - (errcode(ERRCODE_DATA_CORRUPTED), - errmsg_internal("cannot freeze committed update xid %u", newxmax))); - *flags |= FRM_INVALIDATE_XMAX; - newxmax = InvalidTransactionId; - } - else - { - *flags |= FRM_RETURN_IS_XID; - } + /* Have to keep updater XID as new xmax */ + *flags |= FRM_RETURN_IS_XID; } - /* - * Don't push back mxid_oldest_xid_out using FRM_RETURN_IS_XID Xid, or - * when no Xids will remain - */ + pagefrz->freeze_required = true; return newxmax; } @@ -6225,11 +6235,30 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask, { /* Nothing worth keeping */ *flags |= FRM_INVALIDATE_XMAX; + pagefrz->freeze_required = true; return InvalidTransactionId; } + /* + * The FRM_NOOP case is the only case where we might need to ratchet back + * FreezePageRelfrozenXid or 
FreezePageRelminMxid. It is also the only + * case where our caller might ratchet back its NoFreezePageRelfrozenXid + * or NoFreezePageRelminMxid "no freeze" trackers to deal with a multi. + * FRM_NOOP handling should result in the NewRelfrozenXid/NewRelminMxid + * trackers managed by VACUUM being ratcheted back by xmax to the degree + * required to make it safe to leave xmax undisturbed, independent of + * whether or not page freezing is triggered somewhere else. + * + * Our policy is to force freezing in every case other than FRM_NOOP, + * which obviates the need to maintain either set of trackers, anywhere. + * Every other case will reliably execute a freeze plan for xmax that + * either replaces xmax with an XID/MXID >= OldestXmin/OldestMxact, or + * sets xmax to an InvalidTransactionId XID, rendering xmax fully frozen. + * (VACUUM's NewRelfrozenXid/NewRelminMxid trackers are initialized with + * OldestXmin/OldestMxact, so later values never need to be tracked here.) + */ need_replace = false; - temp_xid_out = *mxid_oldest_xid_out; /* init for FRM_NOOP */ + FreezePageRelfrozenXid = pagefrz->FreezePageRelfrozenXid; for (int i = 0; i < nmembers; i++) { TransactionId xid = members[i].xid; @@ -6238,26 +6267,29 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask, if (TransactionIdPrecedes(xid, cutoffs->FreezeLimit)) { + /* Can't violate the FreezeLimit postcondition */ need_replace = true; break; } - if (TransactionIdPrecedes(members[i].xid, temp_xid_out)) - temp_xid_out = members[i].xid; + if (TransactionIdPrecedes(xid, FreezePageRelfrozenXid)) + FreezePageRelfrozenXid = xid; } - /* - * In the simplest case, there is no member older than FreezeLimit; we can - * keep the existing MultiXactId as-is, avoiding a more expensive second - * pass over the multi - */ + /* Can't violate the MultiXactCutoff postcondition, either */ + if (!need_replace) + need_replace = MultiXactIdPrecedes(multi, cutoffs->MultiXactCutoff); + if (!need_replace) { /* - * When 
mxid_oldest_xid_out gets pushed back here it's likely that the - * update Xid was the oldest member, but we don't rely on that + * vacuumlazy.c might ratchet back NewRelminMxid, NewRelfrozenXid, or + * both together to make it safe to retain this particular multi after + * freezing its page */ *flags |= FRM_NOOP; - *mxid_oldest_xid_out = temp_xid_out; + pagefrz->FreezePageRelfrozenXid = FreezePageRelfrozenXid; + if (MultiXactIdPrecedes(multi, pagefrz->FreezePageRelminMxid)) + pagefrz->FreezePageRelminMxid = multi; pfree(members); return multi; } @@ -6266,13 +6298,16 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask, * Do a more thorough second pass over the multi to figure out which * member XIDs actually need to be kept. Checking the precise status of * individual members might even show that we don't need to keep anything. + * + * We only reach this far when replacing xmax is absolutely mandatory. + * heap_tuple_should_freeze will indicate that the tuple should be frozen. + * We definitely won't leave behind an XID/MXID < OldestXmin/OldestMxact. 
*/ nnewmembers = 0; newmembers = palloc(sizeof(MultiXactMember) * nmembers); has_lockers = false; update_xid = InvalidTransactionId; update_committed = false; - temp_xid_out = *mxid_oldest_xid_out; /* init for FRM_RETURN_IS_MULTI */ /* * Determine whether to keep each member xid, or to ignore it instead @@ -6293,14 +6328,13 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask, if (TransactionIdIsCurrentTransactionId(xid) || TransactionIdIsInProgress(xid)) { + if (TransactionIdPrecedes(xid, cutoffs->OldestXmin)) + ereport(ERROR, + (errcode(ERRCODE_DATA_CORRUPTED), + errmsg_internal("multixact %u contains locker xid %u from before removable cutoff %u", + multi, xid, cutoffs->OldestXmin))); newmembers[nnewmembers++] = members[i]; has_lockers = true; - - /* - * Cannot possibly be older than VACUUM's OldestXmin, so we - * don't need a NewRelfrozenXid step here - */ - Assert(TransactionIdPrecedesOrEquals(cutoffs->OldestXmin, xid)); } continue; @@ -6317,8 +6351,8 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask, if (TransactionIdPrecedes(xid, cutoffs->OldestXmin)) ereport(ERROR, (errcode(ERRCODE_DATA_CORRUPTED), - errmsg_internal("found update xid %u from before removable cutoff %u", - xid, cutoffs->OldestXmin))); + errmsg_internal("multixact %u contains update xid %u from before removable cutoff %u", + multi, xid, cutoffs->OldestXmin))); if (TransactionIdIsValid(update_xid)) ereport(ERROR, (errcode(ERRCODE_DATA_CORRUPTED), @@ -6328,8 +6362,8 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask, update_xid, xid))); /* - * If the transaction is known aborted or crashed then it's okay to - * ignore it, otherwise not. + * If the updater transaction is known aborted or crashed then it's + * okay to ignore it, otherwise not. 
* * As with all tuple visibility routines, it's critical to test * TransactionIdIsInProgress before TransactionIdDidCommit, because of @@ -6358,13 +6392,10 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask, } /* - * We determined that this is an Xid corresponding to an update that - * must be retained -- add it to new members list for later. Also - * consider pushing back mxid_oldest_xid_out. + * We determined that updater has an Xid >= OldestXmin, which must be + * retained -- add it to pending new members list */ newmembers[nnewmembers++] = members[i]; - if (TransactionIdPrecedes(xid, temp_xid_out)) - temp_xid_out = xid; } pfree(members); @@ -6375,10 +6406,9 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask, */ if (nnewmembers == 0) { - /* nothing worth keeping!? Tell caller to remove the whole thing */ + /* Keeping nothing (neither an Xid nor a MultiXactId) in xmax */ *flags |= FRM_INVALIDATE_XMAX; newxmax = InvalidTransactionId; - /* Don't push back mxid_oldest_xid_out -- no Xids will remain */ } else if (TransactionIdIsValid(update_xid) && !has_lockers) { @@ -6394,22 +6424,20 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask, if (update_committed) *flags |= FRM_MARK_COMMITTED; newxmax = update_xid; - /* Don't push back mxid_oldest_xid_out using FRM_RETURN_IS_XID Xid */ } else { /* * Create a new multixact with the surviving members of the previous - * one, to set as new Xmax in the tuple. The oldest surviving member - * might push back mxid_oldest_xid_out. 
+ * one (all of which are >= OldestXmin) to set as new Xmax */ newxmax = MultiXactIdCreateFromMembers(nnewmembers, newmembers); *flags |= FRM_RETURN_IS_MULTI; - *mxid_oldest_xid_out = temp_xid_out; } pfree(newmembers); + pagefrz->freeze_required = true; return newxmax; } @@ -6417,9 +6445,9 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask, * heap_prepare_freeze_tuple * * Check to see whether any of the XID fields of a tuple (xmin, xmax, xvac) - * are older than the FreezeLimit and/or MultiXactCutoff freeze cutoffs. If so, - * setup enough state (in the *frz output argument) to later execute and - * WAL-log what caller needs to do for the tuple, and return true. Return + * are older than the OldestXmin and/or OldestMxact freeze cutoffs. If so, + * setup enough state (in the *frz output argument) to enable caller to + * process this tuple as part of freezing its page, and return true. Return * false if nothing can be changed about the tuple right now. * * Also sets *totally_frozen to true if the tuple will be totally frozen once @@ -6427,22 +6455,30 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask, * frozen by an earlier VACUUM). This indicates that there are no remaining * XIDs or MultiXactIds that will need to be processed by a future VACUUM. * - * VACUUM caller must assemble HeapTupleFreeze entries for every tuple that we - * returned true for when called. A later heap_freeze_execute_prepared call - * will execute freezing for caller's page as a whole. + * VACUUM caller must assemble HeapTupleFreeze freeze plan entries for every + * tuple that we returned true for, and call heap_freeze_execute_prepared to + * execute freezing. Caller must initialize pagefrz fields for page as a + * whole before first call here for each heap page. + * + * VACUUM caller decides on whether or not to freeze the page as a whole. + * We'll often prepare freeze plans for a page that caller just discards. 
+ * However, VACUUM doesn't always get to make a choice; it must freeze when + * pagefrz.freeze_required is set, to ensure that any XIDs < FreezeLimit (and + * MXIDs < MultiXactCutoff) can never be left behind. We help to make sure + * that VACUUM always follows that rule. + * + * We sometimes force freezing of xmax MultiXactId values long before it is + * strictly necessary to do so just to ensure the FreezeLimit postcondition. + * It's worth processing MultiXactIds proactively when it is cheap to do so, + * and it's convenient to make that happen by piggy-backing it on the "force + * freezing" mechanism. Conversely, we sometimes delay freezing MultiXactIds + * because it is expensive right now (though only when it's still possible to + * do so without violating the FreezeLimit/MultiXactCutoff postcondition). * * It is assumed that the caller has checked the tuple with * HeapTupleSatisfiesVacuum() and determined that it is not HEAPTUPLE_DEAD * (else we should be removing the tuple, not freezing it). * - * The *relfrozenxid_out and *relminmxid_out arguments are the current target - * relfrozenxid and relminmxid for VACUUM caller's heap rel. Any and all - * unfrozen XIDs or MXIDs that remain in caller's rel after VACUUM finishes - * _must_ have values >= the final relfrozenxid/relminmxid values in pg_class. - * This includes XIDs that remain as MultiXact members from any tuple's xmax. - * Each call here pushes back *relfrozenxid_out and/or *relminmxid_out as - * needed to avoid unsafe final values in rel's authoritative pg_class tuple. - * * NB: This function has side effects: it might allocate a new MultiXactId. * It will be set as tuple's new xmax when our *frz output is processed within * heap_execute_freeze_tuple later on. 
If the tuple is in a shared buffer @@ -6451,9 +6487,8 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask, bool heap_prepare_freeze_tuple(HeapTupleHeader tuple, const struct VacuumCutoffs *cutoffs, - HeapTupleFreeze *frz, bool *totally_frozen, - TransactionId *relfrozenxid_out, - MultiXactId *relminmxid_out) + HeapPageFreeze *pagefrz, + HeapTupleFreeze *frz, bool *totally_frozen) { bool xmin_already_frozen = false, xmax_already_frozen = false; @@ -6470,7 +6505,7 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple, /* * Process xmin, while keeping track of whether it's already frozen, or - * will become frozen when our freeze plan is executed by caller (could be + * will become frozen iff our freeze plan is executed by caller (could be * neither). */ xid = HeapTupleHeaderGetXmin(tuple); @@ -6484,21 +6519,14 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple, errmsg_internal("found xmin %u from before relfrozenxid %u", xid, cutoffs->relfrozenxid))); - freeze_xmin = TransactionIdPrecedes(xid, cutoffs->FreezeLimit); - if (freeze_xmin) - { - if (!TransactionIdDidCommit(xid)) - ereport(ERROR, - (errcode(ERRCODE_DATA_CORRUPTED), - errmsg_internal("uncommitted xmin %u from before xid cutoff %u needs to be frozen", - xid, cutoffs->FreezeLimit))); - } - else - { - /* xmin to remain unfrozen. Could push back relfrozenxid_out. */ - if (TransactionIdPrecedes(xid, *relfrozenxid_out)) - *relfrozenxid_out = xid; - } + freeze_xmin = TransactionIdPrecedes(xid, cutoffs->OldestXmin); + if (freeze_xmin && !TransactionIdDidCommit(xid)) + ereport(ERROR, + (errcode(ERRCODE_DATA_CORRUPTED), + errmsg_internal("uncommitted xmin %u from before xid cutoff %u needs to be frozen", + xid, cutoffs->OldestXmin))); + + /* Will set freeze_xmin flags in freeze plan below */ } /* @@ -6515,41 +6543,59 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple, * For Xvac, we always freeze proactively. This allows totally_frozen * tracking to ignore xvac. 
*/ - replace_xvac = true; + replace_xvac = pagefrz->freeze_required = true; + + /* Will set replace_xvac flags in freeze plan below */ } - /* - * Process xmax. To thoroughly examine the current Xmax value we need to - * resolve a MultiXactId to its member Xids, in case some of them are - * below the given FreezeLimit. In that case, those values might need - * freezing, too. Also, if a multi needs freezing, we cannot simply take - * it out --- if there's a live updater Xid, it needs to be kept. - * - * Make sure to keep heap_tuple_would_freeze in sync with this. - */ + /* Now process xmax */ xid = HeapTupleHeaderGetRawXmax(tuple); - if (tuple->t_infomask & HEAP_XMAX_IS_MULTI) { /* Raw xmax is a MultiXactId */ TransactionId newxmax; uint16 flags; - TransactionId mxid_oldest_xid_out = *relfrozenxid_out; + /* + * We will either remove xmax completely (in the "freeze_xmax" path), + * process xmax by replacing it (in the "replace_xmax" path), or + * perform no-op xmax processing. The only constraint is that the + * FreezeLimit/MultiXactCutoff postcondition must never be violated. + */ newxmax = FreezeMultiXactId(xid, tuple->t_infomask, cutoffs, - &flags, &mxid_oldest_xid_out); + &flags, pagefrz); - if (flags & FRM_RETURN_IS_XID) + if (flags & FRM_NOOP) + { + /* + * xmax is a MultiXactId, and nothing about it changes for now. + * This is the only case where 'freeze_required' won't have been + * set for us by FreezeMultiXactId, as well as the only case where + * neither freeze_xmax nor replace_xmax are set (given a multi). + * + * This is a no-op, but the call to FreezeMultiXactId might have + * ratcheted back NewRelfrozenXid and/or NewRelminMxid trackers + * for us (the "freeze page" variants, specifically). That'll + * make it safe for our caller to freeze the page later on, while + * leaving this particular xmax undisturbed. + * + * FreezeMultiXactId is _not_ responsible for the "no freeze" + * NewRelfrozenXid/NewRelminMxid trackers, though -- that's our + * job. 
+			 * A call to heap_tuple_should_freeze for this same tuple
+			 * will take place below if 'freeze_required' isn't set already.
+			 * (This repeats work from FreezeMultiXactId, but allows "no
+			 * freeze" tracker maintenance to happen in only one place.)
+			 */
+			Assert(MultiXactIdIsValid(newxmax) && xid == newxmax);
+			Assert(!MultiXactIdPrecedes(newxmax, pagefrz->FreezePageRelminMxid));
+		}
+		else if (flags & FRM_RETURN_IS_XID)
 		{
 			/*
 			 * xmax will become an updater Xid (original MultiXact's updater
 			 * member Xid will be carried forward as a simple Xid in Xmax).
-			 * Might have to ratchet back relfrozenxid_out here, though never
-			 * relminmxid_out.
 			 */
 			Assert(!TransactionIdPrecedes(newxmax, cutoffs->OldestXmin));
-			if (TransactionIdPrecedes(newxmax, *relfrozenxid_out))
-				*relfrozenxid_out = newxmax;
 
 			/*
 			 * NB -- some of these transformations are only valid because we
@@ -6572,13 +6618,8 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 			/*
 			 * xmax is an old MultiXactId that we have to replace with a new
 			 * MultiXactId, to carry forward two or more original member XIDs.
-			 * Might have to ratchet back relfrozenxid_out here, though never
-			 * relminmxid_out.
 			 */
 			Assert(!MultiXactIdPrecedes(newxmax, cutoffs->OldestMxact));
-			Assert(TransactionIdPrecedesOrEquals(mxid_oldest_xid_out,
-												 *relfrozenxid_out));
-			*relfrozenxid_out = mxid_oldest_xid_out;
 
 			/*
 			 * We can't use GetMultiXactIdHintBits directly on the new multi
@@ -6594,20 +6635,6 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 			frz->xmax = newxmax;
 			replace_xmax = true;
 		}
-		else if (flags & FRM_NOOP)
-		{
-			/*
-			 * xmax is a MultiXactId, and nothing about it changes for now.
-			 * Might have to ratchet back relminmxid_out, relfrozenxid_out, or
-			 * both together.
-			 */
-			Assert(MultiXactIdIsValid(newxmax) && xid == newxmax);
-			Assert(TransactionIdPrecedesOrEquals(mxid_oldest_xid_out,
-												 *relfrozenxid_out));
-			if (MultiXactIdPrecedes(xid, *relminmxid_out))
-				*relminmxid_out = xid;
-			*relfrozenxid_out = mxid_oldest_xid_out;
-		}
 		else
 		{
 			/*
@@ -6618,9 +6645,12 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 			Assert(MultiXactIdPrecedes(xid, cutoffs->OldestMxact));
 			Assert(!TransactionIdIsValid(newxmax));
 
-			/* Will set t_infomask/t_infomask2 flags in freeze plan below */
+			/* Will set freeze_xmax flags in freeze plan below */
 			freeze_xmax = true;
 		}
+
+		/* Only FRM_NOOP doesn't force caller to freeze page */
+		Assert(pagefrz->freeze_required || (!freeze_xmax && !replace_xmax));
 	}
 	else if (TransactionIdIsNormal(xid))
 	{
@@ -6631,28 +6661,21 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 					 errmsg_internal("found xmax %u from before relfrozenxid %u",
 									 xid, cutoffs->relfrozenxid)));
 
-		if (TransactionIdPrecedes(xid, cutoffs->FreezeLimit))
-		{
-			/*
-			 * If we freeze xmax, make absolutely sure that it's not an XID
-			 * that is important.  (Note, a lock-only xmax can be removed
-			 * independent of committedness, since a committed lock holder has
-			 * released the lock).
-			 */
-			if (!HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask) &&
-				TransactionIdDidCommit(xid))
-				ereport(ERROR,
-						(errcode(ERRCODE_DATA_CORRUPTED),
-						 errmsg_internal("cannot freeze committed xmax %u",
-										 xid)));
+		if (TransactionIdPrecedes(xid, cutoffs->OldestXmin))
 			freeze_xmax = true;
-			/* No need for relfrozenxid_out handling, since we'll freeze xmax */
-		}
-		else
-		{
-			if (TransactionIdPrecedes(xid, *relfrozenxid_out))
-				*relfrozenxid_out = xid;
-		}
+
+		/*
+		 * If we freeze xmax, make absolutely sure that it's not an XID that
+		 * is important.  (Note, a lock-only xmax can be removed independent
+		 * of committedness, since a committed lock holder has released the
+		 * lock).
+		 */
+		if (freeze_xmax && !HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask) &&
+			TransactionIdDidCommit(xid))
+			ereport(ERROR,
+					(errcode(ERRCODE_DATA_CORRUPTED),
+					 errmsg_internal("cannot freeze committed xmax %u",
									 xid)));
 	}
 	else if (!TransactionIdIsValid(xid))
 	{
@@ -6679,6 +6702,7 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 		 * failed; whereas a non-dead MOVED_IN tuple must mean the xvac
 		 * transaction succeeded.
 		 */
+		Assert(pagefrz->freeze_required);
 		if (tuple->t_infomask & HEAP_MOVED_OFF)
 			frz->frzflags |= XLH_INVALID_XVAC;
 		else
@@ -6687,8 +6711,9 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 	if (replace_xmax)
 	{
 		Assert(!xmax_already_frozen && !freeze_xmax);
+		Assert(pagefrz->freeze_required);
 
-		/* Already set t_infomask/t_infomask2 flags in freeze plan */
+		/* Already set replace_xmax flags in freeze plan earlier */
 	}
 	if (freeze_xmax)
 	{
@@ -6709,13 +6734,23 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 	/*
 	 * Determine if this tuple is already totally frozen, or will become
-	 * totally frozen
+	 * totally frozen (provided caller executes freeze plan for the page)
 	 */
 	*totally_frozen = ((freeze_xmin || xmin_already_frozen) &&
 					   (freeze_xmax || xmax_already_frozen));
 
-	/* A "totally_frozen" tuple must not leave anything behind in xmax */
-	Assert(!*totally_frozen || !replace_xmax);
+	if (!pagefrz->freeze_required && !(xmin_already_frozen &&
+									   xmax_already_frozen))
+	{
+		/*
+		 * So far no previous tuple from the page made freezing mandatory.
+		 * Does this tuple force caller to freeze the entire page?
+		 */
+		pagefrz->freeze_required =
+			heap_tuple_should_freeze(tuple, cutoffs,
+									 &pagefrz->NoFreezePageRelfrozenXid,
+									 &pagefrz->NoFreezePageRelminMxid);
+	}
 
 	/* Tell caller if this tuple has a usable freeze plan set in *frz */
 	return freeze_xmin || replace_xvac || replace_xmax || freeze_xmax;
@@ -6761,13 +6796,12 @@ heap_execute_freeze_tuple(HeapTupleHeader tuple, HeapTupleFreeze *frz)
  */
 void
 heap_freeze_execute_prepared(Relation rel, Buffer buffer,
-							 TransactionId FreezeLimit,
+							 TransactionId snapshotConflictHorizon,
 							 HeapTupleFreeze *tuples, int ntuples)
 {
 	Page		page = BufferGetPage(buffer);
 
 	Assert(ntuples > 0);
-	Assert(TransactionIdIsNormal(FreezeLimit));
 
 	START_CRIT_SECTION();
 
@@ -6790,19 +6824,10 @@ heap_freeze_execute_prepared(Relation rel, Buffer buffer,
 		int			nplans;
 		xl_heap_freeze_page xlrec;
 		XLogRecPtr	recptr;
-		TransactionId snapshotConflictHorizon;
 
 		/* Prepare deduplicated representation for use in WAL record */
 		nplans = heap_xlog_freeze_plan(tuples, ntuples, plans, offsets);
 
-		/*
-		 * FreezeLimit is (approximately) the first XID not frozen by VACUUM.
-		 * Back up caller's FreezeLimit to avoid false conflicts when
-		 * FreezeLimit is precisely equal to VACUUM's OldestXmin cutoff.
-		 */
-		snapshotConflictHorizon = FreezeLimit;
-		TransactionIdRetreat(snapshotConflictHorizon);
-
 		xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
 		xlrec.nplans = nplans;
 
@@ -6843,8 +6868,7 @@ heap_freeze_tuple(HeapTupleHeader tuple,
 	bool		do_freeze;
 	bool		totally_frozen;
 	struct VacuumCutoffs cutoffs;
-	TransactionId NewRelfrozenXid = FreezeLimit;
-	MultiXactId NewRelminMxid = MultiXactCutoff;
+	HeapPageFreeze pagefrz;
 
 	cutoffs.relfrozenxid = relfrozenxid;
 	cutoffs.relminmxid = relminmxid;
@@ -6853,9 +6877,14 @@ heap_freeze_tuple(HeapTupleHeader tuple,
 	cutoffs.FreezeLimit = FreezeLimit;
 	cutoffs.MultiXactCutoff = MultiXactCutoff;
 
+	pagefrz.freeze_required = true;
+	pagefrz.FreezePageRelfrozenXid = FreezeLimit;
+	pagefrz.FreezePageRelminMxid = MultiXactCutoff;
+	pagefrz.NoFreezePageRelfrozenXid = FreezeLimit;
+	pagefrz.NoFreezePageRelminMxid = MultiXactCutoff;
+
 	do_freeze = heap_prepare_freeze_tuple(tuple, &cutoffs,
-										  &frz, &totally_frozen,
-										  &NewRelfrozenXid, &NewRelminMxid);
+										  &pagefrz, &frz, &totally_frozen);
 
 	/*
 	 * Note that because this is not a WAL-logged operation, we don't need to
@@ -7278,22 +7307,24 @@ heap_tuple_needs_eventual_freeze(HeapTupleHeader tuple)
 }
 
 /*
- * heap_tuple_would_freeze
+ * heap_tuple_should_freeze
  *
  * Return value indicates if heap_prepare_freeze_tuple sibling function would
- * freeze any of the XID/MXID fields from the tuple, given the same cutoffs.
- * We must also deal with dead tuples here, since (xmin, xmax, xvac) fields
- * could be processed by pruning away the whole tuple instead of freezing.
+ * (or should) force freezing of the heap page that contains caller's tuple.
+ * Tuple header XIDs/MXIDs < FreezeLimit/MultiXactCutoff trigger freezing.
+ * This includes (xmin, xmax, xvac) fields, as well as MultiXact member XIDs.
  *
- * The *relfrozenxid_out and *relminmxid_out input/output arguments work just
- * like the heap_prepare_freeze_tuple arguments that they're based on.  We
- * never freeze here, which makes tracking the oldest extant XID/MXID simple.
+ * The *NoFreezePageRelfrozenXid and *NoFreezePageRelminMxid input/output
+ * arguments help VACUUM track the oldest extant XID/MXID remaining in rel.
+ * Our working assumption is that caller won't decide to freeze this tuple.
+ * It's up to caller to only ratchet back its own top-level trackers after the
+ * point that it fully commits to not freezing the tuple/page in question.
  */
 bool
-heap_tuple_would_freeze(HeapTupleHeader tuple,
-						const struct VacuumCutoffs *cutoffs,
-						TransactionId *relfrozenxid_out,
-						MultiXactId *relminmxid_out)
+heap_tuple_should_freeze(HeapTupleHeader tuple,
+						 const struct VacuumCutoffs *cutoffs,
+						 TransactionId *NoFreezePageRelfrozenXid,
+						 MultiXactId *NoFreezePageRelminMxid)
 {
 	TransactionId xid;
 	MultiXactId multi;
@@ -7304,8 +7335,8 @@ heap_tuple_should_freeze(HeapTupleHeader tuple,
 	if (TransactionIdIsNormal(xid))
 	{
 		Assert(TransactionIdPrecedesOrEquals(cutoffs->relfrozenxid, xid));
-		if (TransactionIdPrecedes(xid, *relfrozenxid_out))
-			*relfrozenxid_out = xid;
+		if (TransactionIdPrecedes(xid, *NoFreezePageRelfrozenXid))
+			*NoFreezePageRelfrozenXid = xid;
 		if (TransactionIdPrecedes(xid, cutoffs->FreezeLimit))
 			freeze = true;
 	}
@@ -7322,8 +7353,8 @@ heap_tuple_should_freeze(HeapTupleHeader tuple,
 		{
 			Assert(TransactionIdPrecedesOrEquals(cutoffs->relfrozenxid, xid));
 			/* xmax is a non-permanent XID */
-			if (TransactionIdPrecedes(xid, *relfrozenxid_out))
-				*relfrozenxid_out = xid;
+			if (TransactionIdPrecedes(xid, *NoFreezePageRelfrozenXid))
+				*NoFreezePageRelfrozenXid = xid;
 			if (TransactionIdPrecedes(xid, cutoffs->FreezeLimit))
 				freeze = true;
 		}
@@ -7334,8 +7365,8 @@ heap_tuple_should_freeze(HeapTupleHeader tuple,
 	else if (HEAP_LOCKED_UPGRADED(tuple->t_infomask))
 	{
 		/* xmax is a pg_upgrade'd MultiXact, which can't have updater XID */
-		if (MultiXactIdPrecedes(multi, *relminmxid_out))
-			*relminmxid_out = multi;
+		if (MultiXactIdPrecedes(multi, *NoFreezePageRelminMxid))
+			*NoFreezePageRelminMxid = multi;
 		/* heap_prepare_freeze_tuple always freezes pg_upgrade'd xmax */
 		freeze = true;
 	}
@@ -7346,8 +7377,8 @@ heap_tuple_should_freeze(HeapTupleHeader tuple,
 		int			nmembers;
 
 		Assert(MultiXactIdPrecedesOrEquals(cutoffs->relminmxid, multi));
-		if (MultiXactIdPrecedes(multi, *relminmxid_out))
-			*relminmxid_out = multi;
+		if (MultiXactIdPrecedes(multi, *NoFreezePageRelminMxid))
+			*NoFreezePageRelminMxid = multi;
 		if (MultiXactIdPrecedes(multi, cutoffs->MultiXactCutoff))
 			freeze = true;
@@ -7359,8 +7390,8 @@ heap_tuple_should_freeze(HeapTupleHeader tuple,
 		{
 			xid = members[i].xid;
 			Assert(TransactionIdPrecedesOrEquals(cutoffs->relfrozenxid, xid));
-			if (TransactionIdPrecedes(xid, *relfrozenxid_out))
-				*relfrozenxid_out = xid;
+			if (TransactionIdPrecedes(xid, *NoFreezePageRelfrozenXid))
+				*NoFreezePageRelfrozenXid = xid;
 			if (TransactionIdPrecedes(xid, cutoffs->FreezeLimit))
 				freeze = true;
 		}
@@ -7374,9 +7405,9 @@ heap_tuple_should_freeze(HeapTupleHeader tuple,
 	if (TransactionIdIsNormal(xid))
 	{
 		Assert(TransactionIdPrecedesOrEquals(cutoffs->relfrozenxid, xid));
-		if (TransactionIdPrecedes(xid, *relfrozenxid_out))
-			*relfrozenxid_out = xid;
-		/* heap_prepare_freeze_tuple always freezes xvac */
+		if (TransactionIdPrecedes(xid, *NoFreezePageRelfrozenXid))
+			*NoFreezePageRelfrozenXid = xid;
+		/* heap_prepare_freeze_tuple forces xvac freezing */
 		freeze = true;
 	}
 }
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 98ccb9882..18192fed5 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1525,8 +1525,8 @@ lazy_scan_prune(LVRelState *vacrel,
 				live_tuples,
 				recently_dead_tuples;
 	int			nnewlpdead;
-	TransactionId NewRelfrozenXid;
-	MultiXactId NewRelminMxid;
+	HeapPageFreeze pagefrz;
+	int64		fpi_before = pgWalUsage.wal_fpi;
 	OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
 	HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
 
@@ -1542,8 +1542,11 @@ lazy_scan_prune(LVRelState *vacrel,
 retry:
 
 	/* Initialize (or reset) page-level state */
-	NewRelfrozenXid = vacrel->NewRelfrozenXid;
-	NewRelminMxid = vacrel->NewRelminMxid;
+	pagefrz.freeze_required = false;
+	pagefrz.FreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
+	pagefrz.FreezePageRelminMxid = vacrel->NewRelminMxid;
+	pagefrz.NoFreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
+	pagefrz.NoFreezePageRelminMxid = vacrel->NewRelminMxid;
 	tuples_deleted = 0;
 	tuples_frozen = 0;
 	lpdead_items = 0;
@@ -1596,27 +1599,23 @@ retry:
 			continue;
 		}
 
-		/*
-		 * LP_DEAD items are processed outside of the loop.
-		 *
-		 * Note that we deliberately don't set hastup=true in the case of an
-		 * LP_DEAD item here, which is not how count_nondeletable_pages() does
-		 * it -- it only considers pages empty/truncatable when they have no
-		 * items at all (except LP_UNUSED items).
-		 *
-		 * Our assumption is that any LP_DEAD items we encounter here will
-		 * become LP_UNUSED inside lazy_vacuum_heap_page() before we actually
-		 * call count_nondeletable_pages().  In any case our opinion of
-		 * whether or not a page 'hastup' (which is how our caller sets its
-		 * vacrel->nonempty_pages value) is inherently race-prone.  It must be
-		 * treated as advisory/unreliable, so we might as well be slightly
-		 * optimistic.
-		 */
 		if (ItemIdIsDead(itemid))
 		{
+			/*
+			 * Delay unsetting all_visible until after we have decided on
+			 * whether this page should be frozen.  We need to test "is this
+			 * page all_visible, assuming any LP_DEAD items are set LP_UNUSED
+			 * in final heap pass?" to reach a decision.  all_visible will be
+			 * unset before we return, as required by lazy_scan_heap caller.
+			 *
+			 * Deliberately don't set hastup for LP_DEAD items.  We make the
+			 * soft assumption that any LP_DEAD items encountered here will
+			 * become LP_UNUSED later on, before count_nondeletable_pages is
+			 * reached.  Whether the page 'hastup' is inherently race-prone.
+			 * It must be treated as unreliable by caller anyway, so we might
+			 * as well be slightly optimistic about it.
+			 */
 			deadoffsets[lpdead_items++] = offnum;
-			prunestate->all_visible = false;
-			prunestate->has_lpdead_items = true;
 			continue;
 		}
 
@@ -1743,9 +1742,8 @@ retry:
 			prunestate->hastup = true;	/* page makes rel truncation unsafe */
 
 			/* Tuple with storage -- consider need to freeze */
-			if (heap_prepare_freeze_tuple(tuple.t_data, &vacrel->cutoffs,
-										  &frozen[tuples_frozen], &totally_frozen,
-										  &NewRelfrozenXid, &NewRelminMxid))
+			if (heap_prepare_freeze_tuple(tuple.t_data, &vacrel->cutoffs, &pagefrz,
+										  &frozen[tuples_frozen], &totally_frozen))
 			{
 				/* Save prepared freeze plan for later */
 				frozen[tuples_frozen++].offset = offnum;
@@ -1759,40 +1757,98 @@ retry:
 			prunestate->all_frozen = false;
 	}
 
-	vacrel->offnum = InvalidOffsetNumber;
-
 	/*
 	 * We have now divided every item on the page into either an LP_DEAD item
 	 * that will need to be vacuumed in indexes later, or a LP_NORMAL tuple
 	 * that remains and needs to be considered for freezing now (LP_UNUSED and
 	 * LP_REDIRECT items also remain, but are of no further interest to us).
 	 */
-	vacrel->NewRelfrozenXid = NewRelfrozenXid;
-	vacrel->NewRelminMxid = NewRelminMxid;
+	vacrel->offnum = InvalidOffsetNumber;
 
 	/*
-	 * Consider the need to freeze any items with tuple storage from the page
-	 * first (arbitrary)
+	 * Freeze the page when heap_prepare_freeze_tuple indicates that at least
+	 * one XID/MXID from before FreezeLimit/MultiXactCutoff is present.  Also
+	 * freeze when pruning generated an FPI, if doing so means that we set the
+	 * page all-frozen afterwards (might not happen until second heap pass).
 	 */
-	if (tuples_frozen > 0)
+	if (pagefrz.freeze_required || tuples_frozen == 0 ||
+		(prunestate->all_visible && prunestate->all_frozen &&
+		 fpi_before != pgWalUsage.wal_fpi))
 	{
-		Assert(prunestate->hastup);
+		/*
+		 * We're freezing the page.  Our final NewRelfrozenXid doesn't need to
+		 * be affected by the XIDs that are just about to be frozen anyway.
+		 */
+		vacrel->NewRelfrozenXid = pagefrz.FreezePageRelfrozenXid;
+		vacrel->NewRelminMxid = pagefrz.FreezePageRelminMxid;
 
-		vacrel->frozen_pages++;
+		if (tuples_frozen == 0)
+		{
+			/*
+			 * We're freezing all eligible tuples on the page, but have no
+			 * freeze plans to execute.  This is structured as a case where
+			 * the page is nominally frozen so that we reliably ratchet back
+			 * the NewRelfrozenXid/NewRelminMxid trackers as instructed by
+			 * heap_prepare_freeze_tuple.  Note that we may still set the page
+			 * all-frozen in the visibility map (unlike the "no freeze" case).
+			 *
+			 * We end up here when pruning removed a deleted tuple which just
+			 * so happened to leave only totally frozen tuples on the page.
+			 * It's also possible that there are remaining unfrozen XIDs/MXIDs
+			 * that are ineligible for freezing, which precludes setting the
+			 * page all-frozen, but doesn't necessarily preclude setting the
+			 * page all-visible (sometimes a single lock-only MultiXactId will
+			 * have made it unsafe to set an all-visible page all-frozen).
+			 *
+			 * We deliberately don't touch the frozen_pages instrumentation
+			 * counter here, since it counts pages with newly frozen tuples
+			 * (don't confuse that with pages newly set all-frozen in VM).
+			 */
+		}
+		else
+		{
+			TransactionId snapshotConflictHorizon;
 
-		/* Execute all freeze plans for page as a single atomic action */
-		heap_freeze_execute_prepared(vacrel->rel, buf,
-									 vacrel->cutoffs.FreezeLimit,
-									 frozen, tuples_frozen);
+			Assert(prunestate->hastup);
+
+			vacrel->frozen_pages++;
+
+			/*
+			 * We can use the latest xmin cutoff (which is generally used for
+			 * 'VM set' conflicts) as our cutoff for freeze conflicts when the
+			 * whole page is eligible to become all-frozen in the VM once
+			 * frozen by us.  Otherwise use a conservative cutoff (just back
+			 * up from OldestXmin).
+			 */
+			if (prunestate->all_visible && prunestate->all_frozen)
+				snapshotConflictHorizon = prunestate->visibility_cutoff_xid;
+			else
+			{
+				snapshotConflictHorizon = vacrel->cutoffs.OldestXmin;
+				TransactionIdRetreat(snapshotConflictHorizon);
+			}
+
+			/* Execute all freeze plans for page as a single atomic action */
+			heap_freeze_execute_prepared(vacrel->rel, buf,
+										 snapshotConflictHorizon,
+										 frozen, tuples_frozen);
+		}
+	}
+	else
+	{
+		/*
+		 * Page requires "no freeze" processing.  It might be possible to set
+		 * the page all-visible, but it'll never become all-frozen in the VM.
+		 *
+		 * NewRelfrozenXid will be <= XIDs from remaining unpruned tuples.
+		 */
+		vacrel->NewRelfrozenXid = pagefrz.NoFreezePageRelfrozenXid;
+		vacrel->NewRelminMxid = pagefrz.NoFreezePageRelminMxid;
+		tuples_frozen = 0;
+		prunestate->all_frozen = false;
 	}
 
 	/*
-	 * The second pass over the heap can also set visibility map bits, using
-	 * the same approach.  This is important when the table frequently has a
-	 * few old LP_DEAD items on each page by the time we get to it (typically
-	 * because past opportunistic pruning operations freed some non-HOT
-	 * tuples).
-	 *
 	 * VACUUM will call heap_page_is_all_visible() during the second pass over
 	 * the heap to determine all_visible and all_frozen for the page -- this
 	 * is a specialized version of the logic from this function.  Now that
@@ -1801,7 +1857,7 @@ retry:
 	 */
#ifdef USE_ASSERT_CHECKING
 	/* Note that all_frozen value does not matter when !all_visible */
-	if (prunestate->all_visible)
+	if (prunestate->all_visible && lpdead_items == 0)
 	{
 		TransactionId cutoff;
 		bool		all_frozen;
@@ -1809,9 +1865,6 @@ retry:
 		if (!heap_page_is_all_visible(vacrel, buf, &cutoff, &all_frozen))
 			Assert(false);
 
-		Assert(lpdead_items == 0);
-		Assert(prunestate->all_frozen == all_frozen);
-
 		/*
 		 * It's possible that we froze tuples and made the page's XID cutoff
 		 * (for recovery conflict purposes) FrozenTransactionId.  This is okay
@@ -1831,9 +1884,6 @@ retry:
 		VacDeadItems *dead_items = vacrel->dead_items;
 		ItemPointerData tmp;
 
-		Assert(!prunestate->all_visible);
-		Assert(prunestate->has_lpdead_items);
-
 		vacrel->lpdead_item_pages++;
 
 		ItemPointerSetBlockNumber(&tmp, blkno);
@@ -1847,6 +1897,10 @@ retry:
 		Assert(dead_items->num_items <= dead_items->max_items);
 		pgstat_progress_update_param(PROGRESS_VACUUM_NUM_DEAD_TUPLES,
 									 dead_items->num_items);
+
+		/* Our caller expects LP_DEAD item to unset all_visible */
+		prunestate->all_visible = false;
+		prunestate->has_lpdead_items = true;
 	}
 
 	/* Finally, add page-local counts to whole-VACUUM counts */
@@ -1891,8 +1945,8 @@ lazy_scan_noprune(LVRelState *vacrel,
 				recently_dead_tuples,
 				missed_dead_tuples;
 	HeapTupleHeader tupleheader;
-	TransactionId NewRelfrozenXid = vacrel->NewRelfrozenXid;
-	MultiXactId NewRelminMxid = vacrel->NewRelminMxid;
+	TransactionId NoFreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
+	MultiXactId NoFreezePageRelminMxid = vacrel->NewRelminMxid;
 	OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
 
 	Assert(BufferGetBlockNumber(buf) == blkno);
@@ -1937,8 +1991,9 @@ lazy_scan_noprune(LVRelState *vacrel,
 		*hastup = true;			/* page prevents rel truncation */
 		tupleheader = (HeapTupleHeader) PageGetItem(page, itemid);
-		if (heap_tuple_would_freeze(tupleheader, &vacrel->cutoffs,
-									&NewRelfrozenXid, &NewRelminMxid))
+		if (heap_tuple_should_freeze(tupleheader, &vacrel->cutoffs,
+									 &NoFreezePageRelfrozenXid,
+									 &NoFreezePageRelminMxid))
 		{
 			/* Tuple with XID < FreezeLimit (or MXID < MultiXactCutoff) */
 			if (vacrel->aggressive)
@@ -2019,8 +2074,8 @@ lazy_scan_noprune(LVRelState *vacrel,
 	 * this particular page until the next VACUUM.  Remember its details now.
 	 * (lazy_scan_prune expects a clean slate, so we have to do this last.)
 	 */
-	vacrel->NewRelfrozenXid = NewRelfrozenXid;
-	vacrel->NewRelminMxid = NewRelminMxid;
+	vacrel->NewRelfrozenXid = NoFreezePageRelfrozenXid;
+	vacrel->NewRelminMxid = NoFreezePageRelminMxid;
 
 	/* Save any LP_DEAD items found on the page in dead_items array */
 	if (vacrel->nindexes == 0)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 9eedab652..44e15b5fb 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -9194,9 +9194,9 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
-        Specifies the cutoff age (in transactions) that <command>VACUUM</command>
-        should use to decide whether to freeze row versions
-        while scanning a table.
+        Specifies the cutoff age (in transactions) that
+        <command>VACUUM</command> should use to decide whether to
+        trigger freezing of pages that have an older XID.
         The default is 50 million transactions.  Although
         users can set this value anywhere from zero to one billion,
         <command>VACUUM</command> will silently limit the effective value to half
@@ -9274,9 +9274,8 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
         Specifies the cutoff age (in multixacts) that <command>VACUUM</command>
-        should use to decide whether to replace multixact IDs with a newer
-        transaction ID or multixact ID while scanning a table.  The default
-        is 5 million multixacts.
+        should use to decide whether to trigger freezing of pages with
+        an older multixact ID.  The default is 5 million multixacts.
         Although users can set this value anywhere from zero to one billion,
         <command>VACUUM</command> will silently limit the effective value to half
         the value of <xref linkend="guc-autovacuum-multixact-freeze-max-age"/>,
-- 
2.38.1