From 483bc8df203f9df058fcb53e7972e3912e223b30 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan
Date: Mon, 22 Nov 2021 10:02:30 -0800
Subject: [PATCH v9 1/4] Loosen coupling between relfrozenxid and freezing.

When VACUUM set relfrozenxid before now, it set it to whatever value was
used to determine which tuples to freeze -- the FreezeLimit cutoff.
This approach was very naive: the relfrozenxid invariant only requires
that new relfrozenxid values be <= the oldest extant XID remaining in
the table (at the point that the VACUUM operation ends), which in
general might be much more recent than FreezeLimit.  There is no fixed
relationship between the amount of physical work performed by VACUUM to
make it safe to advance relfrozenxid (freezing and pruning), and the
actual number of XIDs that relfrozenxid can be advanced by (at least in
principle) as a result.  VACUUM might have to freeze all of the tuples
from a hundred million heap pages just to enable relfrozenxid to be
advanced by no more than one or two XIDs.  On the other hand, VACUUM
might end up doing little or no work, and yet still be capable of
advancing relfrozenxid by hundreds of millions of XIDs as a result.

VACUUM now sets relfrozenxid (and relminmxid) using the exact oldest
extant XID (and oldest extant MultiXactId) from the table, including
XIDs from the table's remaining/unfrozen MultiXacts.  This requires that
VACUUM carefully track the oldest unfrozen XID/MultiXactId as it goes.
This optimization doesn't require any changes to the definition of
relfrozenxid, nor does it require changes to the core design of
freezing.

Final relfrozenxid values must still be >= FreezeLimit in an aggressive
VACUUM (FreezeLimit is still used as an XID-age based backstop there).
In non-aggressive VACUUMs (where there is still no strict guarantee that
relfrozenxid will be advanced at all), we now advance relfrozenxid by as
much as we possibly can.
This exploits workload conditions that make it easy to advance
relfrozenxid by many more XIDs (for the same amount of freezing/pruning
work).  The non-aggressive case can now set relfrozenxid to any legal
XID value, which in principle could be any XID that is > the existing
relfrozenxid and <= the VACUUM operation's OldestXmin/"removal cutoff"
XID value.  FreezeLimit is still used by VACUUM to determine which
tuples to freeze, at least for now.  Practical experience from the field
may show that non-aggressive VACUUMs seldom need to set relfrozenxid to
an XID from before FreezeLimit, but having the option still seems very
valuable.  A later commit will teach VACUUM to determine which tuples to
freeze based on page-level characteristics.

Without this improved approach to freezing in place, most individual
tables still have very little chance of relfrozenxid advancement during
non-aggressive VACUUMs (an aggressive anti-wraparound autovacuum will
still eventually be required with most tables).  All it takes is an
earlier VACUUM that sets just a few pages all-visible (but not
all-frozen); later non-aggressive VACUUMs will end up skipping those
pages, as a matter of policy, making relfrozenxid advancement
impossible.  This can only be avoided by not setting pages all-visible
(but not all-frozen) in the first place.

Once VACUUM becomes capable of consistently advancing relfrozenxid,
even during non-aggressive VACUUMs, relfrozenxid values (and especially
relminmxid values) will tend to track what's really happening in each
table much more accurately.  This is expected to make anti-wraparound
autovacuums far rarer in practice.  The problem of "anti-wraparound
stampedes" (where multiple anti-wraparound autovacuums are launched at
exactly the same time) is also naturally avoided by advancing
relfrozenxid early and often, since this results in "natural diversity"
among relfrozenxid values, due to table-level workload characteristics.
Credit for the general idea of using the oldest extant XID to set
pg_class.relfrozenxid at the end of VACUUM goes to Andres Freund.

Author: Peter Geoghegan
Reviewed-By: Robert Haas
Discussion: https://postgr.es/m/CAH2-WzkymFbz6D_vL+jmqSn_5q1wsFvFrE+37yLgL_Rkfd6Gzg@mail.gmail.com
---
 src/include/access/heapam.h          |   7 +-
 src/include/access/heapam_xlog.h     |   4 +-
 src/include/commands/vacuum.h        |   1 +
 src/backend/access/heap/heapam.c     | 194 ++++++++++++++++++++-------
 src/backend/access/heap/vacuumlazy.c | 128 +++++++++++++-----
 src/backend/commands/cluster.c       |   5 +-
 src/backend/commands/vacuum.c        |  42 +++---
 7 files changed, 280 insertions(+), 101 deletions(-)

diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index b46ab7d73..10584a4ce 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -167,8 +167,11 @@ extern void heap_inplace_update(Relation relation, HeapTuple tuple);
 extern bool heap_freeze_tuple(HeapTupleHeader tuple,
							  TransactionId relfrozenxid, TransactionId relminmxid,
							  TransactionId cutoff_xid, TransactionId cutoff_multi);
-extern bool heap_tuple_needs_freeze(HeapTupleHeader tuple, TransactionId cutoff_xid,
-									MultiXactId cutoff_multi);
+extern bool heap_tuple_needs_freeze(HeapTupleHeader tuple,
+									TransactionId backstop_cutoff_xid,
+									MultiXactId backstop_cutoff_multi,
+									TransactionId *relfrozenxid_nofreeze_out,
+									MultiXactId *relminmxid_nofreeze_out);
 extern bool heap_tuple_needs_eventual_freeze(HeapTupleHeader tuple);
 
 extern void simple_heap_insert(Relation relation, HeapTuple tup);
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 5c47fdcec..2d8a7f627 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -410,7 +410,9 @@ extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
									  TransactionId cutoff_xid,
									  TransactionId cutoff_multi,
									  xl_heap_freeze_tuple *frz,
-									  bool *totally_frozen);
+									  bool *totally_frozen,
+									  TransactionId *relfrozenxid_out,
+									  MultiXactId *relminmxid_out);
 extern void heap_execute_freeze_tuple(HeapTupleHeader tuple,
									  xl_heap_freeze_tuple *xlrec_tp);
 extern XLogRecPtr log_heap_visible(RelFileNode rnode, Buffer heap_buffer,
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index d64f6268f..ead88edda 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -291,6 +291,7 @@ extern bool vacuum_set_xid_limits(Relation rel,
								  int multixact_freeze_min_age,
								  int multixact_freeze_table_age,
								  TransactionId *oldestXmin,
+								  MultiXactId *oldestMxact,
								  TransactionId *freezeLimit,
								  MultiXactId *multiXactCutoff);
 extern bool vacuum_xid_failsafe_check(TransactionId relfrozenxid,
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 59d43e2ba..134bc408a 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6140,12 +6140,24 @@ heap_inplace_update(Relation relation, HeapTuple tuple)
 *		FRM_RETURN_IS_MULTI
 *			The return value is a new MultiXactId to set as new Xmax.
 *			(caller must obtain proper infomask bits using GetMultiXactIdHintBits)
+ *
+ * "relfrozenxid_out" is an output value; it's used to maintain target new
+ * relfrozenxid for the relation.  It can be ignored unless "flags" contains
+ * either FRM_NOOP or FRM_RETURN_IS_MULTI, because we only handle multiXacts
+ * here.  This follows the general convention: only track XIDs that will still
+ * be in the table after the ongoing VACUUM finishes.  Note that it's up to
+ * caller to maintain this when the Xid return value is itself an Xid.
+ *
+ * Note that we cannot depend on xmin to maintain relfrozenxid_out.  We need
+ * to push maintenance of relfrozenxid_out down this far, since in general
+ * xmin might have been frozen by an earlier VACUUM operation, in which case
+ * our caller will not have factored-in xmin into relfrozenxid_out's value.
 */
 static TransactionId
 FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
				  TransactionId relfrozenxid, TransactionId relminmxid,
				  TransactionId cutoff_xid, MultiXactId cutoff_multi,
-				  uint16 *flags)
+				  uint16 *flags, TransactionId *relfrozenxid_out)
 {
	TransactionId xid = InvalidTransactionId;
	int			i;
@@ -6157,6 +6169,7 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
	bool		has_lockers;
	TransactionId update_xid;
	bool		update_committed;
+	TransactionId temprelfrozenxid_out;
 
	*flags = 0;
 
@@ -6251,13 +6264,13 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
 
	/* is there anything older than the cutoff? */
	need_replace = false;
+	temprelfrozenxid_out = *relfrozenxid_out;
	for (i = 0; i < nmembers; i++)
	{
		if (TransactionIdPrecedes(members[i].xid, cutoff_xid))
-		{
			need_replace = true;
-			break;
-		}
+		if (TransactionIdPrecedes(members[i].xid, temprelfrozenxid_out))
+			temprelfrozenxid_out = members[i].xid;
	}
 
	/*
@@ -6266,6 +6279,7 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
	 */
	if (!need_replace)
	{
+		*relfrozenxid_out = temprelfrozenxid_out;
		*flags |= FRM_NOOP;
		pfree(members);
		return InvalidTransactionId;
@@ -6275,6 +6289,7 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
	 * If the multi needs to be updated, figure out which members do we need
	 * to keep.
	 */
+	temprelfrozenxid_out = *relfrozenxid_out;
	nnewmembers = 0;
	newmembers = palloc(sizeof(MultiXactMember) * nmembers);
	has_lockers = false;
@@ -6356,7 +6371,11 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
				 * list.)
				 */
				if (TransactionIdIsValid(update_xid))
+				{
					newmembers[nnewmembers++] = members[i];
+					if (TransactionIdPrecedes(members[i].xid, temprelfrozenxid_out))
+						temprelfrozenxid_out = members[i].xid;
+				}
			}
			else
			{
@@ -6366,6 +6385,7 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
			{
				/* running locker cannot possibly be older than the cutoff */
				Assert(!TransactionIdPrecedes(members[i].xid, cutoff_xid));
+				Assert(!TransactionIdPrecedes(members[i].xid, *relfrozenxid_out));
				newmembers[nnewmembers++] = members[i];
				has_lockers = true;
			}
@@ -6394,6 +6414,7 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
		if (update_committed)
			*flags |= FRM_MARK_COMMITTED;
		xid = update_xid;
+		/* Caller manages relfrozenxid_out directly when we return an XID */
	}
	else
	{
@@ -6403,6 +6424,7 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
		 */
		xid = MultiXactIdCreateFromMembers(nnewmembers, newmembers);
		*flags |= FRM_RETURN_IS_MULTI;
+		*relfrozenxid_out = temprelfrozenxid_out;
	}
 
	pfree(newmembers);
@@ -6421,6 +6443,11 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
 * will be totally frozen after these operations are performed and false if
 * more freezing will eventually be required.
 *
+ * Maintains *relfrozenxid_out and *relminmxid_out, which are the current
+ * target relfrozenxid and relminmxid for the relation.  Caller should make
+ * temp copies of global tracking variables before starting to process a page,
+ * so that we can only scribble on copies.
+ *
 * Caller is responsible for setting the offset field, if appropriate.
 *
 * It is assumed that the caller has checked the tuple with
@@ -6445,7 +6472,10 @@ bool
 heap_prepare_freeze_tuple(HeapTupleHeader tuple,
						  TransactionId relfrozenxid, TransactionId relminmxid,
						  TransactionId cutoff_xid, TransactionId cutoff_multi,
-						  xl_heap_freeze_tuple *frz, bool *totally_frozen_p)
+						  xl_heap_freeze_tuple *frz,
+						  bool *totally_frozen_p,
+						  TransactionId *relfrozenxid_out,
+						  MultiXactId *relminmxid_out)
 {
	bool		changed = false;
	bool		xmax_already_frozen = false;
@@ -6489,6 +6519,11 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
			frz->t_infomask |= HEAP_XMIN_FROZEN;
			changed = true;
		}
+		else if (TransactionIdPrecedes(xid, *relfrozenxid_out))
+		{
+			/* won't be frozen, but older than current relfrozenxid_out */
+			*relfrozenxid_out = xid;
+		}
	}
 
	/*
@@ -6506,10 +6541,11 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
	{
		TransactionId newxmax;
		uint16		flags;
+		TransactionId temp = *relfrozenxid_out;
 
		newxmax = FreezeMultiXactId(xid, tuple->t_infomask,
									relfrozenxid, relminmxid,
-									cutoff_xid, cutoff_multi, &flags);
+									cutoff_xid, cutoff_multi, &flags, &temp);
 
		freeze_xmax = (flags & FRM_INVALIDATE_XMAX);
@@ -6527,6 +6563,24 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
			if (flags & FRM_MARK_COMMITTED)
				frz->t_infomask |= HEAP_XMAX_COMMITTED;
			changed = true;
+
+			if (TransactionIdPrecedes(newxmax, *relfrozenxid_out))
+			{
+				/* New xmax is an XID older than new relfrozenxid_out */
+				*relfrozenxid_out = newxmax;
+			}
+		}
+		else if (flags & FRM_NOOP)
+		{
+			/*
+			 * Changing nothing, so might have to ratchet back relminmxid_out,
+			 * relfrozenxid_out, or both together
+			 */
+			if (MultiXactIdIsValid(xid) &&
+				MultiXactIdPrecedes(xid, *relminmxid_out))
+				*relminmxid_out = xid;
+			if (TransactionIdPrecedes(temp, *relfrozenxid_out))
+				*relfrozenxid_out = temp;
		}
		else if (flags & FRM_RETURN_IS_MULTI)
		{
@@ -6548,6 +6602,13 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 
			frz->xmax = newxmax;
			changed = true;
+
+			/*
+			 * New multixact might have remaining XID older than
+			 * relfrozenxid_out
+			 */
+			if (TransactionIdPrecedes(temp, *relfrozenxid_out))
+				*relfrozenxid_out = temp;
		}
	}
	else if (TransactionIdIsNormal(xid))
@@ -6575,7 +6636,14 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
			freeze_xmax = true;
		}
		else
+		{
			freeze_xmax = false;
+			if (TransactionIdPrecedes(xid, *relfrozenxid_out))
+			{
+				/* won't be frozen, but older than current relfrozenxid_out */
+				*relfrozenxid_out = xid;
+			}
+		}
	}
	else if ((tuple->t_infomask & HEAP_XMAX_INVALID) ||
			 !TransactionIdIsValid(HeapTupleHeaderGetRawXmax(tuple)))
@@ -6622,6 +6690,9 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
	 * was removed in PostgreSQL 9.0.  Note that if we were to respect
	 * cutoff_xid here, we'd need to make surely to clear totally_frozen
	 * when we skipped freezing on that basis.
+	 *
+	 * Since we always freeze here, relfrozenxid_out doesn't need to be
+	 * maintained.
	 */
	if (TransactionIdIsNormal(xid))
	{
@@ -6699,11 +6770,14 @@ heap_freeze_tuple(HeapTupleHeader tuple,
	xl_heap_freeze_tuple frz;
	bool		do_freeze;
	bool		tuple_totally_frozen;
+	TransactionId relfrozenxid_out = cutoff_xid;
+	MultiXactId relminmxid_out = cutoff_multi;
 
	do_freeze = heap_prepare_freeze_tuple(tuple,
										  relfrozenxid, relminmxid,
										  cutoff_xid, cutoff_multi,
-										  &frz, &tuple_totally_frozen);
+										  &frz, &tuple_totally_frozen,
+										  &relfrozenxid_out, &relminmxid_out);
 
	/*
	 * Note that because this is not a WAL-logged operation, we don't need to
@@ -7133,6 +7207,22 @@ heap_tuple_needs_eventual_freeze(HeapTupleHeader tuple)
 * Check to see whether any of the XID fields of a tuple (xmin, xmax, xvac)
 * are older than the specified cutoff XID or MultiXactId.  If so, return true.
 *
+ * See heap_prepare_freeze_tuple for information about the basic rules for the
+ * cutoffs used here.
+ *
+ * Maintains *relfrozenxid_nofreeze_out and *relminmxid_nofreeze_out, which
+ * are the current target relfrozenxid and relminmxid for the relation.  We
+ * assume that caller will never want to freeze its tuple, even when the tuple
+ * "needs freezing" according to our return value.  Caller should make temp
+ * copies of global tracking variables before starting to process a page, so
+ * that we can only scribble on copies.  That way caller can just discard the
+ * temp copies if it isn't okay with that assumption.
+ *
+ * Only aggressive VACUUM callers are expected to really care when a tuple
+ * "needs freezing" according to us.  It follows that non-aggressive VACUUMs
+ * can use *relfrozenxid_nofreeze_out and *relminmxid_nofreeze_out in all
+ * cases.
+ *
 * It doesn't matter whether the tuple is alive or dead, we are checking
 * to see if a tuple needs to be removed or frozen to avoid wraparound.
 *
@@ -7140,15 +7230,23 @@ heap_tuple_needs_eventual_freeze(HeapTupleHeader tuple)
 * on a standby.
 */
 bool
-heap_tuple_needs_freeze(HeapTupleHeader tuple, TransactionId cutoff_xid,
-						MultiXactId cutoff_multi)
+heap_tuple_needs_freeze(HeapTupleHeader tuple,
+						TransactionId backstop_cutoff_xid,
+						MultiXactId backstop_cutoff_multi,
+						TransactionId *relfrozenxid_nofreeze_out,
+						MultiXactId *relminmxid_nofreeze_out)
 {
	TransactionId xid;
+	bool		needs_freeze = false;
 
	xid = HeapTupleHeaderGetXmin(tuple);
-	if (TransactionIdIsNormal(xid) &&
-		TransactionIdPrecedes(xid, cutoff_xid))
-		return true;
+	if (TransactionIdIsNormal(xid))
+	{
+		if (TransactionIdPrecedes(xid, *relfrozenxid_nofreeze_out))
+			*relfrozenxid_nofreeze_out = xid;
+		if (TransactionIdPrecedes(xid, backstop_cutoff_xid))
+			needs_freeze = true;
+	}
 
	/*
	 * The considerations for multixacts are complicated; look at
@@ -7158,57 +7256,59 @@ heap_tuple_needs_freeze(HeapTupleHeader tuple, TransactionId cutoff_xid,
	if (tuple->t_infomask & HEAP_XMAX_IS_MULTI)
	{
		MultiXactId multi;
+		MultiXactMember *members;
+		int			nmembers;
 
		multi = HeapTupleHeaderGetRawXmax(tuple);
-		if (!MultiXactIdIsValid(multi))
-		{
-			/* no xmax set, ignore */
-			;
-		}
-		else if (HEAP_LOCKED_UPGRADED(tuple->t_infomask))
+		if (MultiXactIdIsValid(multi) &&
+			MultiXactIdPrecedes(multi, *relminmxid_nofreeze_out))
+			*relminmxid_nofreeze_out = multi;
+
+		if (HEAP_LOCKED_UPGRADED(tuple->t_infomask))
			return true;
-		else if (MultiXactIdPrecedes(multi, cutoff_multi))
-			return true;
-		else
+		else if (MultiXactIdPrecedes(multi, backstop_cutoff_multi))
+			needs_freeze = true;
+
+		/* need to check whether any member of the mxact is too old */
+		nmembers = GetMultiXactIdMembers(multi, &members, false,
+										 HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask));
+
+		for (int i = 0; i < nmembers; i++)
		{
-			MultiXactMember *members;
-			int			nmembers;
-			int			i;
-
-			/* need to check whether any member of the mxact is too old */
-
-			nmembers = GetMultiXactIdMembers(multi, &members, false,
-											 HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask));
-
-			for (i = 0; i < nmembers; i++)
-			{
-				if (TransactionIdPrecedes(members[i].xid, cutoff_xid))
-				{
-					pfree(members);
-					return true;
-				}
-			}
-			if (nmembers > 0)
-				pfree(members);
+			if (TransactionIdPrecedes(members[i].xid, backstop_cutoff_xid))
+				needs_freeze = true;
+			if (TransactionIdPrecedes(members[i].xid,
+									  *relfrozenxid_nofreeze_out))
+				*relfrozenxid_nofreeze_out = members[i].xid;
		}
+		if (nmembers > 0)
+			pfree(members);
	}
	else
	{
		xid = HeapTupleHeaderGetRawXmax(tuple);
-		if (TransactionIdIsNormal(xid) &&
-			TransactionIdPrecedes(xid, cutoff_xid))
-			return true;
+		if (TransactionIdIsNormal(xid))
+		{
+			if (TransactionIdPrecedes(xid, *relfrozenxid_nofreeze_out))
+				*relfrozenxid_nofreeze_out = xid;
+			if (TransactionIdPrecedes(xid, backstop_cutoff_xid))
+				needs_freeze = true;
+		}
	}
 
	if (tuple->t_infomask & HEAP_MOVED)
	{
		xid = HeapTupleHeaderGetXvac(tuple);
-		if (TransactionIdIsNormal(xid) &&
-			TransactionIdPrecedes(xid, cutoff_xid))
-			return true;
+		if (TransactionIdIsNormal(xid))
+		{
+			if (TransactionIdPrecedes(xid, *relfrozenxid_nofreeze_out))
+				*relfrozenxid_nofreeze_out = xid;
+			if (TransactionIdPrecedes(xid, backstop_cutoff_xid))
+				needs_freeze = true;
+		}
	}
 
-	return false;
+	return needs_freeze;
 }
 
 /*
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 40101e0cb..6ebb9c520 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -144,7 +144,7 @@ typedef struct LVRelState
	Relation   *indrels;
	int			nindexes;
 
-	/* Aggressive VACUUM (scan all unfrozen pages)? */
+	/* Aggressive VACUUM? (must set relfrozenxid >= FreezeLimit) */
	bool		aggressive;
	/* Use visibility map to skip? (disabled by DISABLE_PAGE_SKIPPING) */
	bool		skipwithvm;
@@ -172,8 +172,9 @@ typedef struct LVRelState
	/* VACUUM operation's cutoff for freezing XIDs and MultiXactIds */
	TransactionId FreezeLimit;
	MultiXactId MultiXactCutoff;
-	/* Are FreezeLimit/MultiXactCutoff still valid? */
-	bool		freeze_cutoffs_valid;
+	/* Tracks oldest extant XID/MXID for setting relfrozenxid/relminmxid */
+	TransactionId NewRelfrozenXid;
+	MultiXactId NewRelminMxid;
 
	/* Error reporting state */
	char	   *relnamespace;
@@ -329,6 +330,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
	PgStat_Counter startreadtime = 0;
	PgStat_Counter startwritetime = 0;
	TransactionId OldestXmin;
+	MultiXactId OldestMxact;
	TransactionId FreezeLimit;
	MultiXactId MultiXactCutoff;
 
@@ -355,17 +357,17 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
	 * used to determine which XIDs/MultiXactIds will be frozen.
	 *
	 * If this is an aggressive VACUUM, then we're strictly required to freeze
-	 * any and all XIDs from before FreezeLimit, so that we will be able to
-	 * safely advance relfrozenxid up to FreezeLimit below (we must be able to
-	 * advance relminmxid up to MultiXactCutoff, too).
+	 * any and all XIDs from before FreezeLimit in order to be able to advance
+	 * relfrozenxid to a value >= FreezeLimit below.  There is an analogous
+	 * requirement around MultiXact freezing, relminmxid, and MultiXactCutoff.
	 */
	aggressive = vacuum_set_xid_limits(rel,
									   params->freeze_min_age,
									   params->freeze_table_age,
									   params->multixact_freeze_min_age,
									   params->multixact_freeze_table_age,
-									   &OldestXmin, &FreezeLimit,
-									   &MultiXactCutoff);
+									   &OldestXmin, &OldestMxact,
+									   &FreezeLimit, &MultiXactCutoff);
 
	skipwithvm = true;
	if (params->options & VACOPT_DISABLE_PAGE_SKIPPING)
@@ -472,8 +474,9 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
	vacrel->OldestXmin = OldestXmin;
	vacrel->FreezeLimit = FreezeLimit;
	vacrel->MultiXactCutoff = MultiXactCutoff;
-	/* Track if cutoffs became invalid (possible in !aggressive case only) */
-	vacrel->freeze_cutoffs_valid = true;
+	/* Initialize state used to track oldest extant XID/MXID */
+	vacrel->NewRelfrozenXid = OldestXmin;
+	vacrel->NewRelminMxid = OldestMxact;
 
	/*
	 * Call lazy_scan_heap to perform all required heap pruning, index
@@ -526,16 +529,15 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
	 * Aggressive VACUUM must reliably advance relfrozenxid (and relminmxid).
	 * We are able to advance relfrozenxid in a non-aggressive VACUUM too,
	 * provided we didn't skip any all-visible (not all-frozen) pages using
-	 * the visibility map, and assuming that we didn't fail to get a cleanup
-	 * lock that made it unsafe with respect to FreezeLimit (or perhaps our
-	 * MultiXactCutoff) established for VACUUM operation.
+	 * the visibility map.  A non-aggressive VACUUM might advance relfrozenxid
+	 * to an XID that is either older or newer than FreezeLimit (same applies
+	 * to relminmxid and MultiXactCutoff).
	 *
	 * NB: We must use orig_rel_pages, not vacrel->rel_pages, since we want
	 * the rel_pages used by lazy_scan_heap, which won't match when we
	 * happened to truncate the relation afterwards.
	 */
-	if (vacrel->scanned_pages + vacrel->frozenskipped_pages < orig_rel_pages ||
-		!vacrel->freeze_cutoffs_valid)
+	if (vacrel->scanned_pages + vacrel->frozenskipped_pages < orig_rel_pages)
	{
		/* Cannot advance relfrozenxid/relminmxid */
		Assert(!aggressive);
@@ -549,9 +551,16 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
	{
		Assert(vacrel->scanned_pages + vacrel->frozenskipped_pages ==
			   orig_rel_pages);
+		Assert(!aggressive ||
+			   TransactionIdPrecedesOrEquals(FreezeLimit,
+											 vacrel->NewRelfrozenXid));
+		Assert(!aggressive ||
+			   MultiXactIdPrecedesOrEquals(MultiXactCutoff,
+										   vacrel->NewRelminMxid));
+
		vac_update_relstats(rel, new_rel_pages, new_live_tuples,
							new_rel_allvisible, vacrel->nindexes > 0,
-							FreezeLimit, MultiXactCutoff,
+							vacrel->NewRelfrozenXid, vacrel->NewRelminMxid,
							&frozenxid_updated, &minmulti_updated, false);
	}
@@ -656,17 +665,19 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
							 OldestXmin, diff);
			if (frozenxid_updated)
			{
-				diff = (int32) (FreezeLimit - vacrel->relfrozenxid);
+				diff = (int32) (vacrel->NewRelfrozenXid - vacrel->relfrozenxid);
+				Assert(diff > 0);
				appendStringInfo(&buf,
								 _("new relfrozenxid: %u, which is %d xids ahead of previous value\n"),
-								 FreezeLimit, diff);
+								 vacrel->NewRelfrozenXid, diff);
			}
			if (minmulti_updated)
			{
-				diff = (int32) (MultiXactCutoff - vacrel->relminmxid);
+				diff = (int32) (vacrel->NewRelminMxid - vacrel->relminmxid);
+				Assert(diff > 0);
				appendStringInfo(&buf,
								 _("new relminmxid: %u, which is %d mxids ahead of previous value\n"),
-								 MultiXactCutoff, diff);
+								 vacrel->NewRelminMxid, diff);
			}
			if (orig_rel_pages > 0)
			{
@@ -1576,6 +1587,8 @@ lazy_scan_prune(LVRelState *vacrel,
	int			nfrozen;
	OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
	xl_heap_freeze_tuple frozen[MaxHeapTuplesPerPage];
+	TransactionId NewRelfrozenXid;
+	MultiXactId NewRelminMxid;
 
	Assert(BufferGetBlockNumber(buf) == blkno);
 
@@ -1583,7 +1596,9 @@ lazy_scan_prune(LVRelState *vacrel,
 
 retry:
 
-	/* Initialize (or reset) page-level counters */
+	/* Initialize (or reset) page-level state */
+	NewRelfrozenXid = vacrel->NewRelfrozenXid;
+	NewRelminMxid = vacrel->NewRelminMxid;
	tuples_deleted = 0;
	lpdead_items = 0;
	live_tuples = 0;
@@ -1791,7 +1806,9 @@ retry:
										  vacrel->FreezeLimit,
										  vacrel->MultiXactCutoff,
										  &frozen[nfrozen],
-										  &tuple_totally_frozen))
+										  &tuple_totally_frozen,
+										  &NewRelfrozenXid,
+										  &NewRelminMxid))
			{
				/* Will execute freeze below */
				frozen[nfrozen++].offset = offnum;
@@ -1805,13 +1822,16 @@ retry:
			prunestate->all_frozen = false;
	}
 
+	vacrel->offnum = InvalidOffsetNumber;
+
	/*
	 * We have now divided every item on the page into either an LP_DEAD item
	 * that will need to be vacuumed in indexes later, or a LP_NORMAL tuple
	 * that remains and needs to be considered for freezing now (LP_UNUSED and
	 * LP_REDIRECT items also remain, but are of no further interest to us).
	 */
-	vacrel->offnum = InvalidOffsetNumber;
+	vacrel->NewRelfrozenXid = NewRelfrozenXid;
+	vacrel->NewRelminMxid = NewRelminMxid;
 
	/*
	 * Consider the need to freeze any items with tuple storage from the page
@@ -1962,6 +1982,8 @@ lazy_scan_noprune(LVRelState *vacrel,
				missed_dead_tuples;
	HeapTupleHeader tupleheader;
	OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
+	TransactionId NoFreezeNewRelfrozenXid = vacrel->NewRelfrozenXid;
+	MultiXactId NoFreezeNewRelminMxid = vacrel->NewRelminMxid;
 
	Assert(BufferGetBlockNumber(buf) == blkno);
 
@@ -2007,20 +2029,56 @@ lazy_scan_noprune(LVRelState *vacrel,
		tupleheader = (HeapTupleHeader) PageGetItem(page, itemid);
		if (heap_tuple_needs_freeze(tupleheader, vacrel->FreezeLimit,
-									vacrel->MultiXactCutoff))
+									vacrel->MultiXactCutoff,
+									&NoFreezeNewRelfrozenXid,
+									&NoFreezeNewRelminMxid))
		{
			if (vacrel->aggressive)
			{
-				/* Going to have to get cleanup lock for lazy_scan_prune */
+				/*
+				 * heap_tuple_needs_freeze determined that it isn't going to
+				 * be possible for the ongoing aggressive VACUUM operation to
+				 * advance relfrozenxid to a value >= FreezeLimit without
+				 * freezing one or more tuples with older XIDs from this page.
+				 * (Or perhaps the issue was that MultiXactCutoff could not be
+				 * respected.  Might have even been both cutoffs, together.)
+				 *
+				 * Tell caller that it must acquire a full cleanup lock.  It's
+				 * possible that caller will have to wait a while for one, but
+				 * that can't be helped -- full processing by lazy_scan_prune
+				 * is required to freeze the older XIDs (and/or freeze older
+				 * MultiXactIds).
+				 *
+				 * lazy_scan_prune expects a clean slate.  Forget everything
+				 * that lazy_scan_noprune learned about the page, including
+				 * NewRelfrozenXid and NewRelminMxid tracking information.
+				 */
				vacrel->offnum = InvalidOffsetNumber;
				return false;
			}
-
-			/*
-			 * Current non-aggressive VACUUM operation definitely won't be
-			 * able to advance relfrozenxid or relminmxid
-			 */
-			vacrel->freeze_cutoffs_valid = false;
+			else
+			{
+				/*
+				 * This is a non-aggressive VACUUM, which is under no strict
+				 * obligation to advance relfrozenxid at all (much less to
+				 * advance it to a value >= FreezeLimit).  Non-aggressive
+				 * VACUUM advances relfrozenxid/relminmxid on a best-effort
+				 * basis.  It never waits for a cleanup lock.
+				 *
+				 * NewRelfrozenXid (and/or NewRelminMxid) will still have been
+				 * ratcheted back as needed.  heap_tuple_needs_freeze assumes
+				 * that its caller _might_ prefer to carry on without freezing
+				 * anything on the page in the event of a tuple containing an
+				 * XID/MXID that "needs freezing".
+				 *
+				 * The fact that we won't be able to advance relfrozenxid up
+				 * to FreezeLimit on this occasion is no reason to completely
+				 * give up on advancing relfrozenxid.  There is likely to be
+				 * some benefit from advancing relfrozenxid by any amount,
+				 * even if the final value is significantly older than our
+				 * FreezeLimit.
+				 */
+			}
		}
 
		ItemPointerSet(&(tuple.t_self), blkno, offnum);
@@ -2069,6 +2127,14 @@ lazy_scan_noprune(LVRelState *vacrel,
 
	vacrel->offnum = InvalidOffsetNumber;
 
+	/*
+	 * We have committed to not freezing the tuples on this page (always
+	 * happens with a non-aggressive VACUUM), so make sure that the target
+	 * relfrozenxid/relminmxid values reflect the XIDs/MXIDs we encountered.
+	 */
+	vacrel->NewRelfrozenXid = NoFreezeNewRelfrozenXid;
+	vacrel->NewRelminMxid = NoFreezeNewRelminMxid;
+
	/*
	 * Now save details of the LP_DEAD items from the page in vacrel (though
	 * only when VACUUM uses two-pass strategy)
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 02a7e94bf..a7e988298 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -767,6 +767,7 @@ copy_table_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex, bool verbose,
	TupleDesc	oldTupDesc PG_USED_FOR_ASSERTS_ONLY;
	TupleDesc	newTupDesc PG_USED_FOR_ASSERTS_ONLY;
	TransactionId OldestXmin;
+	MultiXactId oldestMxact;
	TransactionId FreezeXid;
	MultiXactId MultiXactCutoff;
	bool		use_sort;
@@ -856,8 +857,8 @@ copy_table_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex, bool verbose,
	 * Since we're going to rewrite the whole table anyway, there's no reason
	 * not to be aggressive about this.
	 */
-	vacuum_set_xid_limits(OldHeap, 0, 0, 0, 0,
-						  &OldestXmin, &FreezeXid, &MultiXactCutoff);
+	vacuum_set_xid_limits(OldHeap, 0, 0, 0, 0, &OldestXmin, &oldestMxact,
+						  &FreezeXid, &MultiXactCutoff);
 
	/*
	 * FreezeXid will become the table's new relfrozenxid, and that mustn't go
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 50a4a612e..0ae3b4506 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -945,14 +945,22 @@ get_all_vacuum_rels(int options)
 * The output parameters are:
 * - oldestXmin is the Xid below which tuples deleted by any xact (that
 *   committed) should be considered DEAD, not just RECENTLY_DEAD.
- * - freezeLimit is the Xid below which all Xids are replaced by
- *   FrozenTransactionId during vacuum.
- * - multiXactCutoff is the value below which all MultiXactIds are removed
- *   from Xmax.
+ * - oldestMxact is the Mxid below which MultiXacts are definitely not
+ *   seen as visible by any running transaction.
+ * - freezeLimit is the Xid below which all Xids are definitely replaced by
+ *   FrozenTransactionId during aggressive vacuums.
+ * - multiXactCutoff is the value below which all MultiXactIds are definitely
+ *   removed from Xmax during aggressive vacuums.
 *
 * Return value indicates if vacuumlazy.c caller should make its VACUUM
 * operation aggressive.  An aggressive VACUUM must advance relfrozenxid up to
- * FreezeLimit, and relminmxid up to multiXactCutoff.
+ * FreezeLimit (at a minimum), and relminmxid up to multiXactCutoff (at a
+ * minimum).
+ *
+ * oldestXmin and oldestMxact are the most recent values that can ever be
+ * passed to vac_update_relstats() as frozenxid and minmulti arguments by our
+ * vacuumlazy.c caller later on.  These values should be passed when it turns
+ * out that VACUUM will leave no unfrozen XIDs/MXIDs behind in the table.
 */
 bool
 vacuum_set_xid_limits(Relation rel,
@@ -961,6 +969,7 @@ vacuum_set_xid_limits(Relation rel,
					  int multixact_freeze_min_age,
					  int multixact_freeze_table_age,
					  TransactionId *oldestXmin,
+					  MultiXactId *oldestMxact,
					  TransactionId *freezeLimit,
					  MultiXactId *multiXactCutoff)
 {
@@ -969,7 +978,6 @@ vacuum_set_xid_limits(Relation rel,
	int			effective_multixact_freeze_max_age;
	TransactionId limit;
	TransactionId safeLimit;
-	MultiXactId oldestMxact;
	MultiXactId mxactLimit;
	MultiXactId safeMxactLimit;
	int			freezetable;
@@ -1065,9 +1073,11 @@ vacuum_set_xid_limits(Relation rel,
							  effective_multixact_freeze_max_age / 2);
	Assert(mxid_freezemin >= 0);
 
+	/* Remember for caller */
+	*oldestMxact = GetOldestMultiXactId();
+
	/* compute the cutoff multi, being careful to generate a valid value */
-	oldestMxact = GetOldestMultiXactId();
-	mxactLimit = oldestMxact - mxid_freezemin;
+	mxactLimit = *oldestMxact - mxid_freezemin;
	if (mxactLimit < FirstMultiXactId)
		mxactLimit = FirstMultiXactId;
 
@@ -1082,8 +1092,8 @@ vacuum_set_xid_limits(Relation rel,
				(errmsg("oldest multixact is far in the past"),
				 errhint("Close open transactions with multixacts soon to avoid wraparound problems.")));
		/* Use the safe limit, unless an older mxact is still running */
-		if (MultiXactIdPrecedes(oldestMxact, safeMxactLimit))
-			mxactLimit = oldestMxact;
+		if (MultiXactIdPrecedes(*oldestMxact, safeMxactLimit))
+			mxactLimit = *oldestMxact;
		else
			mxactLimit = safeMxactLimit;
	}
@@ -1390,14 +1400,10 @@ vac_update_relstats(Relation relation,
	 * Update relfrozenxid, unless caller passed InvalidTransactionId
	 * indicating it has no new data.
	 *
-	 * Ordinarily, we don't let relfrozenxid go backwards: if things are
-	 * working correctly, the only way the new frozenxid could be older would
-	 * be if a previous VACUUM was done with a tighter freeze_min_age, in
-	 * which case we don't want to forget the work it already did.  However,
-	 * if the stored relfrozenxid is "in the future", then it must be corrupt
-	 * and it seems best to overwrite it with the cutoff we used this time.
-	 * This should match vac_update_datfrozenxid() concerning what we consider
-	 * to be "in the future".
+	 * Ordinarily, we don't let relfrozenxid go backwards.  However, if the
+	 * stored relfrozenxid is "in the future", then it must be corrupt, so
+	 * just overwrite it.  This should match vac_update_datfrozenxid()
+	 * concerning what we consider to be "in the future".
	 */
	if (frozenxid_updated)
		*frozenxid_updated = false;
-- 
2.30.2