From d10f42a1c091b4dc52670fca80a63fee4e73e20c Mon Sep 17 00:00:00 2001
From: Peter Geoghegan
Date: Mon, 13 Dec 2021 15:00:49 -0800
Subject: [PATCH v9 2/4] Make page-level characteristics drive freezing.

Teach VACUUM to freeze all of the tuples on a page whenever it notices
that it would otherwise mark the page all-visible, without also marking
it all-frozen.  VACUUM typically won't freeze _any_ tuples on the page
unless _all_ tuples (that remain after pruning) are all-visible.

This makes the overhead of vacuuming much more predictable over time.
We avoid the need for large balloon payments during aggressive VACUUMs
(typically anti-wraparound autovacuums).  Freezing is proactive, so
we're much less likely to get into "freezing debt".

The new approach to freezing also enables relfrozenxid advancement in
non-aggressive VACUUMs, which might be enough to avoid aggressive
VACUUMs altogether (with many individual tables/workloads).  While the
non-aggressive case continues to skip any all-visible (but not
all-frozen) pages it encounters (which makes relfrozenxid advancement
impossible within that VACUUM), we now consistently avoid leaving
behind all-visible (not all-frozen) pages, so in practice such pages
will no longer hinder relfrozenxid advancement (outside of pg_upgrade
scenarios).  This (as well as work from commit 44fa84881f) makes
relfrozenxid advancement in non-aggressive VACUUMs commonplace.

There is also a clear disadvantage to the new approach to freezing:
more eager freezing will impose overhead on cases that don't receive
any benefit.  This is considered an acceptable trade-off.  The new
algorithm tends to avoid freezing early on pages where it makes the
least sense, since frequently modified pages are unlikely to be
all-visible.  The system accumulates freezing debt in proportion to
the number of physical heap pages with unfrozen tuples, more or less.
Anything based on XID age is likely to be a poor proxy for the eventual
cost of freezing (during the inevitable anti-wraparound autovacuum).
At a high level, freezing is now treated as one of the costs of storing
tuples in physical heap pages -- not a cost of transactions that
allocate XIDs.  Although vacuum_freeze_min_age and
vacuum_multixact_freeze_min_age still influence what we freeze, and
when, they effectively become backstops.  It may still be necessary to
"freeze a page" due to the presence of a particularly old XID, from
before VACUUM's FreezeLimit cutoff, though that will be rare in
practice -- FreezeLimit is just a backstop now, and can only _trigger_
page-level freezing.  All XIDs < OldestXmin and all MXIDs < OldestMxact
will now be frozen on any page that VACUUM decides to freeze,
regardless of the details behind its decision.

The autovacuum logging instrumentation (and VACUUM VERBOSE) now display
the number of pages that were "newly frozen".  This new metric will
give users a general sense of how much freezing VACUUM performed.  It
tends to be fairly predictable (as a percentage of rel_pages) for a
given table and workload.
Author: Peter Geoghegan
Discussion: https://postgr.es/m/CAH2-WzkymFbz6D_vL+jmqSn_5q1wsFvFrE+37yLgL_Rkfd6Gzg@mail.gmail.com
---
 src/include/access/heapam_xlog.h     |  7 ++-
 src/backend/access/heap/heapam.c     | 89 ++++++++++++++++++++++++----
 src/backend/access/heap/vacuumlazy.c | 88 ++++++++++++++++++++-------
 src/backend/commands/vacuum.c        |  8 +++
 4 files changed, 158 insertions(+), 34 deletions(-)

diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 2d8a7f627..a58226e54 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -409,10 +409,15 @@ extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 									  TransactionId relminmxid,
 									  TransactionId cutoff_xid,
 									  TransactionId cutoff_multi,
+									  TransactionId backstop_cutoff_xid,
+									  MultiXactId backstop_cutoff_multi,
 									  xl_heap_freeze_tuple *frz,
 									  bool *totally_frozen,
+									  bool *force_freeze,
 									  TransactionId *relfrozenxid_out,
-									  MultiXactId *relminmxid_out);
+									  MultiXactId *relminmxid_out,
+									  TransactionId *relfrozenxid_nofreeze_out,
+									  MultiXactId *relminmxid_nofreeze_out);
 extern void heap_execute_freeze_tuple(HeapTupleHeader tuple,
 									  xl_heap_freeze_tuple *xlrec_tp);
 extern XLogRecPtr log_heap_visible(RelFileNode rnode, Buffer heap_buffer,
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 134bc408a..05253e8dd 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6439,14 +6439,38 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
  * are older than the specified cutoff XID and cutoff MultiXactId.  If so,
  * setup enough state (in the *frz output argument) to later execute and
  * WAL-log what we would need to do, and return true.  Return false if nothing
- * is to be changed.  In addition, set *totally_frozen_p to true if the tuple
+ * can be changed.  In addition, set *totally_frozen_p to true if the tuple
  * will be totally frozen after these operations are performed and false if
  * more freezing will eventually be required.
  *
+ * Although this interface is primarily tuple-based, vacuumlazy.c caller
+ * cooperates with us to decide on whether or not to freeze whole pages,
+ * together as a single group.  We prepare for freezing at the level of each
+ * tuple, but the final decision is made for the page as a whole.  All pages
+ * that are frozen within a given VACUUM operation are frozen according to
+ * cutoff_xid and cutoff_multi.  Caller _must_ freeze the whole page when
+ * we've set *force_freeze to true!
+ *
+ * cutoff_xid must be caller's oldest xmin to ensure that any XID older than
+ * it could neither be running nor seen as running by any open transaction.
+ * This ensures that the replacement will not change anyone's idea of the
+ * tuple state.  Similarly, cutoff_multi must be the smallest MultiXactId used
+ * by any open transaction (at the time that the oldest xmin was acquired).
+ *
+ * backstop_cutoff_xid must be <= cutoff_xid, and backstop_cutoff_multi must
+ * be <= cutoff_multi.  When any XID/XMID from before these backstop cutoffs
+ * is encountered, we set *force_freeze to true, making caller freeze the page
+ * (freezing-eligible XIDs/XMIDs will be frozen, at least).  "Backstop
+ * freezing" ensures that VACUUM won't allow XIDs/XMIDs to ever get too old.
+ * This shouldn't be necessary very often.  VACUUM should prefer to freeze
+ * when it's cheap (not when it's urgent).
+ *
  * Maintains *relfrozenxid_out and *relminmxid_out, which are the current
- * target relfrozenxid and relminmxid for the relation.  Caller should make
- * temp copies of global tracking variables before starting to process a page,
- * so that we can only scribble on copies.
+ * target relfrozenxid and relminmxid for the relation.  There are also "no
+ * freeze" variants (*relfrozenxid_nofreeze_out and *relminmxid_nofreeze_out)
+ * that are used by caller when it decides to not freeze the page.  Caller
+ * should make temp copies of global tracking variables before starting to
+ * process a page, so that we can only scribble on copies.
  *
  * Caller is responsible for setting the offset field, if appropriate.
  *
@@ -6454,13 +6478,6 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
  * HeapTupleSatisfiesVacuum() and determined that it is not HEAPTUPLE_DEAD
  * (else we should be removing the tuple, not freezing it).
  *
- * NB: cutoff_xid *must* be <= the current global xmin, to ensure that any
- * XID older than it could neither be running nor seen as running by any
- * open transaction.  This ensures that the replacement will not change
- * anyone's idea of the tuple state.
- * Similarly, cutoff_multi must be less than or equal to the smallest
- * MultiXactId used by any transaction currently open.
- *
  * If the tuple is in a shared buffer, caller must hold an exclusive lock on
  * that buffer.
  *
@@ -6472,12 +6489,18 @@
 bool
 heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 						  TransactionId relfrozenxid, TransactionId relminmxid,
 						  TransactionId cutoff_xid, TransactionId cutoff_multi,
+						  TransactionId backstop_cutoff_xid,
+						  MultiXactId backstop_cutoff_multi,
 						  xl_heap_freeze_tuple *frz, bool *totally_frozen_p,
+						  bool *force_freeze,
 						  TransactionId *relfrozenxid_out,
-						  MultiXactId *relminmxid_out)
+						  MultiXactId *relminmxid_out,
+						  TransactionId *relfrozenxid_nofreeze_out,
+						  MultiXactId *relminmxid_nofreeze_out)
 {
 	bool		changed = false;
+	bool		xmin_already_frozen = false;
 	bool		xmax_already_frozen = false;
 	bool		xmin_frozen;
 	bool		freeze_xmax;
@@ -6498,7 +6521,10 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 	 */
 	xid = HeapTupleHeaderGetXmin(tuple);
 	if (!TransactionIdIsNormal(xid))
+	{
+		xmin_already_frozen = true;
 		xmin_frozen = true;
+	}
 	else
 	{
 		if (TransactionIdPrecedes(xid, relfrozenxid))
@@ -6564,6 +6590,13 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 			frz->t_infomask |= HEAP_XMAX_COMMITTED;
 			changed = true;
 
+			/*
+			 * Have caller freeze the page, since setting this MultiXactId to
+			 * a simple XID has some value.  Long-lived MultiXacts should be
+			 * avoided.
+			 */
+			*force_freeze = true;
+
 			if (TransactionIdPrecedes(newxmax, *relfrozenxid_out))
 			{
 				/* New xmax is an XID older than new relfrozenxid_out */
@@ -6609,6 +6642,12 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 			 */
 			if (TransactionIdPrecedes(temp, *relfrozenxid_out))
 				*relfrozenxid_out = temp;
+
+			/*
+			 * We allocated a MultiXact for this, so force freezing to avoid
+			 * wasting it
+			 */
+			*force_freeze = true;
 		}
 	}
 	else if (TransactionIdIsNormal(xid))
@@ -6713,11 +6752,28 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 		Assert(!(tuple->t_infomask & HEAP_XMIN_INVALID));
 		frz->t_infomask |= HEAP_XMIN_COMMITTED;
 		changed = true;
+
+		/* Seems like a good idea to freeze early when this case is hit */
+		*force_freeze = true;
 	}
 
 	*totally_frozen_p = (xmin_frozen &&
 						 (freeze_xmax || xmax_already_frozen));
+
+	/*
+	 * Maintain alternative versions of relfrozenxid_out/relminmxid_out that
+	 * leave caller with the option of *not* freezing the page.  If caller has
+	 * already lost that option (e.g. when the page has an old XID that
+	 * requires backstop freezing), then we don't waste time on this.
+	 */
+	if (!*force_freeze && (!xmin_already_frozen || !xmax_already_frozen))
+		*force_freeze = heap_tuple_needs_freeze(tuple,
+												backstop_cutoff_xid,
+												backstop_cutoff_multi,
+												relfrozenxid_nofreeze_out,
+												relminmxid_nofreeze_out);
+
 	return changed;
 }
@@ -6769,15 +6825,22 @@ heap_freeze_tuple(HeapTupleHeader tuple,
 {
 	xl_heap_freeze_tuple frz;
 	bool		do_freeze;
+	bool		force_freeze = true;
 	bool		tuple_totally_frozen;
 	TransactionId relfrozenxid_out = cutoff_xid;
 	MultiXactId relminmxid_out = cutoff_multi;
+	TransactionId relfrozenxid_nofreeze_out = cutoff_xid;
+	MultiXactId relminmxid_nofreeze_out = cutoff_multi;
 
 	do_freeze = heap_prepare_freeze_tuple(tuple, relfrozenxid, relminmxid,
 										  cutoff_xid, cutoff_multi,
+										  cutoff_xid, cutoff_multi,
 										  &frz, &tuple_totally_frozen,
-										  &relfrozenxid_out, &relminmxid_out);
+										  &force_freeze,
+										  &relfrozenxid_out, &relminmxid_out,
+										  &relfrozenxid_nofreeze_out,
+										  &relminmxid_nofreeze_out);
 
 	/*
 	 * Note that because this is not a WAL-logged operation, we don't need to
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 6ebb9c520..f14b64dfc 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -167,9 +167,10 @@ typedef struct LVRelState
 	MultiXactId relminmxid;
 	double		old_live_tuples;	/* previous value of pg_class.reltuples */
 
-	/* VACUUM operation's cutoff for pruning */
+	/* Cutoffs for freezing eligibility */
 	TransactionId OldestXmin;
-	/* VACUUM operation's cutoff for freezing XIDs and MultiXactIds */
+	MultiXactId OldestMxact;
+	/* Backstop cutoffs that force freezing of older XIDs/MXIDs */
 	TransactionId FreezeLimit;
 	MultiXactId MultiXactCutoff;
 	/* Tracks oldest extant XID/MXID for setting relfrozenxid/relminmxid */
@@ -199,6 +200,7 @@ typedef struct LVRelState
 	BlockNumber scanned_pages;	/* # pages examined (not skipped via VM) */
 	BlockNumber frozenskipped_pages;	/* # frozen pages skipped via VM */
 	BlockNumber removed_pages;	/* # pages removed by relation truncation */
+	BlockNumber newly_frozen_pages; /* # pages with tuples frozen by us */
 	BlockNumber lpdead_item_pages;	/* # pages with LP_DEAD items */
 	BlockNumber missed_dead_pages;	/* # pages with missed dead tuples */
 	BlockNumber nonempty_pages; /* actually, last nonempty page + 1 */
@@ -470,8 +472,9 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 	vacrel->relminmxid = rel->rd_rel->relminmxid;
 	vacrel->old_live_tuples = rel->rd_rel->reltuples;
 
-	/* Set cutoffs for entire VACUUM */
+	/* Initialize freezing cutoffs */
 	vacrel->OldestXmin = OldestXmin;
+	vacrel->OldestMxact = OldestMxact;
 	vacrel->FreezeLimit = FreezeLimit;
 	vacrel->MultiXactCutoff = MultiXactCutoff;
 	/* Initialize state used to track oldest extant XID/XMID */
@@ -643,12 +646,15 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 							 vacrel->relnamespace,
 							 vacrel->relname,
 							 vacrel->num_index_scans);
-			appendStringInfo(&buf, _("pages: %u removed, %u remain, %u scanned (%.2f%% of total)\n"),
+			appendStringInfo(&buf, _("pages: %u removed, %u remain, %u scanned (%.2f%% of total), %u newly frozen (%.2f%% of total)\n"),
 							 vacrel->removed_pages,
 							 vacrel->rel_pages,
 							 vacrel->scanned_pages,
 							 orig_rel_pages == 0 ? 0 :
-							 100.0 * vacrel->scanned_pages / orig_rel_pages);
+							 100.0 * vacrel->scanned_pages / orig_rel_pages,
+							 vacrel->newly_frozen_pages,
+							 orig_rel_pages == 0 ? 0 :
+							 100.0 * vacrel->newly_frozen_pages / orig_rel_pages);
 			appendStringInfo(&buf, _("tuples: %lld removed, %lld remain, %lld are dead but not yet removable\n"),
 							 (long long) vacrel->tuples_deleted,
@@ -818,6 +824,7 @@ lazy_scan_heap(LVRelState *vacrel, int nworkers)
 	vacrel->scanned_pages = 0;
 	vacrel->frozenskipped_pages = 0;
 	vacrel->removed_pages = 0;
+	vacrel->newly_frozen_pages = 0;
 	vacrel->lpdead_item_pages = 0;
 	vacrel->missed_dead_pages = 0;
 	vacrel->nonempty_pages = 0;
@@ -873,7 +880,10 @@ lazy_scan_heap(LVRelState *vacrel, int nworkers)
 	 * When vacrel->aggressive is set, we can't skip pages just because they
 	 * are all-visible, but we can still skip pages that are all-frozen, since
 	 * such pages do not need freezing and do not affect the value that we can
-	 * safely set for relfrozenxid or relminmxid.
+	 * safely set for relfrozenxid or relminmxid.  Pages that are set to
+	 * all-visible but not also set to all-frozen are generally only expected
+	 * in pg_upgrade scenarios (these days lazy_scan_prune freezes all of the
+	 * tuples on a page when the page as a whole will be marked all-visible).
 	 *
 	 * Before entering the main loop, establish the invariant that
 	 * next_unskippable_block is the next block number >= blkno that we can't
@@ -1017,7 +1027,7 @@ lazy_scan_heap(LVRelState *vacrel, int nworkers)
 				/*
 				 * SKIP_PAGES_THRESHOLD (threshold for skipping) was not
 				 * crossed, or this is the last page.  Scan the page, even
-				 * though it's all-visible (and possibly even all-frozen).
+				 * though it's all-visible (and likely all-frozen, too).
 				 */
 				all_visible_according_to_vm = true;
 			}
@@ -1585,10 +1595,13 @@ lazy_scan_prune(LVRelState *vacrel,
 				recently_dead_tuples;
 	int			nnewlpdead;
 	int			nfrozen;
+	bool		force_freeze = false;
 	OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
 	xl_heap_freeze_tuple frozen[MaxHeapTuplesPerPage];
-	TransactionId NewRelfrozenXid;
-	MultiXactId NewRelminMxid;
+	TransactionId NewRelfrozenXid,
+				NoFreezeNewRelfrozenXid;
+	MultiXactId NewRelminMxid,
+				NoFreezeNewRelminMxid;
 
 	Assert(BufferGetBlockNumber(buf) == blkno);
 
@@ -1597,8 +1610,8 @@ lazy_scan_prune(LVRelState *vacrel,
 retry:
 
 	/* Initialize (or reset) page-level state */
-	NewRelfrozenXid = vacrel->NewRelfrozenXid;
-	NewRelminMxid = vacrel->NewRelminMxid;
+	NewRelfrozenXid = NoFreezeNewRelfrozenXid = vacrel->NewRelfrozenXid;
+	NewRelminMxid = NoFreezeNewRelminMxid = vacrel->NewRelminMxid;
 	tuples_deleted = 0;
 	lpdead_items = 0;
 	live_tuples = 0;
@@ -1669,8 +1682,15 @@ retry:
 		 */
 		if (ItemIdIsDead(itemid))
 		{
+			/*
+			 * We delay setting all_visible to false in the event of seeing an
+			 * LP_DEAD item.  We need to test "is the page all_visible if we
+			 * just consider remaining tuples with tuple storage?" below, when
+			 * considering if we want to freeze the page.  We set all_visible
+			 * to false for our caller last, when doing final processing of
+			 * any LP_DEAD items collected here.
+			 */
 			deadoffsets[lpdead_items++] = offnum;
-			prunestate->all_visible = false;
 			prunestate->has_lpdead_items = true;
 			continue;
 		}
@@ -1803,12 +1823,17 @@ retry:
 			if (heap_prepare_freeze_tuple(tuple.t_data,
 										  vacrel->relfrozenxid,
 										  vacrel->relminmxid,
+										  vacrel->OldestXmin,
+										  vacrel->OldestMxact,
 										  vacrel->FreezeLimit,
 										  vacrel->MultiXactCutoff,
 										  &frozen[nfrozen],
 										  &tuple_totally_frozen,
+										  &force_freeze,
 										  &NewRelfrozenXid,
-										  &NewRelminMxid))
+										  &NewRelminMxid,
+										  &NoFreezeNewRelfrozenXid,
+										  &NoFreezeNewRelminMxid))
 			{
 				/* Will execute freeze below */
 				frozen[nfrozen++].offset = offnum;
@@ -1829,9 +1854,31 @@ retry:
 	 * that will need to be vacuumed in indexes later, or a LP_NORMAL tuple
 	 * that remains and needs to be considered for freezing now (LP_UNUSED and
	 * LP_REDIRECT items also remain, but are of no further interest to us).
+	 *
+	 * Freeze the page (based on heap_prepare_freeze_tuple's instructions)
+	 * when it is about to become all-visible.  Also freeze in cases where
+	 * heap_prepare_freeze_tuple requires it.  This usually happens due to the
+	 * presence of an old XID from before FreezeLimit.
 	 */
-	vacrel->NewRelfrozenXid = NewRelfrozenXid;
-	vacrel->NewRelminMxid = NewRelminMxid;
+	if (prunestate->all_visible || force_freeze)
+	{
+		/*
+		 * We're freezing the page.  Our final NewRelfrozenXid doesn't need to
+		 * be affected by the XIDs/XMIDs that are just about to be frozen
+		 * anyway.
+		 */
+		vacrel->NewRelfrozenXid = NewRelfrozenXid;
+		vacrel->NewRelminMxid = NewRelminMxid;
+	}
+	else
+	{
+		/* This is comparable to lazy_scan_noprune's handling */
+		vacrel->NewRelfrozenXid = NoFreezeNewRelfrozenXid;
+		vacrel->NewRelminMxid = NoFreezeNewRelminMxid;
+
+		/* Forget heap_prepare_freeze_tuple's guidance on freezing */
+		nfrozen = 0;
+	}
 
 	/*
 	 * Consider the need to freeze any items with tuple storage from the page
@@ -1839,7 +1886,7 @@ retry:
 	 */
 	if (nfrozen > 0)
 	{
-		Assert(prunestate->hastup);
+		vacrel->newly_frozen_pages++;
 
 		/*
 		 * At least one tuple with storage needs to be frozen -- execute that
@@ -1869,7 +1916,7 @@ retry:
 		{
 			XLogRecPtr	recptr;
 
-			recptr = log_heap_freeze(vacrel->rel, buf, vacrel->FreezeLimit,
+			recptr = log_heap_freeze(vacrel->rel, buf, NewRelfrozenXid,
 									 frozen, nfrozen);
 			PageSetLSN(page, recptr);
 		}
@@ -1892,7 +1939,7 @@ retry:
 	 */
 #ifdef USE_ASSERT_CHECKING
 	/* Note that all_frozen value does not matter when !all_visible */
-	if (prunestate->all_visible)
+	if (prunestate->all_visible && lpdead_items == 0)
 	{
 		TransactionId cutoff;
 		bool		all_frozen;
@@ -1900,7 +1947,6 @@ retry:
 		if (!heap_page_is_all_visible(vacrel, buf, &cutoff, &all_frozen))
 			Assert(false);
 
-		Assert(lpdead_items == 0);
 		Assert(prunestate->all_frozen == all_frozen);
 
 		/*
@@ -1922,9 +1968,11 @@ retry:
 		VacDeadItems *dead_items = vacrel->dead_items;
 		ItemPointerData tmp;
 
-		Assert(!prunestate->all_visible);
 		Assert(prunestate->has_lpdead_items);
 
+		/* Caller expects LP_DEAD items to unset all_visible */
+		prunestate->all_visible = false;
+
 		vacrel->lpdead_item_pages++;
 
 		ItemPointerSetBlockNumber(&tmp, blkno);
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 0ae3b4506..514658ba0 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -957,6 +957,14 @@ get_all_vacuum_rels(int options)
  * FreezeLimit (at a minimum), and relminmxid up to multiXactCutoff (at a
  * minimum).
  *
+ * While non-aggressive VACUUMs are never required to advance relfrozenxid and
+ * relminmxid, they often do so in practice.  They freeze wherever possible,
+ * based on the same criteria that aggressive VACUUMs use.  FreezeLimit and
+ * multiXactCutoff are still applied as backstop cutoffs, that force freezing
+ * of older XIDs/XMIDs that did not get frozen based on the standard criteria.
+ * (Actually, the backstop cutoffs won't force freezing in rare cases where a
+ * cleanup lock cannot be acquired on a page during a non-aggressive VACUUM.)
+ *
  * oldestXmin and oldestMxact are the most recent values that can ever be
  * passed to vac_update_relstats() as frozenxid and minmulti arguments by our
  * vacuumlazy.c caller later on.  These values should be passed when it turns
-- 
2.30.2