From 064cfa0b489a2c76dd8b527e119ca5c4658de295 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan
Date: Sat, 22 Apr 2023 12:33:42 -0700
Subject: [PATCH v4 8/9] Overhaul "Recovering Disk Space" vacuuming docs.

XXX This commit is much less worked out and polished than the work on
freezing. It should very much be considered a work in progress, and
isn't the priority for now.

Say a lot more about the possible impact of long-running transactions on
VACUUM. Remove all talk of administrators getting by without
autovacuum; at most administrators might want to schedule manual VACUUM
operations to supplement autovacuum (this documentation was written at a
time when the visibility map didn't exist, even in its most basic form).

Also describe VACUUM FULL as an entirely different kind of operation to
conventional lazy vacuum.

XXX Open question for this commit: I wonder if it would make sense to
move all of that stuff into its own new sect1 of "Chapter 29. Monitoring
Disk Usage" -- something along the lines of "what to do about bloat when
all else fails, when the problem gets completely out of hand".
Naturally we'd link to this new section from "Routine Vacuuming".

XXX For now, a lot of the information about CLUSTER and VACUUM FULL is
moved into Note/Warning boxes. This arrangement is definitely going to
be temporary.
---
 doc/src/sgml/maintenance.sgml | 165 ++++++++++++++++++----------------
 1 file changed, 87 insertions(+), 78 deletions(-)

diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index db8c5724e..f00442564 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -372,100 +372,109 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
 This approach is necessary to gain the benefits of multiversion
 concurrency control (MVCC, see ): the row version
 must not be deleted while it is still potentially visible to other
- transactions. But eventually, an outdated or deleted row version is no
- longer of interest to any transaction. The space it occupies must then be
- reclaimed for reuse by new rows, to avoid unbounded growth of disk
- space requirements. This is done by running VACUUM.
+ transactions. A deleted row version (whether from an
+ UPDATE or DELETE) will usually cease
+ to be of interest to any still-running transaction shortly after the
+ original deleting transaction commits.
- The standard form of VACUUM removes dead row
- versions in tables and indexes and marks the space available for
- future reuse. However, it will not return the space to the operating
- system, except in the special case where one or more pages at the
- end of a table become entirely free and an exclusive table lock can be
- easily obtained. In contrast, VACUUM FULL actively compacts
- tables by writing a complete new version of the table file with no dead
- space. This minimizes the size of the table, but can take a long time.
- It also requires extra disk space for the new copy of the table, until
- the operation completes.
+ The space dead tuples occupy must eventually be reclaimed for reuse by new
+ rows, to avoid unbounded growth of disk space requirements. Reclaiming
+ space from dead rows is VACUUM's main responsibility.
- The usual goal of routine vacuuming is to do standard VACUUMs
- often enough to avoid needing VACUUM FULL. The
- autovacuum daemon attempts to work this way, and in fact will
- never issue VACUUM FULL. In this approach, the idea
- is not to keep tables at their minimum size, but to maintain steady-state
- usage of disk space: each table occupies space equivalent to its
- minimum size plus however much space gets used up between vacuum runs.
- Although VACUUM FULL can be used to shrink a table back
- to its minimum size and return the disk space to the operating system,
- there is not much point in this if the table will just grow again in the
- future. Thus, moderately-frequent standard VACUUM runs are a
- better approach than infrequent VACUUM FULL runs for
- maintaining heavily-updated tables.
+ The transaction ID number
+ (XID) based cutoff point that
+ VACUUM uses to determine if a deleted tuple is safe to
+ physically remove is reported under removable cutoff in
+ the server log when autovacuum logging (controlled by ) reports on a
+ VACUUM operation executed by autovacuum. Tuples that
+ are not yet safe to remove are counted as dead but not yet
+ removable tuples in the log report. VACUUM
+ establishes its removable cutoff once, at the start of
+ the operation. Any older MVCC snapshot (or transaction
+ that allocates an XID) that's still running when the cutoff is established
+ may hold it back.
-
- Some administrators prefer to schedule vacuuming themselves, for example
- doing all the work at night when load is low.
- The difficulty with doing vacuuming according to a fixed schedule
- is that if a table has an unexpected spike in update activity, it may
- get bloated to the point that VACUUM FULL is really necessary
- to reclaim space. Using the autovacuum daemon alleviates this problem,
- since the daemon schedules vacuuming dynamically in response to update
- activity. It is unwise to disable the daemon completely unless you
- have an extremely predictable workload. One possible compromise is
- to set the daemon's parameters so that it will only react to unusually
- heavy update activity, thus keeping things from getting out of hand,
- while scheduled VACUUMs are expected to do the bulk of the
- work when the load is typical.
-
+
+
+ It's critical that no long-running transactions are allowed to hold back
+ every VACUUM operation's cutoff for an extended
+ period. It may be a good idea to add monitoring to alert you about this.
+
+
+
+
+
+ VACUUM can remove tuples inserted by aborted
+ transactions immediately.
+
+
- For those not using autovacuum, a typical approach is to schedule a
- database-wide VACUUM once a day during a low-usage period,
- supplemented by more frequent vacuuming of heavily-updated tables as
- necessary. (Some installations with extremely high update rates vacuum
- their busiest tables as often as once every few minutes.) If you have
- multiple databases in a cluster, don't forget to
- VACUUM each one; the program might be helpful.
+ VACUUM usually doesn't return space to the operating
+ system. There is one exception: space is returned to the OS whenever a
+ group of contiguous empty pages appears at the end of a table.
+ VACUUM must acquire an ACCESS
+ EXCLUSIVE lock to perform relation truncation. You can disable
+ relation truncation by setting the table's
+ vacuum_truncate storage parameter to
+ off.
-
- Plain VACUUM may not be satisfactory when
- a table contains large numbers of dead row versions as a result of
- massive update or delete activity. If you have such a table and
- you need to reclaim the excess disk space it occupies, you will need
- to use VACUUM FULL, or alternatively
- CLUSTER
- or one of the table-rewriting variants of
- ALTER TABLE.
- These commands rewrite an entire new copy of the table and build
- new indexes for it. All these options require an
- ACCESS EXCLUSIVE lock. Note that
- they also temporarily use extra disk space approximately equal to the size
- of the table, since the old copies of the table and indexes can't be
- released until the new ones are complete.
-
+
+ If you have a table whose entire contents are deleted periodically,
+ consider using TRUNCATE rather than
+ DELETE. TRUNCATE removes the entire
+ table's contents immediately, obviating the need for
+ VACUUM. One disadvantage is that strict
+ MVCC semantics are violated.
+
-
-
- If you have a table whose entire contents are deleted on a periodic
- basis, consider doing it with
- TRUNCATE rather
- than using DELETE followed by
- VACUUM. TRUNCATE removes the
- entire content of the table immediately, without requiring a
- subsequent VACUUM or VACUUM
- FULL to reclaim the now-unused disk space.
- The disadvantage is that strict MVCC semantics are violated.
-
+
+ VACUUM FULL (or CLUSTER) can be
+ useful when dealing with extreme amounts of dead tuples. It can reclaim
+ more disk space, but it is much slower, and usually more disruptive.
+ VACUUM FULL rewrites an entire new copy of the table
+ and rebuilds all of the table's indexes. This makes it suitable for
+ highly fragmented tables, and tables where significant amounts of space
+ can be reclaimed.
+
+
+
+ Although VACUUM FULL is technically an option of the
+ VACUUM command, VACUUM FULL uses a
+ completely different implementation. VACUUM FULL is
+ essentially a variant of CLUSTER. (The name
+ VACUUM FULL is historical; the original implementation
+ was closer to standard VACUUM.)
+
+
+
+
+ TRUNCATE, VACUUM FULL, and
+ CLUSTER all require an ACCESS
+ EXCLUSIVE lock, which can be highly disruptive
+ (SELECT, INSERT,
+ UPDATE, and DELETE commands can't
+ run at the same time).
+
+
+
+
+ VACUUM FULL and CLUSTER temporarily
+ use extra disk space. The extra space required is approximately equal to
+ the size of the table, since the old copies of the table and indexes
+ can't be released until the new ones are complete.
+
+
--
2.40.1
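
The warning about long-running transactions suggests adding monitoring. A rough sketch of what such a check might look like follows; it is not part of the patch. It assumes the backend_xid, backend_xmin, and xact_start columns of pg_stat_activity and the age(xid) function. The LIMIT and the column aliases are illustrative choices only. Prepared transactions and replication slots can also hold back the cutoff and are not covered here.

```sql
-- Sketch only: list sessions that may be holding back VACUUM's removable
-- cutoff, oldest first. age() reports how many XIDs have been assigned
-- since the given XID.
SELECT pid,
       datname,
       usename,
       state,
       now() - xact_start AS xact_duration,
       age(backend_xid)   AS xid_age,
       age(backend_xmin)  AS xmin_age,
       left(query, 60)    AS query_start
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
   OR backend_xid IS NOT NULL
-- greatest() ignores NULL arguments, so a session with only one of the
-- two values set is still ordered sensibly.
ORDER BY greatest(age(backend_xmin), age(backend_xid)) DESC
LIMIT 10;  -- illustrative limit, not a recommendation
```

An alerting rule could fire when xact_duration or xmin_age exceeds a site-specific threshold; no particular threshold is suggested here.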