From de7e1aaead4a8f8b4a680b7489a35fedad8059d2 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan
Date: Sat, 22 Apr 2023 11:19:50 -0700
Subject: [PATCH v4 4/9] Reorder routine vacuuming sections.

This doesn't change any of the content itself.  It is a mechanical
change.  The new order talks about maintenance tasks that happen within
the scope of the VACUUM command first, and then talks about ANALYZE
last.

Furthermore, we talk about each maintenance task that happens within the
scope of VACUUM in an order that matches physical processing order
within vacuumlazy.c.  (If you assume that "space-recovery" mostly deals
with pruning and "for-wraparound" mostly deals with freezing).

Old order:

New order:

A later commit will make the content that now appears in "vacuum-basics"
appear as the "Routine Vacuuming" sect1's introductory paragraph.
That'll make it easier to move advice about when to use VACUUM FULL to
some other chapter (since it isn't intended for "routine" use at all).

The new order will be easier to work with in later commits that overhaul
both "space-recovery" and "for-wraparound".  Pruning and freezing are
related conceptually (e.g., holding back "removable cutoff"/OldestXmin
disrupts both in about the same way), which will be easier to discuss
with this ground work in place.
---
 doc/src/sgml/maintenance.sgml | 300 +++++++++++++++++-----------------
 1 file changed, 150 insertions(+), 150 deletions(-)

diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 702e2797c..83fa7ba8b 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -280,8 +280,9 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
-      To update data statistics used by the
-      PostgreSQL query planner.
+      To protect against loss of very old data due to
+      transaction ID wraparound or
+      multixact ID wraparound.
@@ -291,9 +292,8 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu - To protect against loss of very old data due to - transaction ID wraparound or - multixact ID wraparound. + To update data statistics used by the + PostgreSQL query planner. @@ -438,151 +438,6 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu - - Updating Planner Statistics - - - statistics - of the planner - - - - ANALYZE - - - - The PostgreSQL query planner relies on - statistical information about the contents of tables in order to - generate good plans for queries. These statistics are gathered by - the ANALYZE command, - which can be invoked by itself or - as an optional step in VACUUM. It is important to have - reasonably accurate statistics, otherwise poor choices of plans might - degrade database performance. - - - - The autovacuum daemon, if enabled, will automatically issue - ANALYZE commands whenever the content of a table has - changed sufficiently. However, administrators might prefer to rely - on manually-scheduled ANALYZE operations, particularly - if it is known that update activity on a table will not affect the - statistics of interesting columns. The daemon schedules - ANALYZE strictly as a function of the number of rows - inserted or updated; it has no knowledge of whether that will lead - to meaningful statistical changes. - - - - Tuples changed in partitions and inheritance children do not trigger - analyze on the parent table. If the parent table is empty or rarely - changed, it may never be processed by autovacuum, and the statistics for - the inheritance tree as a whole won't be collected. It is necessary to - run ANALYZE on the parent table manually in order to - keep the statistics up to date. - - - - As with vacuuming for space recovery, frequent updates of statistics - are more useful for heavily-updated tables than for seldom-updated - ones. 
But even for a heavily-updated table, there might be no need for - statistics updates if the statistical distribution of the data is - not changing much. A simple rule of thumb is to think about how much - the minimum and maximum values of the columns in the table change. - For example, a timestamp column that contains the time - of row update will have a constantly-increasing maximum value as - rows are added and updated; such a column will probably need more - frequent statistics updates than, say, a column containing URLs for - pages accessed on a website. The URL column might receive changes just - as often, but the statistical distribution of its values probably - changes relatively slowly. - - - - It is possible to run ANALYZE on specific tables and even - just specific columns of a table, so the flexibility exists to update some - statistics more frequently than others if your application requires it. - In practice, however, it is usually best to just analyze the entire - database, because it is a fast operation. ANALYZE uses a - statistically random sampling of the rows of a table rather than reading - every single row. - - - - - Although per-column tweaking of ANALYZE frequency might not be - very productive, you might find it worthwhile to do per-column - adjustment of the level of detail of the statistics collected by - ANALYZE. Columns that are heavily used in WHERE - clauses and have highly irregular data distributions might require a - finer-grain data histogram than other columns. See ALTER TABLE - SET STATISTICS, or change the database-wide default using the configuration parameter. - - - - Also, by default there is limited information available about - the selectivity of functions. However, if you create a statistics - object or an expression - index that uses a function call, useful statistics will be - gathered about the function, which can greatly improve query - plans that use the expression index. 
- - - - - - The autovacuum daemon does not issue ANALYZE commands for - foreign tables, since it has no means of determining how often that - might be useful. If your queries require statistics on foreign tables - for proper planning, it's a good idea to run manually-managed - ANALYZE commands on those tables on a suitable schedule. - - - - - - The autovacuum daemon does not issue ANALYZE commands - for partitioned tables. Inheritance parents will only be analyzed if the - parent itself is changed - changes to child tables do not trigger - autoanalyze on the parent table. If your queries require statistics on - parent tables for proper planning, it is necessary to periodically run - a manual ANALYZE on those tables to keep the statistics - up to date. - - - - - - - Updating the Visibility Map - - - Vacuum maintains a visibility map for each - table to keep track of which pages contain only tuples that are known to be - visible to all active transactions (and all future transactions, until the - page is again modified). This has two purposes. First, vacuum - itself can skip such pages on the next run, since there is nothing to - clean up. - - - - Second, it allows PostgreSQL to answer some - queries using only the index, without reference to the underlying table. - Since PostgreSQL indexes don't contain tuple - visibility information, a normal index scan fetches the heap tuple for each - matching index entry, to check whether it should be seen by the current - transaction. - An index-only - scan, on the other hand, checks the visibility map first. - If it's known that all tuples on the page are - visible, the heap fetch can be skipped. This is most useful on - large data sets where the visibility map can prevent disk accesses. - The visibility map is vastly smaller than the heap, so it can easily be - cached even when the heap is very large. 
- - - Preventing Transaction ID Wraparound Failures @@ -932,6 +787,151 @@ HINT: Stop the postmaster and vacuum that database in single-user mode. + + + Updating the Visibility Map + + + Vacuum maintains a visibility map for each + table to keep track of which pages contain only tuples that are known to be + visible to all active transactions (and all future transactions, until the + page is again modified). This has two purposes. First, vacuum + itself can skip such pages on the next run, since there is nothing to + clean up. + + + + Second, it allows PostgreSQL to answer some + queries using only the index, without reference to the underlying table. + Since PostgreSQL indexes don't contain tuple + visibility information, a normal index scan fetches the heap tuple for each + matching index entry, to check whether it should be seen by the current + transaction. + An index-only + scan, on the other hand, checks the visibility map first. + If it's known that all tuples on the page are + visible, the heap fetch can be skipped. This is most useful on + large data sets where the visibility map can prevent disk accesses. + The visibility map is vastly smaller than the heap, so it can easily be + cached even when the heap is very large. + + + + + Updating Planner Statistics + + + statistics + of the planner + + + + ANALYZE + + + + The PostgreSQL query planner relies on + statistical information about the contents of tables in order to + generate good plans for queries. These statistics are gathered by + the ANALYZE command, + which can be invoked by itself or + as an optional step in VACUUM. It is important to have + reasonably accurate statistics, otherwise poor choices of plans might + degrade database performance. + + + + The autovacuum daemon, if enabled, will automatically issue + ANALYZE commands whenever the content of a table has + changed sufficiently. 
However, administrators might prefer to rely + on manually-scheduled ANALYZE operations, particularly + if it is known that update activity on a table will not affect the + statistics of interesting columns. The daemon schedules + ANALYZE strictly as a function of the number of rows + inserted or updated; it has no knowledge of whether that will lead + to meaningful statistical changes. + + + + Tuples changed in partitions and inheritance children do not trigger + analyze on the parent table. If the parent table is empty or rarely + changed, it may never be processed by autovacuum, and the statistics for + the inheritance tree as a whole won't be collected. It is necessary to + run ANALYZE on the parent table manually in order to + keep the statistics up to date. + + + + As with vacuuming for space recovery, frequent updates of statistics + are more useful for heavily-updated tables than for seldom-updated + ones. But even for a heavily-updated table, there might be no need for + statistics updates if the statistical distribution of the data is + not changing much. A simple rule of thumb is to think about how much + the minimum and maximum values of the columns in the table change. + For example, a timestamp column that contains the time + of row update will have a constantly-increasing maximum value as + rows are added and updated; such a column will probably need more + frequent statistics updates than, say, a column containing URLs for + pages accessed on a website. The URL column might receive changes just + as often, but the statistical distribution of its values probably + changes relatively slowly. + + + + It is possible to run ANALYZE on specific tables and even + just specific columns of a table, so the flexibility exists to update some + statistics more frequently than others if your application requires it. + In practice, however, it is usually best to just analyze the entire + database, because it is a fast operation. 
ANALYZE uses a + statistically random sampling of the rows of a table rather than reading + every single row. + + + + + Although per-column tweaking of ANALYZE frequency might not be + very productive, you might find it worthwhile to do per-column + adjustment of the level of detail of the statistics collected by + ANALYZE. Columns that are heavily used in WHERE + clauses and have highly irregular data distributions might require a + finer-grain data histogram than other columns. See ALTER TABLE + SET STATISTICS, or change the database-wide default using the configuration parameter. + + + + Also, by default there is limited information available about + the selectivity of functions. However, if you create a statistics + object or an expression + index that uses a function call, useful statistics will be + gathered about the function, which can greatly improve query + plans that use the expression index. + + + + + + The autovacuum daemon does not issue ANALYZE commands for + foreign tables, since it has no means of determining how often that + might be useful. If your queries require statistics on foreign tables + for proper planning, it's a good idea to run manually-managed + ANALYZE commands on those tables on a suitable schedule. + + + + + + The autovacuum daemon does not issue ANALYZE commands + for partitioned tables. Inheritance parents will only be analyzed if the + parent itself is changed - changes to child tables do not trigger + autoanalyze on the parent table. If your queries require statistics on + parent tables for proper planning, it is necessary to periodically run + a manual ANALYZE on those tables to keep the statistics + up to date. + + + + -- 2.40.1