Thread: vacuumdb --analyze-only (e.g., after a major upgrade) vs. partitioned tables: pg_statistic missing stats for the partitioned table itself
vacuumdb --analyze-only (e.g., after a major upgrade) vs. partitioned tables: pg_statistic missing stats for the partitioned table itself
From
Nikolay Samokhvalov
Date:
| 2:47 PM (0 minutes ago) | ![]() ![]() | ||
|
I just learned that vacuumdb --analyze-only doesn't update stats for the partitioned table itself, taking care only about individual partitions:
(DDL doesn't matter here)
# vacuumdb --analyze-only -U postgres test --verbose
...
INFO: analyzing "public.measurement_2023_01"
INFO: "measurement_2023_01": scanned 6370 of 6370 pages, containing 1000000 live rows and 0 dead rows; 30000 rows in sample, 1000000 estimated total rows
INFO: "measurement_2023_01": scanned 6370 of 6370 pages, containing 1000000 live rows and 0 dead rows; 30000 rows in sample, 1000000 estimated total rows
INFO: "measurement_2023_02": scanned 6257 of 6257 pages, containing 982279 live rows and 0 dead rows; 30000 rows in sample, 982279 estimated total rows
INFO: analyzing "public.measurement_2023_03"
INFO: "measurement_2023_03": scanned 6483 of 6483 pages, containing 1017721 live rows and 0 dead rows; 30000 rows in sample, 1017721 estimated total rows
...
test=# select starelid::regclass, count(*) from pg_statistic where starelid::regclass::text ~ 'measurement' group by 1 order by 1;
starelid | count
---------------------+-------
measurement_2023_01 | 4
measurement_2023_02 | 4
measurement_2023_03 | 4
(3 rows)
While for the single-threaded SQL-level ANALYZE:
test=# analyze verbose measurement;
...
test=# select starelid::regclass, count(*) from pg_statistic where starelid::regclass::text ~ 'measurement' group by 1 order by 1;
starelid | count
---------------------+-------
measurement | 4
measurement_2023_01 | 4
measurement_2023_02 | 4
measurement_2023_03 | 4
(4 rows)
This means that if, after running pg_upgrade, we use vacuumdb to update stats faster, some stats may be missing, potentially leading to suboptimal performance.
INFO: analyzing "public.measurement_2023_03"
INFO: "measurement_2023_03": scanned 6483 of 6483 pages, containing 1017721 live rows and 0 dead rows; 30000 rows in sample, 1017721 estimated total rows
...
test=# select starelid::regclass, count(*) from pg_statistic where starelid::regclass::text ~ 'measurement' group by 1 order by 1;
starelid | count
---------------------+-------
measurement_2023_01 | 4
measurement_2023_02 | 4
measurement_2023_03 | 4
(3 rows)
While for the single-threaded SQL-level ANALYZE:
test=# analyze verbose measurement;
...
test=# select starelid::regclass, count(*) from pg_statistic where starelid::regclass::text ~ 'measurement' group by 1 order by 1;
starelid | count
---------------------+-------
measurement | 4
measurement_2023_01 | 4
measurement_2023_02 | 4
measurement_2023_03 | 4
(4 rows)
This means that if, after running pg_upgrade, we use vacuumdb to update stats faster, some stats may be missing, potentially leading to suboptimal performance.
Additionally, it doesn't help that pg_stat_all_tables doesn't show counters/timestamps for partitioned table, even after SQL-level ANALYZE:
test=# select relname, analyze_count, autoanalyze_count, last_analyze, last_autoanalyze from pg_stat_user_tables where relname ~ 'measurement';
relname | analyze_count | autoanalyze_count | last_analyze | last_autoanalyze
---------------------+---------------+-------------------+-------------------------------+------------------
measurement_2023_01 | 2 | 0 | 2024-10-24 21:25:47.979958+00 |
measurement_2023_02 | 2 | 0 | 2024-10-24 21:25:48.070355+00 |
measurement_2023_03 | 2 | 0 | 2024-10-24 21:25:48.154613+00 |
(3 rows)
relname | analyze_count | autoanalyze_count | last_analyze | last_autoanalyze
---------------------+---------------+-------------------+-------------------------------+------------------
measurement_2023_01 | 2 | 0 | 2024-10-24 21:25:47.979958+00 |
measurement_2023_02 | 2 | 0 | 2024-10-24 21:25:48.070355+00 |
measurement_2023_03 | 2 | 0 | 2024-10-24 21:25:48.154613+00 |
(3 rows)
This is also discussed in https://www.postgresql.org/message-id/flat/CADofcAXVbD0yGp_EaC9chmzsOoSai3jcfBCnyva3j0RRdRvMVA%40mail.gmail.com
I propose considering 3 fixes:
1) vacuumdb --analyze / --analyze-only to update stats for the partitioned table, so people using pg_upgrade are not in trouble
1) vacuumdb --analyze / --analyze-only to update stats for the partitioned table, so people using pg_upgrade are not in trouble
2) present the ANALYZE metadata for partitioned tables in pg_stat_all_tables
3) for old versions, either backpatch with fix (1) OR just add to the docs (and maybe to the final words pg_upgrade prints), suggesting something like this in addition to vacuumdb analyze-only:
-- psql snippet
select format(
-- psql snippet
select format(
'analyze verbose %I.%I;',
relnamespace::oid::regnamespace,
oid::regclass
) as vacuum_command
from pg_class
where relkind = 'p' \gexec
from pg_class
where relkind = 'p' \gexec
Additionally, I do like the idea of ANALYZE ONLY from the -general discussion above (though, there might be confusion with logic of --analyze and --analyze-only in vacuumdb).
Does it make sense?
Nik
Re: vacuumdb --analyze-only (e.g., after a major upgrade) vs. partitioned tables: pg_statistic missing stats for the partitioned table itself
From
Bruce Momjian
Date:
On Thu, Oct 24, 2024 at 02:48:42PM -0700, Nikolay Samokhvalov wrote: > [ACg8ocIyQq] > Nikolay Samokhvalov <nik@postgres.ai> 2:47 PM (0 minutes ago) > [cleardot] > to pglsql-hackers [cleardot] > [cleardot] > I just learned that vacuumdb --analyze-only doesn't update stats for the > partitioned table itself, taking care only about individual partitions: Yes, this is covered in the ANALYZE manual page: https://www.postgresql.org/docs/current/sql-analyze.html For partitioned tables, ANALYZE gathers statistics by sampling rows from all partitions; in addition, it will recurse into each partition and update its statistics. Each leaf partition is analyzed only once, even with multi-level partitioning. No statistics are collected for only the parent table (without data from its partitions), because with partitioning it's guaranteed to be empty. It is discussed here: https://www.postgresql.org/message-id/flat/CAB%2B%3D1TULKjDSBHxqSQVQstxcHshGzQUnHfp45GSESAu2qm0VKg%40mail.gmail.com#586bc5deef05c35ac16100dee99f6e9e > I propose considering 3 fixes: > > 1) vacuumdb --analyze / --analyze-only to update stats for the partitioned > table, so people using pg_upgrade are not in trouble > 2) present the ANALYZE metadata for partitioned tables in pg_stat_all_tables > 3) for old versions, either backpatch with fix (1) OR just add to the docs (and > maybe to the final words pg_upgrade prints), suggesting something like this in > addition to vacuumdb analyze-only: > > -- psql snippet > select format( > 'analyze verbose %I.%I;', > relnamespace::oid::regnamespace, > oid::regclass > ) as vacuum_command > from pg_class > where relkind = 'p' \gexec > > Additionally, I do like the idea of ANALYZE ONLY from the -general discussion > above (though, there might be confusion with logic of --analyze and > --analyze-only in vacuumdb). > > Does it make sense? I certainly would like to see this improved. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com When a patient asks the doctor, "Am I going to die?", he means "Am I going to die soon?"