Thread: Autovacuum on Partitioned Tables
Hello there!
There is a statement in the Postgres Docs: "Partitioned tables are not processed by autovacuum."
What does the above statement mean?
Does autovacuum not process both the parent and the child tables in a partition relationship?
What is the definition of a partitioned table?
I have a table with 50+ million rows that gets data added to/wiped from it every 90 days. We are planning to break this table into a few thousand partitions. More partitions will likely be added in the future, but we don't have plans to delete any of the partitions. Will we need to manually track the statistics of these partitions and manually vacuum the tables or will autovacuum help to manage them?
Best,
Ryan
Ryan
Ryan N Ruenroeng (He/His) | |||||||||
| |||||||||
|
On 10/31/22 17:26, Ryan Ruenroeng wrote:
Partitioned (aka parent) tables are "virtual". There's nothing to vacuum are analyze.
1. That's a lot of partitions. Older (like v12) query optimizers don't do a great job handle that many partitions.
2. Be careful what you partition on. (We added "partition_date" to PKs so as to partition by date, even though the "real" PK is a synthetic key; it was a query performance disaster in Pg 12.)
Autovacuum will handle it.
Hello there!There is a statement in the Postgres Docs: "Partitioned tables are not processed by autovacuum."What does the above statement mean?Does autovacuum not process both the parent and the child tables in a partition relationship?
Partitioned (aka parent) tables are "virtual". There's nothing to vacuum are analyze.
What is the definition of a partitioned table?I have a table with 50+ million rows that gets data added to/wiped from it every 90 days. We are planning to break this table into a few thousand partitions.
1. That's a lot of partitions. Older (like v12) query optimizers don't do a great job handle that many partitions.
2. Be careful what you partition on. (We added "partition_date" to PKs so as to partition by date, even though the "real" PK is a synthetic key; it was a query performance disaster in Pg 12.)
More partitions will likely be added in the future, but we don't have plans to delete any of the partitions. Will we need to manually track the statistics of these partitions and manually vacuum the tables or will autovacuum help to manage them?
Autovacuum will handle it.
--
Angular momentum makes the world go 'round.
Angular momentum makes the world go 'round.
Ron <ronljohnsonjr@gmail.com> writes: > On 10/31/22 17:26, Ryan Ruenroeng wrote: >> I have a table with 50+ million rows that gets data added to/wiped from it >> every 90 days. We are planning to break this table into a few thousand >> partitions. > 1. That's a lot of partitions. Older (like v12) query optimizers don't do a > great job handle that many partitions. Newer ones don't either, if you are incautious enough to issue a query that touches all/most of the partitions --- or even just that the planner cannot prove doesn't touch all/most of the partitions. So that sort of setup hinges critically on having very stylized queries that you've vetted in advance. We'll probably continue to make incremental improvements in how well things work with lots of partitions, but I don't foresee it ever becoming a cost-free thing. TBH, if you've got 50m rows, I'm not sure you need partitions at all. You sure as heck do not need "a few thousand" of them. You should be basing your partitioning design on predictable maintenance operations that will match your partition boundaries. For example, if your business requirement is to keep five years' worth of records and you want to drop the oldest month's records in bulk once a month, then partitioning by month would be pretty helpful to make those drops cheap. That would lead to 60 active partitions which is entirely reasonable. regards, tom lane
On Mon, 2022-10-31 at 17:26 -0500, Ryan Ruenroeng wrote: > There is a statement in the Postgres Docs: "Partitioned tables are not processed by autovacuum." > > What does the above statement mean? > Does autovacuum not process both the parent and the child tables in a partition relationship? I agree that this is unclear and have proposed an improvement: https://postgr.es/m/1fd81ddc7710a154834030133c6fea41e55c8efb.camel@cybertec.at Yours, Laurenz Albe
On 10/31/22 23:05, Tom Lane wrote: [snip] > TBH, if you've got 50m rows, I'm not sure you need partitions at all. Big rows (i.e. document storage tables with bytea or xml fields) can make databases explode in size even with only 50M rows. (Yes, I know the arguments against it, but it works quite well when the database is in a cloud instance. Worries about backup times, at least, are eliminated.) -- Angular momentum makes the world go 'round.
Thanks all for your responses. We have a couple of tables. Each with 50-70mil rows currently, but they are expected to grow. Partitioning seems to be a better long-term strategy, queries to these tables, using their existing indexes, leaves them basically unusable (loooong run times).
FYI, we are currently on PG v11.16
Let's assume we do go with my current proposal because it is inline with how we are planning to drop partitions that stop seeing activity in the future:
[snip] from Ron's post above:
Autovacuum will handle it.
I still have some doubts based on this:
"Tuples changed in partitions and inheritance children do not trigger analyze on the parent table. If the parent table is empty or rarely changed, it may never be processed by autovacuum, and the statistics for the inheritance tree as a whole won't be collected. It is necessary to run ANALYZE
on the parent table manually in order to keep the statistics up to date." [Link]
Q1: Will we at least need to call Analyze via a cron job on the parent table to ensure that the statistics are up to date for autovacuum to catch the tables?
From reading the documentation that a few of you have pointed me to, I'm led to believe that the parent table is the "Partition" table. The children tables are treated by the autovacuum as tables
Q2: Autovacuum will act on the partitions/children to the parent table. Is that a correct statement?
--
It's good to know that the query optimizer will improve with partitions on versions 12+. Thank you.
Best,
Ryan
Ryan N Ruenroeng (He/His) | |||||||||
| |||||||||
|
On Tue, Nov 1, 2022 at 2:54 AM Ron <ronljohnsonjr@gmail.com> wrote:
On 10/31/22 23:05, Tom Lane wrote:
[snip]
> TBH, if you've got 50m rows, I'm not sure you need partitions at all.
Big rows (i.e. document storage tables with bytea or xml fields) can make
databases explode in size even with only 50M rows.
(Yes, I know the arguments against it, but it works quite well when the
database is in a cloud instance. Worries about backup times, at least, are
eliminated.)
--
Angular momentum makes the world go 'round.
On 11/1/22 10:11, Ryan Ruenroeng wrote:T
Tuning auto-analyze variables is supposed to obviate the need for manualvacuuming analyzing. I do manual vacuuming anayzing anyway in a cron job. "This set of tables have had more than X amount of changes, so I'll explictly analyze them."
Once a week, I do bloat calculations and vacuum based on that.
"\d" will show you exactly what you need to know.
Child tables are tables.
Parents are meta-tables. There's nothing to vacuum or analyze.
Thanks all for your responses. We have a couple of tables. Each with 50-70mil rows currently, but they are expected to grow. Partitioning seems to be a better long-term strategy, queries to these tables, using their existing indexes, leaves them basically unusable (loooong run times).FYI, we are currently on PG v11.16Let's assume we do go with my current proposal because it is inline with how we are planning to drop partitions that stop seeing activity in the future:[snip] from Ron's post above:Autovacuum will handle it.I still have some doubts based on this:"Tuples changed in partitions and inheritance children do not trigger analyze on the parent table. If the parent table is empty or rarely changed, it may never be processed by autovacuum, and the statistics for the inheritance tree as a whole won't be collected. It is necessary to runANALYZE
on the parent table manually in order to keep the statistics up to date." [Link]Q1: Will we at least need to call Analyze via a cron job on the parent table to ensure that the statistics are up to date for autovacuum to catch the tables?
Tuning auto-analyze variables is supposed to obviate the need for manual
Once a week, I do bloat calculations and vacuum based on that.
From reading the documentation that a few of you have pointed me to, I'm led to believe that the parent table is the "Partition" table.
"\d" will show you exactly what you need to know.
The children tables are treated by the autovacuum as tables
Child tables are tables.
Q2: Autovacuum will act on the partitions/children to the parent table. Is that a correct statement?
Parents are meta-tables. There's nothing to vacuum or analyze.
--It's good to know that the query optimizer will improve with partitions on versions 12+. Thank you.Best,Ryan
Ryan N Ruenroeng (He/His)
(717) 578-3929
| RRuenroeng@gmail.com
| Madison, WI
On Tue, Nov 1, 2022 at 2:54 AM Ron <ronljohnsonjr@gmail.com> wrote:On 10/31/22 23:05, Tom Lane wrote:
[snip]
> TBH, if you've got 50m rows, I'm not sure you need partitions at all.
Big rows (i.e. document storage tables with bytea or xml fields) can make
databases explode in size even with only 50M rows.
(Yes, I know the arguments against it, but it works quite well when the
database is in a cloud instance. Worries about backup times, at least, are
eliminated.)
--
Angular momentum makes the world go 'round.
--
Angular momentum makes the world go 'round.
Angular momentum makes the world go 'round.