Re: Autovacuum on partitioned table (autoanalyze) - Mailing list pgsql-hackers
From | Justin Pryzby |
---|---|
Subject | Re: Autovacuum on partitioned table (autoanalyze) |
Date | |
Msg-id | 20200418050815.GE26953@telsasoft.com Whole thread Raw |
In response to | Re: Autovacuum on partitioned table (autoanalyze) (Amit Langote <amitlangote09@gmail.com>) |
Responses |
Re: Autovacuum on partitioned table (autoanalyze)
|
List | pgsql-hackers |
On Fri, Apr 17, 2020 at 10:09:07PM +0900, Amit Langote wrote: > On Thu, Apr 16, 2020 at 11:19 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > On Thu, Apr 16, 2020 at 06:16:45PM +0900, yuzuko wrote: > > I don't think that adequately allows what's needed. ...(paragraph with my typos elided)... > > For example, say a new customer has bunch of partitioned tables which each > > currently have only one partition (for the current month), and that's expected > > to grow to at least 20+ partitions (2+ years of history). How does one set the > > partitioned table's auto-analyze parameters to analyze whenever any child is > > analyzed ? I don't think it should be needed to update it every month after > > computing sum(child tuples). > > > > Possibly you could allow that behavior for some special values of the > > threshold. Like if autovacuum_analyze_threshold=-2, then analyze the parent > > whenever any of its children are analyzed. > > > > I think that use case and that need would be common, but I'd like to hear what > > others think. > > Having to constantly pay attention to whether a parent's > analyze_threshold/scale_factor is working as intended would surely be > an annoyance, so I tend to agree that we might need more than just the > ability to set analyze_threshold/scale_factor on parent tables. > However, I think we can at least start with being able to do > *something* here. :) Maybe others think that this shouldn't be > considered committable until we figure out a good analyze threshold > calculation formula to apply to parent tables. > > Considering that, how about having, say, a > autovacuum_analyze_partition_parent_frequency, with string values > 'default', 'partition'? -- 'default' assumes the same formula as > regular tables, whereas with 'partition', parent is analyzed as soon > as a partition is. I assume you mean a reloption to be applied only to partitioned tables, Your "partition" setting would mean that the scale/threshold values would have no effect, which seems kind of unfortunate. I think it should be called something else, and done differently, like maybe: autovacuum_analyze_mode = {off,sum,max,...} The threshold would be threshold + scale*tuples, as always, but would be compared with f(changes) as determined by the relopt. sum(changes) would do what you called "default", comparing the sum(changes) across all partitions to the threshold, which is itself computed using sum(reltuples) AS reltuples. max(changes) would compute max(changes) compared to the threshold, and the threshold would be computed separately for each partition's reltuples: threshold_N = parent_threshold + parent_scale * part_N_tuples. If *any* partition exceeds that threshold, the partition itself is analyzed. This allows what I want for time-series. Maybe this would have an alias called "any". I'm not sure if there's any other useful modes, like avg(changes)? I guess we can add them later if someone thinks of a good use case. Also, for me, the v7 patch warns: |src/backend/postmaster/autovacuum.c:3117:70: warning: ‘reltuples’ may be used uninitialized in this function [-Wmaybe-uninitialized] | vacinsthresh = (float4) vac_ins_base_thresh + vac_ins_scale_factor * reltuples; ..which seems to be a false positive, but easily avoided. This patch includes partitioned tables in pg_stat_*_tables, which is great; I complained awhile ago that they were missing [0]. It might be useful if that part was split out into a separate 0001 patch (?). Thanks, -- Justin [0] https://www.postgresql.org/message-id/20180601221428.GU5164%40telsasoft.com
pgsql-hackers by date: