Re: Autovacuum on partitioned table (autoanalyze) - Mailing list pgsql-hackers
From: Alvaro Herrera
Subject: Re: Autovacuum on partitioned table (autoanalyze)
Date:
Msg-id: 202108100010.vo4pr5ltu4qo@alvherre.pgsql
In response to: Re: Autovacuum on partitioned table (autoanalyze) (Andres Freund <andres@anarazel.de>)
Responses: Re: Autovacuum on partitioned table (autoanalyze)
List: pgsql-hackers
Hello,

On 2021-Jul-22, Andres Freund wrote:

> 1) Somehow it seems like a violation to do stuff like
> get_partition_ancestors() in pgstat.c. It's nothing I can't live with, but
> it feels a bit off. Would likely not be too hard to address, e.g. by just
> putting some of pgstat_report_anl_ancestors in partition.c instead.

I understand the complaint about this being a modularity violation -- the
point being that pgstat.c has no business accessing system catalogs at all.
Before this function, every pgstat_report_* function merely assembled a
message from counters accumulated somewhere and sent the bytes to the
collector; this new function is a deviation from that.  It seems we could
improve this by having a function (maybe in partition.c, as you propose)
along these lines:

static void
report_partition_ancestors(Oid relid)
{
    List       *ancestors;
    ListCell   *lc;
    Oid        *array;
    int         i = 0;

    ancestors = get_partition_ancestors(relid);
    array = palloc(sizeof(Oid) * list_length(ancestors));
    foreach(lc, ancestors)
        array[i++] = lfirst_oid(lc);

    pgstat_report_partition_ancestors(relid, array, list_length(ancestors));
}

and then pgstat.c works with the given array without having to consult
system catalogs.
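For concreteness, the pgstat.c half would then reduce to pure message
assembly.  A rough sketch, reusing the PgStat_MsgAnlAncestors message that
the committed patch already has (I'm going from memory on the field names,
so take those with a grain of salt):

/* in pgstat.c: assemble and send messages, no catalog access */
void
pgstat_report_partition_ancestors(Oid relid, Oid *ancestors, int nancestors)
{
    PgStat_MsgAnlAncestors msg;

    pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_ANL_ANCESTORS);
    msg.m_databaseid = MyDatabaseId;
    msg.m_tableoid = relid;
    msg.m_nancestors = 0;

    for (int i = 0; i < nancestors; i++)
    {
        msg.m_ancestors[msg.m_nancestors++] = ancestors[i];

        /* message full: flush it and start a new one */
        if (msg.m_nancestors >= PGSTAT_NUM_ANCESTORENTRIES)
        {
            pgstat_send(&msg, sizeof(msg));
            msg.m_nancestors = 0;
        }
    }

    if (msg.m_nancestors > 0)
        pgstat_send(&msg, sizeof(msg));
}

(The real thing would trim each message to its used length rather than
send sizeof(msg) bytes.)  The point is that everything catalog-related
stays on the partition.c side of the fence.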
> 2) Why does it make sense that autovacuum sends a stats message for every
> partition in the system that had any [changes] since the last autovacuum
> cycle? On a database with a good number of objects / a short naptime we'll
> often end up sending messages for the same set of tables from separate
> workers, because they don't yet see the concurrent
> tabentry->changes_since_analyze_reported.

The traffic could be large, yes, and I agree it seems undesirable.  If the
collector kept a record of the list of ancestors of each table, then we
wouldn't need to do this (we would have to know whether the collector
already knows about a particular partition, though ... I have no ideas on
that.)

> 3) What is the goal of the autovac_refresh_stats() after the loop doing
> pgstat_report_anl_ancestors()? I think it'll be common that the stats
> collector hasn't even processed the incoming messages by that point, not to
> speak of actually having written out a new stats file. If it took less than
> 10ms (PGSTAT_RETRY_DELAY) to get to autovac_refresh_stats(),
> backend_read_statsfile() will not wait for a new stats file to be written
> out, and we'll just re-read the state we previously did.
>
> It's pretty expensive to re-read the stats file in some workloads, so I'm a
> bit concerned that we end up significantly increasing the amount of stats
> updates/reads, without actually gaining anything reliable?

This is done once per autovacuum run, and the point is precisely to let
the next block absorb the updates that were sent.  In manual ANALYZE we do
it to inform future autovacuum runs.  Note that the PGSTAT_RETRY_DELAY
limit is used by the autovacuum launcher only, and this code runs in the
worker; we do flush out the old data.  Yes, it's expensive, but we're not
doing it once per table, just once per worker run.

> 4) In the shared mem stats patch I went to a fair bit of trouble to try to get
> rid of pgstat_vacuum_stat() (which scales extremely poorly to larger
> systems). For that to work pending stats can only be "staged" while holding
> a lock on a relation that prevents the relation from being concurrently
> dropped (pending stats increment a refcount for the shared stats object,
> which ensures that we don't lose track of the fact that a stats object has
> been dropped, even when stats only get submitted later).
>
> I'm not yet clear on how to make this work for
> pgstat_report_anl_ancestors() - but I probably can find a way. But it does
> feel a bit off to issue stats stuff for tables we're not sure still exist.

I assume you refer to locking the *partition*, right?  You're not talking
about locking the ancestor mentioned in the message.  I don't know how the
shmem collector works, but it shouldn't be a problem that an ancestor goes
away (ALTER TABLE parent DETACH; DROP TABLE parent); as long as you've
kept a lock on the partition, it should be fine.  Or am I misinterpreting
what you mean?

> I'll go and read through the thread, but my first thought is that having a
> hashtable in do_autovacuum() that contains stats for partitioned tables would
> be a good bit more efficient than the current approach? We already have a
> hashtable for each toast table, compared to that having a hashtable for each
> partitioned table doesn't seem like it'd be a problem?
> With a small bit of extra work that could even avoid the need for the
> additional pass through pg_class. Do the partitioned table data-gathering as
> part of the "collect main tables to vacuum" pass, and then do one of

I'll have to re-read the thread to remember why I made it a separate pass.
I think I did it that way because otherwise there was a requirement on the
pg_class scan order.  (Some earlier version of the patch did not have a
separate pass and there was some problem or other.  Maybe you're right
that a hash table is sufficient.)
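To be sure I understand the hashtable idea, the shape would be roughly as
below.  This is a sketch only: the entry struct and the partanl_* names
are invented here, and it glosses over the scan-order question I mention
above.

/*
 * Uses dynahash (utils/hsearch.h) and get_partition_ancestors()
 * (catalog/partition.h).
 */
typedef struct PartAnlEntry
{
    Oid         relid;          /* hash key: partitioned table OID */
    PgStat_Counter changes;     /* summed changes from its partitions */
} PartAnlEntry;

static HTAB *partanl_hash;

static void
partanl_init(void)
{
    HASHCTL     ctl;

    memset(&ctl, 0, sizeof(ctl));
    ctl.keysize = sizeof(Oid);
    ctl.entrysize = sizeof(PartAnlEntry);
    partanl_hash = hash_create("AV partitioned tables", 32, &ctl,
                               HASH_ELEM | HASH_BLOBS);
}

/* called for each leaf partition seen during the pg_class scan */
static void
partanl_accumulate(Oid partrelid, PgStat_Counter changes)
{
    List       *ancestors = get_partition_ancestors(partrelid);
    ListCell   *lc;

    foreach(lc, ancestors)
    {
        Oid         ancestor = lfirst_oid(lc);
        PartAnlEntry *entry;
        bool        found;

        entry = hash_search(partanl_hash, &ancestor, HASH_ENTER, &found);
        if (!found)
            entry->changes = 0;
        entry->changes += changes;
    }
    list_free(ancestors);
}

Each partition would add its changes_since_analyze to every ancestor's
entry while we scan pg_class, and a final walk over the hash would decide
which partitioned tables need analyzing -- no second pass required.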
-- 
Álvaro Herrera  39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/
"We're here to devour each other alive"  (Hobbes)