RFC: changing autovacuum_naptime semantics - Mailing list pgsql-hackers
From | Alvaro Herrera |
---|---|
Subject | RFC: changing autovacuum_naptime semantics |
Date | |
Msg-id | 20070307230001.GC7122@alvh.no-ip.org |
Responses | Re: RFC: changing autovacuum_naptime semantics (3 replies) |
List | pgsql-hackers |
Hackers,

I want to propose some very simple changes to autovacuum in order to move forward (a bit):

1. autovacuum_naptime semantics
2. limiting the number of workers: global, per database, per tablespace?

I still haven't received the magic bullet to solve the hot table problem, but these at least mean we continue doing *something*.

Changing autovacuum_naptime semantics

Are we agreed on changing autovacuum_naptime semantics? The idea is to make it per-database instead of the current per-cluster, i.e., a "nap" would be the minimum time that passes between starting one worker in a database and starting another worker in the same database.

Currently, naptime is the time elapsed between two worker runs across all databases. So if you have 15 databases, each one gets autovacuumed only once every 15*naptime.

Eventually, we could have per-database naptime defined in pg_database, and do away with the autovacuum_naptime GUC param (or maybe keep it as a default value). Say for database D1 you want to have workers every 60 seconds but for database D2 you want 1 hour.

Question: Is everybody OK with changing the autovacuum_naptime semantics?

Limiting the number of workers

I was originally proposing having a GUC parameter which would limit the cluster-wide maximum number of workers. Additionally we could have a per-database limit (stored in a pg_database column), which would be simple to implement. Josh Drake proposed getting rid of the GUC param, saying that it would confuse users to set the per-database limit to some higher value than the GUC setting and then find the lower limit enforced (presumably because they were unaware of it).

The problem is that we need to set shared memory up for workers, so we really need a hard limit and it must be global. Thus the GUC param is not optional.

Other people also proposed having a per-tablespace limit. This would make a lot of sense, tablespaces being the natural I/O units. However, I'm not sure it's easy to implement, because you can put half of database D1 and half of database D2 in tablespace T1, and the two other halves in tablespace T2. Then enforcing the limit becomes rather complicated and will probably mean putting a worker to sleep. I think it makes more sense to skip implementing per-tablespace limits for now, and have a plan to put per-tablespace I/O throttles in the future.

Questions:
Is everybody OK with not putting a per-tablespace worker limit?
Is everybody OK with putting per-database worker limits on a pg_database column?

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
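
To make the two proposals in the message above concrete, here is a rough Python sketch. It is not autovacuum code. The function and class names are made up for illustration, `global_max` stands in for the proposed cluster-wide GUC, and `per_db_max` stands in for the proposed pg_database column. The first part compares how often a given database is visited under the current and the proposed naptime semantics. The second part models worker admission with a hard global cap plus optional per-database caps.

```python
from dataclasses import dataclass, field
from typing import Dict


def revisit_interval_current(naptime_s: float, n_databases: int) -> float:
    """Current semantics: one worker start per nap across the whole cluster,
    so a given database is visited once every n_databases * naptime."""
    return naptime_s * n_databases


def revisit_interval_proposed(naptime_s: float, n_databases: int) -> float:
    """Proposed semantics: naptime is the minimum gap between worker starts
    in the same database, so this is a lower bound that does not depend on
    how many databases exist."""
    return naptime_s


@dataclass
class WorkerLimits:
    """Hypothetical admission check, not the real launcher logic."""
    global_max: int                                            # hard cluster-wide cap
    per_db_max: Dict[str, int] = field(default_factory=dict)   # optional per-database caps
    running: Dict[str, int] = field(default_factory=dict)      # workers currently running

    def can_start(self, db: str) -> bool:
        # The global cap always wins, even if the per-database cap is higher.
        if sum(self.running.values()) >= self.global_max:
            return False
        db_cap = self.per_db_max.get(db, self.global_max)
        return self.running.get(db, 0) < db_cap

    def start(self, db: str) -> bool:
        if not self.can_start(db):
            return False
        self.running[db] = self.running.get(db, 0) + 1
        return True


if __name__ == "__main__":
    print(revisit_interval_current(60, 15))   # seconds between visits today
    print(revisit_interval_proposed(60, 15))  # minimum seconds between visits, proposed

    # The case Josh Drake worried about: per-database cap above the global cap.
    limits = WorkerLimits(global_max=2, per_db_max={"D1": 5})
    print([limits.start("D1") for _ in range(3)])
```

In the last example the third start in D1 is refused even though its per-database cap is 5, because the global cap of 2 is what bounds the shared memory set aside for workers.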