Re: Autovacuum integration - Mailing list pgsql-patches
From | Tom Lane |
---|---|
Subject | Re: Autovacuum integration |
Date | |
Msg-id | 27850.1120852585@sss.pgh.pa.us Whole thread Raw |
In response to | Re: Autovacuum integration (Alvaro Herrera <alvherre@alvh.no-ip.org>) |
Responses |
Re: Autovacuum integration
Re: Autovacuum integration |
List | pgsql-patches |
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > Here is a second attempt at autovacuum integration. A few comments: * I strongly disagree with keeping updatable state in a catalog. In the first place, that will cause autovac itself to create vacuumable trash. In the second place, that means you can't modify the internal state kept by autovacuum without forcing initdb; which is not a good situation, considering how much change this code is likely to go through. I think all the updatable state should be kept in the pgstats file. (Memo to self: let's add a version number header to the pgstats file, so that we can change its format without requiring an initdb to do so.) pg_autovacuum should only contain user-settable parameters --- which is still putting us at nontrivial risk for initdb constraints, but it's way better than internal state. If you do that, then pg_autovacuum need only contain entries for tables that have been given non-default parameter settings (ie, those for which the user wants to override global settings given in postgresql.conf). * I'm pretty dubious about adding a syscache for pg_autovacuum. Where exactly are the repeated fetches going to come from to justify caching it? * The signal processing needs re-work; you can't just randomly assign signal numbers, you need to keep these things largely consistent with the other subprocesses. In particular, if this thing is going to be running transactions then it MUST honor SIGALRM (eg, for deadlock timeout checks) and SIGUSR1 (sinval catchup interrupt), and I don't think you get to ignore SIGTERM either (may get this from init). I think it's a pretty bad idea to use SIGUSR2 for something other than what regular backends use it for, too. (Consider the prospect that for some reason your PID occurs in pg_listener.) It'd be a good idea to honor SIGINT with the normal meaning (query cancel, ie, abort transaction, which in this context would also imply process shutdown) and use that to shut down the daemon at postmaster stop. In short, the signal handling had better look a whole lot more like a normal backend's. * Speaking of shutdown, you can't stop the bgwriter until you know that the autovac daemon is dead. In this respect too, it's much more like a backend than like any of the other support processes. Signal it when you signal the backends, and don't advance to the bgwriter kill phase until autovac and all the backends are known gone. * I see you have an autovac_init function to "annoy the user", but shouldn't this be checked every time we are about to spawn an autovac process? * I don't see any special checks for shared catalogs, which means they are probably going to be over-vacuumed; or possibly under-vacuumed if you fail to track the update stats for them in a single place rather than in each database. It'd probably be a good idea to nominate just one database to be the place from which shared catalogs are vacuumed; or treat them as a completely separate special case. * I have no objection to adding extra entry points to vacuum.c to simplify the calls to it. * With respect to comments like: + /* + * We just skip a table not found on the stat hash. This is + * because whatever we do here, there won't be good statistics + * next time around, so we would do this same action again. It is + * tempting to issue an ANALYZE, but it won't work because ANALYZE + * doesn't update the stat system. + */ ISTM the entire point of "autovac integration" is to make sure the rest of the system does what's needed to support autovac. If ANALYZE needs to send something to the stats system, make it do so. (ISTM it'd be reasonable for ANALYZE to make and report an estimate of the dead tuple count, which after all is what you are hoping all the rest of the stats machinery will derive for you...) regards, tom lane
pgsql-patches by date: