Re: Should vacuum process config file reload more often - Mailing list pgsql-hackers
From | Melanie Plageman |
---|---|
Subject | Re: Should vacuum process config file reload more often |
Date | |
Msg-id | CAAKRu_YmQg_i=0DBfPzy1MMc2GjfuJFo=0rFv4fCzRQStLhjvw@mail.gmail.com Whole thread Raw |
In response to | Re: Should vacuum process config file reload more often (Daniel Gustafsson <daniel@yesql.se>) |
Responses |
Re: Should vacuum process config file reload more often
|
List | pgsql-hackers |
v17 attached does not yet fix the logging problem or variable naming problem. I have not changed where AutoVacuumUpdateCostLimit() is called either. This is effectively just a round of cleanup. I hope I have managed to address all other code review feedback so far, though some may have slipped through the cracks. On Wed, Apr 5, 2023 at 2:56 PM Robert Haas <robertmhaas@gmail.com> wrote: > On Wed, Apr 5, 2023 at 11:29 AM Melanie Plageman <melanieplageman@gmail.com> wrote: > + /* > + * Balance and update limit values for autovacuum workers. We must > + * always do this in case the autovacuum launcher or another > + * autovacuum worker has recalculated the number of workers across > + * which we must balance the limit. This is done by the launcher when > + * launching a new worker and by workers before vacuuming each table. > + */ > > I don't quite understand what's going on here. A big reason that I'm > worried about this whole issue in the first place is that sometimes > there's a vacuum going on a giant table and you can't get it to go > fast. You want it to absorb new settings, and to do so quickly. I > realize that this is about the number of workers, not the actual cost > limit, so that makes what I'm about to say less important. But ... is > this often enough? Like, the time before we move onto the next table > could be super long. The time before a new worker is launched should > be ~autovacuum_naptime/autovacuum_max_workers or ~20s with default > settings, so that's not horrible, but I'm kind of struggling to > understand the rationale for this particular choice. Maybe it's fine. I've at least updated this comment to be more correct/less misleading. > > + if (autovacuum_vac_cost_limit > 0) > + VacuumCostLimit = autovacuum_vac_cost_limit; > + else > + VacuumCostLimit = vacuum_cost_limit; > + > + /* Only balance limit if no cost-related storage > parameters specified */ > + if (pg_atomic_unlocked_test_flag(&MyWorkerInfo->wi_dobalance)) > + return; > + Assert(VacuumCostLimit > 0); > + > + nworkers_for_balance = pg_atomic_read_u32( > + > &AutoVacuumShmem->av_nworkersForBalance); > + > + /* There is at least 1 autovac worker (this worker). */ > + if (nworkers_for_balance <= 0) > + elog(ERROR, "nworkers_for_balance must be > 0"); > + > + VacuumCostLimit = Max(VacuumCostLimit / > nworkers_for_balance, 1); > > I think it would be better stylistically to use a temporary variable > here and only assign the final value to VacuumCostLimit. I tried that and thought it adding confusing clutter. If it is a code cleanliness issue, I am willing to change it, though. On Wed, Apr 5, 2023 at 3:04 PM Daniel Gustafsson <daniel@yesql.se> wrote: > > > On 5 Apr 2023, at 20:55, Robert Haas <robertmhaas@gmail.com> wrote: > > > Again, I don't think this is something we should try to > > address right now under time pressure, but in the future, I think we > > should consider ripping this behavior out. > > I would not be opposed to that, but I wholeheartedly agree that it's not the > job of this patch (or any patch at this point in the cycle). > > > + if (autovacuum_vac_cost_limit > 0) > > + VacuumCostLimit = autovacuum_vac_cost_limit; > > + else > > + VacuumCostLimit = vacuum_cost_limit; > > + > > + /* Only balance limit if no cost-related storage > > parameters specified */ > > + if (pg_atomic_unlocked_test_flag(&MyWorkerInfo->wi_dobalance)) > > + return; > > + Assert(VacuumCostLimit > 0); > > + > > + nworkers_for_balance = pg_atomic_read_u32( > > + > > &AutoVacuumShmem->av_nworkersForBalance); > > + > > + /* There is at least 1 autovac worker (this worker). */ > > + if (nworkers_for_balance <= 0) > > + elog(ERROR, "nworkers_for_balance must be > 0"); > > + > > + VacuumCostLimit = Max(VacuumCostLimit / > > nworkers_for_balance, 1); > > > > I think it would be better stylistically to use a temporary variable > > here and only assign the final value to VacuumCostLimit. > > I can agree with that. Another supertiny nitpick on the above is to not end a > single-line comment with a period. I have fixed this. On Thu, Apr 6, 2023 at 2:40 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Thu, Apr 6, 2023 at 12:29 AM Melanie Plageman > <melanieplageman@gmail.com> wrote: > > > > Thanks all for the reviews. > > > > v16 attached. I put it together rather quickly, so there might be a few > > spurious whitespaces or similar. There is one rather annoying pgindent > > outlier that I have to figure out what to do about as well. > > > > The remaining functional TODOs that I know of are: > > > > - Resolve what to do about names of GUC and vacuum variables for cost > > limit and cost delay (since it may affect extensions) > > > > - Figure out what to do about the logging message which accesses dboid > > and tableoid (lock/no lock, where to put it, etc) > > > > - I see several places in docs which reference the balancing algorithm > > for autovac workers. I did not read them in great detail, but we may > > want to review them to see if any require updates. > > > > - Consider whether or not the initial two commits should just be > > squashed with the third commit > > > > - Anything else reviewers are still unhappy with > > > > > > On Wed, Apr 5, 2023 at 1:56 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > On Wed, Apr 5, 2023 at 5:05 AM Melanie Plageman > > > <melanieplageman@gmail.com> wrote: > > > > > > > > On Tue, Apr 4, 2023 at 4:27 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > --- > > > > > - if (worker->wi_proc != NULL) > > > > > - elog(DEBUG2, "autovac_balance_cost(pid=%d > > > > > db=%u, rel=%u, dobalance=%s cost_limit=%d, cost_limit_base=%d, > > > > > cost_delay=%g)", > > > > > - worker->wi_proc->pid, > > > > > worker->wi_dboid, worker->wi_tableoid, > > > > > - worker->wi_dobalance ? "yes" : "no", > > > > > - worker->wi_cost_limit, > > > > > worker->wi_cost_limit_base, > > > > > - worker->wi_cost_delay); > > > > > > > > > > I think it's better to keep this kind of log in some form for > > > > > debugging. For example, we can show these values of autovacuum workers > > > > > in VacuumUpdateCosts(). > > > > > > > > I added a message to do_autovacuum() after calling VacuumUpdateCosts() > > > > in the loop vacuuming each table. That means it will happen once per > > > > table. It's not ideal that I had to move the call to VacuumUpdateCosts() > > > > behind the shared lock in that loop so that we could access the pid and > > > > such in the logging message after updating the cost and delay, but it is > > > > probably okay. Though noone is going to be changing those at this > > > > point, it still seemed better to access them under the lock. > > > > > > > > This does mean we won't log anything when we do change the values of > > > > VacuumCostDelay and VacuumCostLimit while vacuuming a table. Is it worth > > > > adding some code to do that in VacuumUpdateCosts() (only when the value > > > > has changed not on every call to VacuumUpdateCosts())? Or perhaps we > > > > could add it in the config reload branch that is already in > > > > vacuum_delay_point()? > > > > > > Previously, we used to show the pid in the log since a worker/launcher > > > set other workers' delay costs. But now that the worker sets its delay > > > costs, we don't need to show the pid in the log. Also, I think it's > > > useful for debugging and investigating the system if we log it when > > > changing the values. The log I imagined to add was like: > > > > > > @@ -1801,6 +1801,13 @@ VacuumUpdateCosts(void) > > > VacuumCostDelay = vacuum_cost_delay; > > > > > > AutoVacuumUpdateLimit(); > > > + > > > + elog(DEBUG2, "autovacuum update costs (db=%u, rel=%u, > > > dobalance=%s, cost_limit=%d, cost_delay=%g active=%s failsafe=%s)", > > > + MyWorkerInfo->wi_dboid, MyWorkerInfo->wi_tableoid, > > > + pg_atomic_unlocked_test_flag(&MyWorkerInfo->wi_dobalance) > > > ? "no" : "yes", > > > + VacuumCostLimit, VacuumCostDelay, > > > + VacuumCostDelay > 0 ? "yes" : "no", > > > + VacuumFailsafeActive ? "yes" : "no"); > > > } > > > else > > > { > > > > Makes sense. I've updated the log message to roughly what you suggested. > > I also realized I think it does make sense to call it in > > VacuumUpdateCosts() -- only for autovacuum workers of course. I've done > > this. I haven't taken the lock though and can't decide if I must since > > they access dboid and tableoid -- those are not going to change at this > > point, but I still don't know if I can access them lock-free... > > Perhaps there is a way to condition it on the log level? > > > > If I have to take a lock, then I don't know if we should put these in > > VacuumUpdateCosts()... > > I think we don't need to acquire a lock there as both values are > updated only by workers reporting this message. I dunno. I just don't feel that comfortable saying, oh it's okay to access these without a lock probably. I propose we do one of the following: - Take a shared lock inside VacuumUpdateCosts() (it is not called on every call to vacuum_delay_point()) before reading from these variables. Pros: - We will log whenever there is a change to these parameters Cons: - This adds overhead in the common case when log level is < DEBUG2. Is there a way to check the log level before taking the lock? - Acquiring the lock inside the function is inconsistent with the pattern that some of the other autovacuum functions requiring locks use (they assume you have a lock if needed inside of the function). But, we could assert that the lock is not already held. - If we later decide we don't like this choice and want to move the logging elsewhere, it will necessarily log less frequently which seems like a harder change to make than logging more frequently. - Move this logging into the loop through relations in do_autovacuum() and the config reload code and take the shared lock before doing the logging. Pros: - Seems safe and not expensive - Covers most of the times we would want the logging Cons: - duplicates logging in two places > Some minor comments on 0003: > > +/* > + * autovac_recalculate_workers_for_balance > + * Recalculate the number of workers to consider, given > cost-related > + * storage parameters and the current number of active workers. > + * > + * Caller must hold the AutovacuumLock in at least shared mode to access > + * worker->wi_proc. > + */ > > Does it make sense to add Assert(LWLockHeldByMe(AutovacuumLock)) at > the beginning of this function? I've added this. It is called infrequently enough to be okay, I think. > /* rebalance in case the default cost parameters changed */ > - LWLockAcquire(AutovacuumLock, LW_EXCLUSIVE); > - autovac_balance_cost(); > + LWLockAcquire(AutovacuumLock, LW_SHARED); > + autovac_recalculate_workers_for_balance(); > LWLockRelease(AutovacuumLock); > > Do we really need to have the autovacuum launcher recalculate > av_nworkersForBalance after reloading the config file? Since the cost > parameters are not used inautovac_recalculate_workers_for_balance() > the comment also needs to be updated. Yep, almost certainly don't need this. I've removed this call to autovac_recalculate_workers_for_balance(). > + /* > + * Balance and update limit values for autovacuum > workers. We must > + * always do this in case the autovacuum launcher or another > + * autovacuum worker has recalculated the number of > workers across > + * which we must balance the limit. This is done by > the launcher when > + * launching a new worker and by workers before > vacuuming each table. > + */ > + AutoVacuumUpdateCostLimit(); > > I think the last sentence is not correct. IIUC recalculation of > av_nworkersForBalance is done by the launcher after a worker finished > and by workers before vacuuming each table. Yes, you are right. However, I think the comment was generally misleading and I have reworded it. > It's not a problem of this patch, but IIUC since we don't reset > wi_dobalance after vacuuming each table we use the last value of > wi_dobalance for performing autovacuum items. At end of the loop for > tables in do_autovacuum() we have the following code that explains why > we don't reset wi_dobalance: > > /* > * Remove my info from shared memory. We could, but intentionally > * don't, unset wi_dobalance on the assumption that we are more likely > * than not to vacuum a table with no cost-related storage parameters > * next, so we don't want to give up our share of I/O for a very short > * interval and thereby thrash the global balance. > */ > LWLockAcquire(AutovacuumScheduleLock, LW_EXCLUSIVE); > MyWorkerInfo->wi_tableoid = InvalidOid; > MyWorkerInfo->wi_sharedrel = false; > LWLockRelease(AutovacuumScheduleLock); > > Assuming we agree with that, probably we need to reset it to true > after vacuuming all tables? Ah, great point. I have done this. On Thu, Apr 6, 2023 at 8:29 AM Daniel Gustafsson <daniel@yesql.se> wrote: > > > On 6 Apr 2023, at 08:39, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > Also I agree with > > where to put the log but I think the log message should start with > > lower cases: > > > > + elog(DEBUG2, > > + "Autovacuum VacuumUpdateCosts(db=%u, rel=%u, > > In principle I agree, but in this case Autovacuum is a name and should IMO in > userfacing messages start with capital A. I've left this unchanged while I agonize over what to do with the placement of the log message in general. But I am happy to keep it uppercase. > > +/* > > + * autovac_recalculate_workers_for_balance > > + * Recalculate the number of workers to consider, given > > cost-related > > + * storage parameters and the current number of active workers. > > + * > > + * Caller must hold the AutovacuumLock in at least shared mode to access > > + * worker->wi_proc. > > + */ > > > > Does it make sense to add Assert(LWLockHeldByMe(AutovacuumLock)) at > > the beginning of this function? > > That's probably not a bad idea. Done. > > --- > > /* rebalance in case the default cost parameters changed */ > > - LWLockAcquire(AutovacuumLock, LW_EXCLUSIVE); > > - autovac_balance_cost(); > > + LWLockAcquire(AutovacuumLock, LW_SHARED); > > + autovac_recalculate_workers_for_balance(); > > LWLockRelease(AutovacuumLock); > > > > Do we really need to have the autovacuum launcher recalculate > > av_nworkersForBalance after reloading the config file? Since the cost > > parameters are not used inautovac_recalculate_workers_for_balance() > > the comment also needs to be updated. > > If I understand this comment right; there was a discussion upthread that simply > doing it in both launcher and worker simplifies the code with little overhead. > A comment can reflect that choice though. Yes, but now that this function no longer deals with the cost limit and delay values itself, we can remove it. - Melanie
Attachment
pgsql-hackers by date: