Re: WAL insert delay settings - Mailing list pgsql-hackers
| From | Tomas Vondra |
|---|---|
| Subject | Re: WAL insert delay settings |
| Date | |
| Msg-id | a1927321-b9c8-04b6-1c3b-6ccc9cd9e611@2ndquadrant.com |
| In response to | Re: WAL insert delay settings (Andres Freund <andres@anarazel.de>) |
| Responses | Re: WAL insert delay settings |
| List | pgsql-hackers |
On 2/19/19 8:22 PM, Andres Freund wrote:
> On 2019-02-19 20:02:32 +0100, Tomas Vondra wrote:
>> Let's do a short example. Assume the default vacuum costing parameters
>>
>>   vacuum_cost_limit = 200
>>   vacuum_cost_delay = 20ms
>>   cost_page_dirty = 20
>>
>> and for simplicity we only do writes. So vacuum can do ~8MB/s of writes.
>>
>> Now, let's also throttle based on WAL - once in a while, after producing
>> some amount of WAL, we sleep for a while. Again, for simplicity let's
>> assume the sleeps perfectly interleave and are also 20ms. So we have
>> something like:
>>
>>   sleep(20ms); -- vacuum
>>   sleep(20ms); -- WAL
>>   sleep(20ms); -- vacuum
>>   sleep(20ms); -- WAL
>>   sleep(20ms); -- vacuum
>>   sleep(20ms); -- WAL
>>   sleep(20ms); -- vacuum
>>   sleep(20ms); -- WAL
>>
>> Suddenly, we only reach 4MB/s of writes from vacuum. But we also reach
>> only 1/2 the WAL throughput, because it's affected exactly the same way
>> by the sleeps from vacuum throttling.
>>
>> We've not reached either of the limits. How exactly is this "lower limit
>> takes effect"?
>
> Because I said upthread that that's not how I think a sane
> implementation of WAL throttling would work. I think the whole cost
> budgeting approach is BAD, and it'd be a serious mistake to copy it for
> a WAL rate limit (it disregards the time taken to execute IO, CPU
> costs, etc., and in this case the cost of other bandwidth limitations).
> What I'm saying is that we ought to instead specify a WAL rate in
> bytes/sec and *only* sleep once we've exceeded it for a time period
> (with some optimizations, so we don't call gettimeofday() after every
> XLogInsert(), but instead compute after how many bytes we need to
> re-determine the time to see if we're still in the same 'granule').

OK, I agree with that. That's mostly what I described in response to
Robert a while ago, I think. (If you've described that earlier in the
thread, I missed it.)

> Now, a non-toy implementation would probably want to have a sliding
> window to avoid being overly bursty, and reduce the number of
> gettimeofday() calls as mentioned above, but for explanation's sake
> basically imagine that the "main loop" of a bulk xlog-emitting command
> would invoke a helper with a computation in pseudocode like:
>
>     current_time = gettimeofday();
>     if (same_second(current_time, last_time))
>     {
>         wal_written_in_second += new_wal_written;
>         if (wal_written_in_second >= wal_write_limit_per_second)
>         {
>             double too_much = (wal_written_in_second - wal_write_limit_per_second);
>             sleep_fractional_seconds(too_much / wal_written_in_second);
>
>             last_time = current_time;
>         }
>     }
>     else
>     {
>         last_time = current_time;
>     }
>
> which'd mean that in contrast to your example we'd not continually
> sleep for WAL; we'd only do so if we actually exceeded (or, in a
> smarter implementation, are projected to exceed) the specified WAL
> write rate. As the 20ms sleeps from vacuum effectively reduce the WAL
> write rate, we'd correspondingly sleep less.

Yes, that makes sense.

> And my main point is that even if you implement a proper bytes/sec
> limit ONLY for WAL, the behaviour of VACUUM rate limiting doesn't get
> meaningfully more confusing than right now.

So, why not modify autovacuum to use this approach as well? I wonder if
the situation there is more complicated because of multiple workers
sharing the same budget ...

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
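To make the quoted pseudocode concrete, below is a minimal, compilable C
sketch of the same idea: track the bytes emitted within the current
one-second "granule" and sleep only once the budget is exceeded. This is
not code from the thread or from PostgreSQL; every name in it
(wal_limiter_report, WAL_WRITE_LIMIT_PER_SECOND, the 16 MB/s limit) is
invented for illustration, it uses plain POSIX clock and sleep calls
rather than PostgreSQL infrastructure, and unlike the toy pseudocode it
resets the byte counter and sleeps to the next second boundary so that
the cap actually holds:

    /*
     * wal_rate_limit.c - a sketch of the per-second "granule" WAL rate
     * limiter described above.  All names are hypothetical; this is NOT
     * PostgreSQL code.
     */
    #include <stdint.h>
    #include <stdio.h>
    #include <time.h>

    /* Hypothetical limit; a real implementation would use a GUC. */
    #define WAL_WRITE_LIMIT_PER_SECOND ((uint64_t) 16 * 1024 * 1024)

    static time_t   last_granule;           /* second the budget belongs to */
    static uint64_t wal_written_in_second;  /* bytes emitted in that second */

    static void
    sleep_fractional_seconds(double sec)
    {
        struct timespec ts;

        ts.tv_sec = (time_t) sec;
        ts.tv_nsec = (long) ((sec - (double) ts.tv_sec) * 1e9);
        nanosleep(&ts, NULL);
    }

    /*
     * Report that new_wal_written more bytes of WAL were emitted.  Only
     * sleeps once the per-second budget is exceeded, so a caller that
     * stays under the limit (e.g. because vacuum's own cost-based sleeps
     * slow it down) is never delayed.
     */
    static void
    wal_limiter_report(uint64_t new_wal_written)
    {
        time_t now = time(NULL);

        if (now != last_granule)
        {
            /* New one-second granule: reset the byte budget. */
            last_granule = now;
            wal_written_in_second = 0;
        }

        wal_written_in_second += new_wal_written;

        if (wal_written_in_second >= WAL_WRITE_LIMIT_PER_SECOND)
        {
            /*
             * Budget used up: sleep for the remainder of this second,
             * plus the time the overshoot "costs" at the target rate,
             * then start a fresh granule.
             */
            struct timespec now_ts;
            double          frac, overshoot;

            clock_gettime(CLOCK_REALTIME, &now_ts);
            frac = (double) now_ts.tv_nsec / 1e9;
            overshoot = (double) (wal_written_in_second -
                                  WAL_WRITE_LIMIT_PER_SECOND) /
                        (double) WAL_WRITE_LIMIT_PER_SECOND;

            sleep_fractional_seconds((1.0 - frac) + overshoot);

            last_granule = time(NULL);
            wal_written_in_second = 0;
        }
    }

    int
    main(void)
    {
        /* Pretend a bulk command emits 1 MB of WAL per loop iteration;
         * at the 16 MB/s cap this takes roughly four seconds. */
        for (int i = 0; i < 64; i++)
            wal_limiter_report((uint64_t) 1024 * 1024);

        puts("done");
        return 0;
    }

As the quoted mail notes, a non-toy version would smooth this with a
sliding window and avoid reading the clock on every call; this sketch
trades both for readability.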