Re: Keepalive for max_standby_delay - Mailing list pgsql-hackers
From | Tom Lane
---|---
Subject | Re: Keepalive for max_standby_delay
Date |
Msg-id | 10389.1275506064@sss.pgh.pa.us
In response to | Re: Keepalive for max_standby_delay (Greg Stark <gsstark@mit.edu>)
Responses | Re: Keepalive for max_standby_delay
List | pgsql-hackers
Greg Stark <gsstark@mit.edu> writes:
> On Wed, Jun 2, 2010 at 6:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I believe that the motivation for treating archived timestamps as live
>> is, essentially, to force rapid catchup if a slave falls behind so far
>> that it's reading from archive instead of SR.

> Huh, I think this is the first mention of this that I've seen. I
> always assumed the motivation was just that you wanted to control how
> much data loss could occur on failover and how long recovery would
> take. I think separating the two delays is an interesting idea but I
> don't see how it counts as a simplification.

Well, it isn't a simplification: it's bringing it up to the minimum
complication level where it'll actually work sanely. The current
implementation doesn't work sanely because it confuses stale timestamps
read from WAL with real live time.

> This also still allows a slave to become arbitrarily far behind the
> master.

Indeed, but nothing we do can prevent that, if the slave is just plain
slower than the master. You have to assume that the slave is capable of
keeping up in the absence of query-caused delays, or you're hosed.

The real reason this is at issue is the fact that the max_standby_delay
kill mechanism applies to certain buffer-level locking operations. On
the master we just wait, and it's not a problem, because in practice the
conflicting queries almost always release these locks pretty quickly.
On the slave, though, instant kill as a result of a buffer-level lock
conflict would result in a very serious degradation in standby query
reliability (while also doing practically nothing for the speed of WAL
application, most of the time).

This morning on the phone Bruce and I were seriously discussing the
idea of ripping the max_standby_delay mechanism out of the buffer-level
locking paths, and just letting them work like they do on the master,
ie, wait forever. If we did that, then simplifying max_standby_delay to
a boolean would be reasonable again (because it really would only come
into play for DDL on the master).

The sticky point is that once in a blue moon you do have a conflicting
query sitting on a buffer lock for a long time, or even more likely a
series of queries keeping the WAL replay process from obtaining buffer
cleanup lock. So it seems that we have to have max_standby_delay-like
logic for those locks, and also that a zero grace period before kill
isn't a very practical setting.

However, there isn't a lot of point in obsessing over exactly how long
the grace period ought to be, as long as it's more than a few
milliseconds. It *isn't* going to have any real effect on whether the
slave can stay caught up. You could make a fairly decent case for just
measuring the grace period from when the replay process starts to wait,
as I think I proposed a while back. The value of measuring delay from a
receipt time is that if you do happen to have a bunch of delays within
a short interval you'll get more willing to kill queries --- but I
really believe that that is a corner case and will have nothing to do
with ordinary performance.

> I propose an alternate way out of the problem of syncing two clocks.
> Instead of comparing timestamps, compare time intervals. So as it
> reads xlog records it only ever compares the master timestamps with
> previous master timestamps to determine how much time has elapsed on
> the master. It compares that time elapsed with the time elapsed on the
> slave to determine if it's falling behind.
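For concreteness, a minimal standalone sketch of that interval-comparison
idea (illustrative only --- this is not actual PostgreSQL code, and every
name in it is invented):

/*
 * Sketch of the interval-comparison scheme quoted above: track how much
 * time elapsed on the master between WAL record timestamps and compare
 * that with how much wall-clock time elapsed on the standby, accumulating
 * the difference as an estimate of replay lag, so the two machines'
 * clocks are never compared directly.
 */
#include <stdint.h>
#include <stdio.h>
#include <time.h>

static int64_t last_master_us;  /* timestamp carried by previous record */
static int64_t last_standby_us; /* standby clock when it was replayed */
static int64_t lag_us;          /* estimated replay lag in microseconds */

static int64_t
standby_clock_us(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t) ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
}

/* Call once per replayed WAL record that carries a master timestamp. */
static void
note_record(int64_t master_us)
{
    int64_t now_us = standby_clock_us();

    if (last_master_us != 0)
    {
        int64_t master_elapsed = master_us - last_master_us;
        int64_t standby_elapsed = now_us - last_standby_us;

        /* replay slower than generation grows the lag; faster shrinks it */
        lag_us += standby_elapsed - master_elapsed;
        if (lag_us < 0)
            lag_us = 0;         /* the standby can't be ahead of the master */
    }
    last_master_us = master_us;
    last_standby_us = now_us;
}

int
main(void)
{
    /* Two records stamped one second apart on the master ... */
    note_record(1000000);
    /* ... but replayed back-to-back here, so the standby isn't falling
     * behind and the estimated lag stays at zero. */
    note_record(2000000);
    printf("estimated lag: %lld us\n", (long long) lag_us);
    return 0;
}

The point of the scheme is that only elapsed intervals are compared, so
clock skew between master and standby never enters into it.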
I think this would just add complexity and uncertainty to deal with
something that won't be much of a problem in practice.

			regards, tom lane