Thread: synchronous_commit = remote_flush
Hi hackers,

To do something about the confusion I keep seeing about what exactly "on" means, I've often wished we had "remote_flush". But it's not obvious how the backwards compatibility could work, ie how to keep the people happy who use "local" vs "on" to control syncrep, and also the people who use "off" vs "on" to control asynchronous commit on single-node systems. Is there any sensible way to do that, or is it not broken and I should pipe down, or is it just far too entrenched and never going to change?

--
Thomas Munro
http://www.enterprisedb.com
On 8/17/16 11:22 PM, Thomas Munro wrote:
> Hi hackers,
>
> To do something about the confusion I keep seeing about what exactly
> "on" means, I've often wished we had "remote_flush". But it's not
> obvious how the backwards compatibility could work, ie how to keep the
> people happy who use "local" vs "on" to control syncrep, and also the
> people who use "off" vs "on" to control asynchronous commit on
> single-node systems. Is there any sensible way to do that, or is it
> not broken and I should pipe down, or is it just far too entrenched
> and never going to change?

I'm wondering if we've hit the point where trying to put all of this in a single GUC is a bad idea... changing that probably means a config compatibility break, but I don't think that's necessarily a bad thing at this point...

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532) mobile: 512-569-9461
On Thu, Aug 18, 2016 at 12:22 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> To do something about the confusion I keep seeing about what exactly
> "on" means, I've often wished we had "remote_flush". But it's not
> obvious how the backwards compatibility could work, ie how to keep the
> people happy who use "local" vs "on" to control syncrep, and also the
> people who use "off" vs "on" to control asynchronous commit on
> single-node systems. Is there any sensible way to do that, or is it
> not broken and I should pipe down, or is it just far too entrenched
> and never going to change?

I don't see why we can't add "remote_flush" as a synonym for "on". Do you have something else in mind?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, Aug 19, 2016 at 5:25 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> I don't see why we can't add "remote_flush" as a synonym for "on". Do
> you have something else in mind?

+1 for adding "remote_flush" as a synonym for "on".
It doesn't break backward compatibility.

Regards,

--
Masahiko Sawada
On Fri, Aug 19, 2016 at 7:32 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Fri, Aug 19, 2016 at 5:25 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> I don't see why we can't add "remote_flush" as a synonym for "on". Do
>> you have something else in mind?
>
> +1 for adding "remote_flush" as a synonym for "on".
> It doesn't break backward compatibility.

Right, we could just add it to guc.c after "on", so that you can "SET synchronous_commit TO remote_flush", but then "SHOW synchronous_commit" returns "on".

The problem I was thinking about was this: if you add "remote_flush" before "on" in guc.c, then "SHOW ..." will return "remote_flush", which would be really helpful for users trying to understand what syncrep is actually doing; but it would probably confuse single node users and async replication users.

--
Thomas Munro
http://www.enterprisedb.com
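The ordering problem Thomas describes can be illustrated with a small, self-contained sketch of an enum-option table in the style of guc.c. It assumes, as his description implies, that SET accepts any listed spelling while SHOW maps the stored value back to the first entry carrying that value. The type, table, and function names here are hypothetical and are not PostgreSQL's actual definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical commit levels, weakest to strongest. */
typedef enum
{
    COMMIT_OFF,
    COMMIT_LOCAL_FLUSH,
    COMMIT_REMOTE_WRITE,
    COMMIT_REMOTE_FLUSH,   /* the value "on" stands for in this sketch */
    COMMIT_REMOTE_APPLY
} CommitLevel;

/* One accepted spelling and the level it maps to. */
typedef struct
{
    const char *name;
    int         val;
} EnumEntry;

/* Variant A: "remote_flush" added after "on" (a pure synonym). */
static const EnumEntry options_after_on[] = {
    {"local", COMMIT_LOCAL_FLUSH},
    {"remote_write", COMMIT_REMOTE_WRITE},
    {"remote_apply", COMMIT_REMOTE_APPLY},
    {"on", COMMIT_REMOTE_FLUSH},
    {"remote_flush", COMMIT_REMOTE_FLUSH},
    {"off", COMMIT_OFF},
    {NULL, 0}
};

/* Variant B: "remote_flush" added before "on". */
static const EnumEntry options_before_on[] = {
    {"local", COMMIT_LOCAL_FLUSH},
    {"remote_write", COMMIT_REMOTE_WRITE},
    {"remote_apply", COMMIT_REMOTE_APPLY},
    {"remote_flush", COMMIT_REMOTE_FLUSH},
    {"on", COMMIT_REMOTE_FLUSH},
    {"off", COMMIT_OFF},
    {NULL, 0}
};

/* SET: map a spelling to its level; returns 0 if the name is unknown. */
static int lookup_by_name(const EnumEntry *opts, const char *name, int *val)
{
    assert(opts != NULL && name != NULL && val != NULL);
    for (; opts->name != NULL; opts++)
    {
        if (strcmp(opts->name, name) == 0)
        {
            *val = opts->val;
            return 1;
        }
    }
    return 0;
}

/* SHOW: map a level back to a spelling; the FIRST matching entry wins. */
static const char *lookup_by_value(const EnumEntry *opts, int val)
{
    assert(opts != NULL);
    for (; opts->name != NULL; opts++)
        if (opts->val == val)
            return opts->name;
    return NULL;
}
```

In both variants "SET ... TO remote_flush" stores the same value; only the spelling that SHOW reports differs. Variant B would also report "remote_flush" on a single-node system, where the same value effectively means a local flush only, which is the confusion Thomas anticipates.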
Re: Thomas Munro 2016-08-21 <CAEepm=0EQvwhFih7wZ+cHL=UJDvF4KSe0thw1gPEY-ga3DcvmQ@mail.gmail.com>
> Right, we could just add it to guc.c after "on", so that you can "SET
> synchronous_commit TO remote_flush", but then "SHOW
> synchronous_commit" returns "on".
>
> The problem I was thinking about was this: if you add "remote_flush"
> before "on" in guc.c, then "SHOW ..." will return "remote_flush",
> which would be really helpful for users trying to understand what
> syncrep is actually doing; but it would probably confuse single node
> users and async replication users.

Maybe "flush" would work, given it applies locally and on the remote side? (And "local" could be "local_flush"...?)

Christoph
On Fri, Aug 19, 2016 at 6:30 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> I'm wondering if we've hit the point where trying to put all of this in a
> single GUC is a bad idea... changing that probably means a config
> compatibility break, but I don't think that's necessarily a bad thing at
> this point...

Aside from the (IMHO) slightly confusing way that "on" works, which is the smaller issue I was raising in this thread, I agree that we might eventually want to escape from the assumption that "local apply" (= off), local flush, remote write, remote flush, remote apply happen in that order and therefore a single linear control knob can describe which of those to wait for.

Some pie-in-the-sky thoughts: we currently can't reach "group-safe"[1], where you wait only for N servers to have the WAL in memory (let's say that for us that means write but not flush): the closest we can get is "1-safe and group-safe", using remote_write to wait for the standbys to write (= "group-safe"), which implies local flush (= "1-safe"). Now that'd be a terrible level to use unless your recovery procedure included cluster-wide communication to straighten things out, and without any such clusterware it makes a lot of sense to have the master flush before sending, and I'm not actually proposing we change that, I'm just speculating that someone might eventually want it. We also can't have standbys apply before they flush; as far as I know there is no theoretical reason why that shouldn't be allowed, except maybe for some special synchronisation steps around checkpoint records so that recovery doesn't get too far ahead. That'd mirror what happens on the master more closely. Imagine if you wanted to wait for your transaction to become visible on certain other servers, but didn't want to wait for any disks: that'd be the distributed equivalent of today's "off", but today's "remote_apply" implies local flush and remote flush.
Or more likely you'd want some combination: 2-safe or group-safe on some subset of servers to satisfy your durability requirements, and applied on some other perhaps larger subset of servers for consistency. But this is just water cooler handwaving.

[1] https://infoscience.epfl.ch/record/49936/files/WS03

--
Thomas Munro
http://www.enterprisedb.com
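The "single linear control knob" assumption can be made concrete with a small sketch: if choosing a level means waiting for every event at or below it, then some combinations mentioned above (remote write without a local flush, or remote apply without any flush) cannot be selected by any single level. The enum and function names are hypothetical and only model the ordering described in this message.

```c
#include <assert.h>

/* The five points listed above, in the order a linear knob assumes. */
typedef enum
{
    LEVEL_OFF,          /* "local apply": wait for no disk at all */
    LEVEL_LOCAL_FLUSH,
    LEVEL_REMOTE_WRITE,
    LEVEL_REMOTE_FLUSH,
    LEVEL_REMOTE_APPLY
} Level;

/* Events a committing transaction could wait for, as bit flags. */
enum
{
    EV_LOCAL_FLUSH  = 1 << 0,
    EV_REMOTE_WRITE = 1 << 1,
    EV_REMOTE_FLUSH = 1 << 2,
    EV_REMOTE_APPLY = 1 << 3
};

/* With a linear knob, a level implies every event up to and including it. */
static unsigned events_for_level(Level level)
{
    unsigned mask = 0;

    assert(level <= LEVEL_REMOTE_APPLY);
    if (level >= LEVEL_LOCAL_FLUSH)
        mask |= EV_LOCAL_FLUSH;
    if (level >= LEVEL_REMOTE_WRITE)
        mask |= EV_REMOTE_WRITE;
    if (level >= LEVEL_REMOTE_FLUSH)
        mask |= EV_REMOTE_FLUSH;
    if (level >= LEVEL_REMOTE_APPLY)
        mask |= EV_REMOTE_APPLY;
    return mask;
}

/* Returns 1 if some single level waits for exactly the wanted events. */
static int expressible(unsigned wanted)
{
    for (int l = LEVEL_OFF; l <= LEVEL_REMOTE_APPLY; l++)
        if (events_for_level((Level) l) == wanted)
            return 1;
    return 0;
}
```

For example, remote write plus local flush (today's remote_write, "1-safe and group-safe") is expressible, while remote write alone ("group-safe" only) or remote apply alone (the "distributed off") is not, because each level drags in everything below it.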
On Sun, Aug 21, 2016 at 6:08 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> Aside from the (IMHO) slightly confusing way that "on" works, which is
> the smaller issue I was raising in this thread, I agree that we might
> eventually want to escape from the assumption that "local apply" (=
> off), local flush, remote write, remote flush, remote apply happen in
> that order and therefore a single linear control knob can describe
> which of those to wait for.
>
> Some pie-in-the-sky thoughts: we currently can't reach
> "group-safe"[1], where you wait only for N servers to have the WAL in
> memory (let's say that for us that means write but not flush): the
> closest we can get is "1-safe and group-safe", using remote_write to
> wait for the standbys to write (= "group-safe"), which implies local
> flush (= "1-safe"). Now that'd be a terrible level to use unless your
> recovery procedure included cluster-wide communication to straighten
> things out, and without any such clusterware it makes a lot of sense
> to have the master flush before sending, and I'm not actually
> proposing we change that, I'm just speculating that someone might
> eventually want it. We also can't have standbys apply before they
> flush; as far as I know there is no theoretical reason why that
> shouldn't be allowed, except maybe for some special synchronisation
> steps around checkpoint records so that recovery doesn't get too far
> ahead.

Well, in order to remain recoverable, the standby has to obey the WAL-before-data rule: if it writes a page with a given LSN, that LSN had better be flushed to disk first.
In practice, this means that if you want a standby to remain recoverable without needing to contact the rest of the cluster, you can't let its minimum recovery point pass the WAL flush point. In short, this comes up anytime you evict a buffer, not just around checkpoints.

> That'd mirror what happens on the master more closely.
> Imagine if you wanted to wait for your transaction to become visible
> on certain other servers, but didn't want to wait for any disks:
> that'd be the distributed equivalent of today's "off", but today's
> "remote_apply" implies local flush and remote flush. Or more likely
> you'd want some combination: 2-safe or group-safe on some subset of
> servers to satisfy your durability requirements, and applied on some
> other perhaps larger subset of servers for consistency. But this is
> just water cooler handwaving.

Sure, that stuff would be great, and we'll probably have to redesign synchronous_commit entirely if and when we get there, but I'm not sure it makes sense to tinker with it now just for that.

The original reason why I suggested the current design for synchronous_commit was to avoid forcing people to set yet another GUC in order to use synchronous replication. The default of 'on' means that you can just configure synchronous_standby_names and away you go. Perhaps a better design as we added more values would have been to keep synchronous_commit as on/local/off and use a separate GUC, say, synchronous_replication, to define what "on" means: remote_write, remote_flush, remote_apply, 2safe+groupsafe, or whatever. And when synchronous_standby_names='' then the value of synchronous_replication is ignored, and synchronous_commit=on means the same as synchronous_commit=local, just as it does today.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
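Robert's WAL-before-data point can be sketched for the hypothetical apply-before-flush standby discussed above: before a dirty page is evicted, WAL must be flushed at least up to that page's LSN, so the minimum recovery point never gets ahead of the WAL flush point. This is an illustrative model with invented names (StandbyState, Buffer, flush_wal_up_to, evict_buffer), not PostgreSQL's buffer manager or recovery code.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t Lsn;           /* hypothetical stand-in for a WAL position */

/* Hypothetical standby bookkeeping. */
typedef struct
{
    Lsn wal_received;           /* WAL received and possibly already applied */
    Lsn wal_flushed;            /* WAL known to be durable on disk */
    Lsn min_recovery_point;     /* recovery must replay at least this far */
} StandbyState;

typedef struct
{
    Lsn page_lsn;               /* LSN of the last WAL record applied to the page */
    int dirty;
} Buffer;

/* Stands in for fsyncing WAL; only WAL we actually have can be flushed. */
static void flush_wal_up_to(StandbyState *s, Lsn lsn)
{
    assert(lsn <= s->wal_received);
    if (s->wal_flushed < lsn)
        s->wal_flushed = lsn;
}

/* Evicting a dirty page: flush WAL first (WAL-before-data), then record
 * that recovery must reach at least this page's LSN. */
static void evict_buffer(StandbyState *s, Buffer *buf)
{
    if (!buf->dirty)
        return;
    flush_wal_up_to(s, buf->page_lsn);
    if (s->min_recovery_point < buf->page_lsn)
        s->min_recovery_point = buf->page_lsn;
    /* the actual page write would happen here */
    buf->dirty = 0;
    assert(s->min_recovery_point <= s->wal_flushed);
}
```

The final assertion is the invariant Robert describes: any eviction can force a WAL flush, which is why the constraint is not limited to checkpoint records.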
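Robert's alternative design, with synchronous_commit kept as on/local/off and a separate GUC deciding what "on" means, can be sketched as a function that resolves the effective wait level from the two settings. "synchronous_replication" is only the hypothetical GUC name from his message; the enums and the function are invented for illustration, and the "2safe+groupsafe" option is left out.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { SC_OFF, SC_LOCAL, SC_ON } SyncCommit;   /* on/local/off */

/* Hypothetical synchronous_replication values. */
typedef enum { SR_REMOTE_WRITE, SR_REMOTE_FLUSH, SR_REMOTE_APPLY } SyncReplication;

typedef enum
{
    WAIT_NONE,           /* asynchronous commit: don't wait for a flush */
    WAIT_LOCAL_FLUSH,
    WAIT_REMOTE_WRITE,   /* remote levels also imply a local flush */
    WAIT_REMOTE_FLUSH,
    WAIT_REMOTE_APPLY
} WaitLevel;

static WaitLevel effective_wait(SyncCommit sc, SyncReplication sr,
                                const char *standby_names)
{
    /* Treat NULL or '' as "synchronous_standby_names is empty". */
    int syncrep_configured = standby_names != NULL && standby_names[0] != '\0';

    switch (sc)
    {
        case SC_OFF:
            return WAIT_NONE;
        case SC_LOCAL:
            return WAIT_LOCAL_FLUSH;
        case SC_ON:
            if (!syncrep_configured)
                return WAIT_LOCAL_FLUSH;   /* synchronous_replication ignored */
            switch (sr)
            {
                case SR_REMOTE_WRITE:
                    return WAIT_REMOTE_WRITE;
                case SR_REMOTE_FLUSH:
                    return WAIT_REMOTE_FLUSH;
                case SR_REMOTE_APPLY:
                    return WAIT_REMOTE_APPLY;
            }
    }
    assert(!"invalid setting");
    return WAIT_LOCAL_FLUSH;
}
```

The point of the split is that single-node users only ever see on/local/off, while the replication-specific meaning of "on" lives in a setting that has no effect until synchronous_standby_names is set.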