Thread: synchronous_commit = remote_flush

From

Thomas Munro

Date:

18 August 2016, 04:22:13

Hi hackers,

To do something about the confusion I keep seeing about what exactly
"on" means, I've often wished we had "remote_flush".  But it's not
obvious how the backwards compatibility could work, ie how to keep the
people happy who use "local" vs "on" to control syncrep, and also the
people who use "off" vs "on" to control asynchronous commit on
single-node systems.  Is there any sensible way to do that, or is it
not broken and I should pipe down, or is it just far too entrenched
and never going to change?

-- 
Thomas Munro
http://www.enterprisedb.com

Re: synchronous_commit = remote_flush

From

Jim Nasby

Date:

18 August 2016, 18:31:09

On 8/17/16 11:22 PM, Thomas Munro wrote:
> Hi hackers,
>
> To do something about the confusion I keep seeing about what exactly
> "on" means, I've often wished we had "remote_flush".  But it's not
> obvious how the backwards compatibility could work, ie how to keep the
> people happy who use "local" vs "on" to control syncrep, and also the
> people who use "off" vs "on" to control asynchronous commit on
> single-node systems.  Is there any sensible way to do that, or is it
> not broken and I should pipe down, or is it just far too entrenched
> and never going to change?

I'm wondering if we've hit the point where trying to put all of this in 
a single GUC is a bad idea... changing that probably means a config 
compatibility break, but I don't think that's necessarily a bad thing at 
this point...
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461

Re: synchronous_commit = remote_flush

From

Robert Haas

Date:

18 August 2016, 20:25:53

On Thu, Aug 18, 2016 at 12:22 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> To do something about the confusion I keep seeing about what exactly
> "on" means, I've often wished we had "remote_flush".  But it's not
> obvious how the backwards compatibility could work, ie how to keep the
> people happy who use "local" vs "on" to control syncrep, and also the
> people who use "off" vs "on" to control asynchronous commit on
> single-node systems.  Is there any sensible way to do that, or is it
> not broken and I should pipe down, or is it just far too entrenched
> and never going to change?

I don't see why we can't add "remote_flush" as a synonym for "on".  Do
you have something else in mind?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: synchronous_commit = remote_flush

From

Masahiko Sawada

Date:

19 August 2016, 07:33:36

On Fri, Aug 19, 2016 at 5:25 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Aug 18, 2016 at 12:22 AM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>> To do something about the confusion I keep seeing about what exactly
>> "on" means, I've often wished we had "remote_flush".  But it's not
>> obvious how the backwards compatibility could work, ie how to keep the
>> people happy who use "local" vs "on" to control syncrep, and also the
>> people who use "off" vs "on" to control asynchronous commit on
>> single-node systems.  Is there any sensible way to do that, or is it
>> not broken and I should pipe down, or is it just far too entrenched
>> and never going to change?
>
> I don't see why we can't add "remote_flush" as a synonym for "on".  Do
> you have something else in mind?
>

+1 for adding "remote_flush" as a synonym for "on".
It doesn't break backward compatibility.

Regards,

--
Masahiko Sawada

Re: synchronous_commit = remote_flush

From

Thomas Munro

Date:

21 August 2016, 10:35:39

On Fri, Aug 19, 2016 at 7:32 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Fri, Aug 19, 2016 at 5:25 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Thu, Aug 18, 2016 at 12:22 AM, Thomas Munro
>> <thomas.munro@enterprisedb.com> wrote:
>>> To do something about the confusion I keep seeing about what exactly
>>> "on" means, I've often wished we had "remote_flush".  But it's not
>>> obvious how the backwards compatibility could work, ie how to keep the
>>> people happy who use "local" vs "on" to control syncrep, and also the
>>> people who use "off" vs "on" to control asynchronous commit on
>>> single-node systems.  Is there any sensible way to do that, or is it
>>> not broken and I should pipe down, or is it just far too entrenched
>>> and never going to change?
>>
>> I don't see why we can't add "remote_flush" as a synonym for "on".  Do
>> you have something else in mind?
>>
>
> +1 for adding "remote_flush" as a synonym for "on".
> It doesn't break backward compatibility.

Right, we could just add it to guc.c after "on", so that you can "SET
synchronous_commit TO remote_flush", but then "SHOW
synchronous_commit" returns "on".

The problem I was thinking about was this: if you add "remote_flush"
before "on" in guc.c, then "SHOW ..." will return "remote_flush",
which would be really helpful for users trying to understand what
syncrep is actually doing; but it would probably confuse single node
users and async replication users.
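
To illustrate the ordering effect described above, here is a minimal
sketch, assuming an enum option table that is searched front to back
both for name lookups (SET) and for value lookups (SHOW). The names
sc_option, sc_level, sc_lookup_by_name and sc_lookup_by_value are
hypothetical; the real table in guc.c carries more fields and is not
reproduced here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified commit levels (not the real enum). */
typedef enum {
    SC_OFF,
    SC_LOCAL_FLUSH,
    SC_REMOTE_WRITE,
    SC_REMOTE_FLUSH
} sc_level;

/* Hypothetical, simplified option entry: one accepted spelling per row. */
typedef struct {
    const char *name;
    sc_level    val;
} sc_option;

/* Variant A: "on" is listed before its synonym "remote_flush". */
static const sc_option on_first[] = {
    {"local", SC_LOCAL_FLUSH},
    {"remote_write", SC_REMOTE_WRITE},
    {"on", SC_REMOTE_FLUSH},
    {"remote_flush", SC_REMOTE_FLUSH},
    {"off", SC_OFF},
    {NULL, SC_OFF}
};

/* Variant B: "remote_flush" is listed before "on". */
static const sc_option remote_flush_first[] = {
    {"local", SC_LOCAL_FLUSH},
    {"remote_write", SC_REMOTE_WRITE},
    {"remote_flush", SC_REMOTE_FLUSH},
    {"on", SC_REMOTE_FLUSH},
    {"off", SC_OFF},
    {NULL, SC_OFF}
};

/* SET path: any listed spelling is accepted. Returns 1 if found. */
static int
sc_lookup_by_name(const sc_option *opts, const char *name, sc_level *out)
{
    for (; opts->name != NULL; opts++) {
        if (strcmp(opts->name, name) == 0) {
            *out = opts->val;
            return 1;
        }
    }
    return 0;
}

/* SHOW path: the first entry with a matching value supplies the name. */
static const char *
sc_lookup_by_value(const sc_option *opts, sc_level val)
{
    for (; opts->name != NULL; opts++) {
        if (opts->val == val)
            return opts->name;
    }
    return NULL;
}
```

With on_first, setting "remote_flush" is accepted but the value is
displayed as "on"; with remote_flush_first, both spellings are still
accepted but the value is displayed as "remote_flush", including for
users who never configured syncrep.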

-- 
Thomas Munro
http://www.enterprisedb.com

Re: synchronous_commit = remote_flush

From

Christoph Berg

Date:

21 August 2016, 12:16:38

Re: Thomas Munro 2016-08-21 <CAEepm=0EQvwhFih7wZ+cHL=UJDvF4KSe0thw1gPEY-ga3DcvmQ@mail.gmail.com>
> Right, we could just add it to guc.c after "on", so that you can "SET
> synchronous_commit TO remote_flush", but then "SHOW
> synchronous_commit" returns "on".
> 
> The problem I was thinking about was this: if you add "remote_flush"
> before "on" in guc.c, then "SHOW ..." will return "remote_flush",
> which would be really helpful for users trying to understand what
> syncrep is actually doing; but it would probably confuse single node
> users and async replication users.

Maybe "flush" would work, given it applies locally and on the remote
side? (And "local" could be "local_flush"...?)

Christoph

Re: synchronous_commit = remote_flush

From

Thomas Munro

Date:

21 August 2016, 22:09:00

On Fri, Aug 19, 2016 at 6:30 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> I'm wondering if we've hit the point where trying to put all of this in a
> single GUC is a bad idea... changing that probably means a config
> compatibility break, but I don't think that's necessarily a bad thing at
> this point...

Aside from the (IMHO) slightly confusing way that "on" works, which is
the smaller issue I was raising in this thread, I agree that we might
eventually want to escape from the assumption that "local apply" (=
off), local flush, remote write, remote flush, remote apply happen in
that order and therefore a single linear control knob can describe
which of those to wait for.

Some pie-in-the-sky thoughts: we currently can't reach
"group-safe"[1], where you wait only for N servers to have the WAL in
memory (let's say that for us that means write but not flush): the
closest we can get is "1-safe and group-safe", using remote_write to
wait for the standbys to write (= "group-safe"), which implies local
flush (= "1-safe").  Now that'd be a terrible level to use unless your
recovery procedure included cluster-wide communication to straighten
things out, and without any such clusterware it makes a lot of sense
to have the master flush before sending, and I'm not actually
proposing we change that, I'm just speculating that someone might
eventually want it.  We also can't have standbys apply before they
flush; as far as I know there is no theoretical reason why that
shouldn't be allowed, except maybe for some special synchronisation
steps around checkpoint records so that recovery doesn't get too far
ahead.  That'd mirror what happens on the master more closely.
Imagine if you wanted to wait for your transaction to become visible
on certain other servers, but didn't want to wait for any disks:
that'd be the distributed equivalent of today's "off", but today's
"remote_apply" implies local flush and remote flush.  Or more likely
you'd want some combination: 2-safe or group-safe on some subset of
servers to satisfy your durability requirements, and applied on some
other perhaps larger subset of servers for consistency.  But this is
just water cooler handwaving.

[1] https://infoscience.epfl.ch/record/49936/files/WS03

-- 
Thomas Munro
http://www.enterprisedb.com

Re: synchronous_commit = remote_flush

From

Robert Haas

Date:

22 August 2016, 02:05:48

On Sun, Aug 21, 2016 at 6:08 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Fri, Aug 19, 2016 at 6:30 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
>> I'm wondering if we've hit the point where trying to put all of this in a
>> single GUC is a bad idea... changing that probably means a config
>> compatibility break, but I don't think that's necessarily a bad thing at
>> this point...
>
> Aside from the (IMHO) slightly confusing way that "on" works, which is
> the smaller issue I was raising in this thread, I agree that we might
> eventually want to escape from the assumption that "local apply" (=
> off), local flush, remote write, remote flush, remote apply happen in
> that order and therefore a single linear control knob can describe
> which of those to wait for.
>
> Some pie-in-the-sky thoughts: we currently can't reach
> "group-safe"[1], where you wait only for N servers to have the WAL in
> memory (let's say that for us that means write but not flush): the
> closest we can get is "1-safe and group-safe", using remote_write to
> wait for the standbys to write (= "group-safe"), which implies local
> flush (= "1-safe").  Now that'd be a terrible level to use unless your
> recovery procedure included cluster-wide communication to straighten
> things out, and without any such clusterware it makes a lot of sense
> to have the master flush before sending, and I'm not actually
> proposing we change that, I'm just speculating that someone might
> eventually want it.  We also can't have standbys apply before they
> flush; as far as I know there is no theoretical reason why that
> shouldn't be allowed, except maybe for some special synchronisation
> steps around checkpoint records so that recovery doesn't get too far
> ahead.

Well, in order to remain recoverable, the standby has to obey the
WAL-before-data rule: if it writes a page with a given LSN, that LSN
had better be flushed to disk first.  In practice, this means that if
you want a standby to remain recoverable without needing to contact
the rest of the cluster, you can't let its minimum recovery point pass
the WAL flush point.  In short, this comes up anytime you evict a
buffer, not just around checkpoints.
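
A minimal sketch of that constraint, assuming a standby that tracks
only two positions. The type lsn_t, the struct standby_state and both
functions are hypothetical names for illustration, not PostgreSQL
code.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t lsn_t;     /* hypothetical stand-in for a WAL position */

typedef struct {
    lsn_t wal_flushed;          /* WAL is durably on disk up to here */
    lsn_t min_recovery_point;   /* recovery must replay at least to here */
} standby_state;

/*
 * WAL-before-data: a page stamped with page_lsn may be written out only
 * if WAL up to page_lsn has already been flushed.
 */
bool
can_write_page(const standby_state *s, lsn_t page_lsn)
{
    return page_lsn <= s->wal_flushed;
}

/*
 * Evicting a dirty buffer writes the page, which pushes the minimum
 * recovery point forward to the page's LSN.  Returns false, leaving the
 * state untouched, when the WAL would have to be flushed first.
 */
bool
evict_page(standby_state *s, lsn_t page_lsn)
{
    if (!can_write_page(s, page_lsn))
        return false;
    if (page_lsn > s->min_recovery_point)
        s->min_recovery_point = page_lsn;
    return true;
}
```

Under these assumptions min_recovery_point can never pass
wal_flushed, and the check runs on every eviction rather than only at
checkpoint records.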

> That'd mirror what happens on the master more closely.
> Imagine if you wanted to wait for your transaction to become visible
> on certain other servers, but didn't want to wait for any disks:
> that'd be the distributed equivalent of today's "off", but today's
> "remote_apply" implies local flush and remote flush.  Or more likely
> you'd want some combination: 2-safe or group-safe on some subset of
> servers to satisfy your durability requirements, and applied on some
> other perhaps larger subset of servers for consistency.  But this is
> just water cooler handwaving.

Sure, that stuff would be great, and we'll probably have to redesign
synchronous_commit entirely if and when we get there, but I'm not sure
it makes sense to tinker with it now just for that.  The original
reason why I suggested the current design for synchronous_commit is to
avoid forcing people to set yet another GUC in order to use
synchronous replication.  The default of 'on' means that you can just
configure synchronous_standby_names and away you go.  Perhaps a better
design as we added more values would have been to keep
synchronous_commit as on/local/off and use a separate GUC, say,
synchronous_replication to define what "on" means: remote_write,
remote_flush, remote_apply, 2safe+groupsafe, or whatever.  And when
synchronous_standby_names='' then the value of synchronous_replication
is ignored, and synchronous_commit=on means the same as
synchronous_commit=local, just as it does today.
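
A rough sketch of how that two-GUC split could resolve to a single
wait level, assuming the hypothetical synchronous_replication GUC
takes only the three remote levels that exist today (remote_write,
remote_flush, remote_apply). All type and function names below are
made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical values of synchronous_commit under the split design. */
typedef enum { COMMIT_OFF, COMMIT_LOCAL, COMMIT_ON } commit_setting;

/* Hypothetical values of the proposed synchronous_replication GUC. */
typedef enum {
    REPL_REMOTE_WRITE,
    REPL_REMOTE_FLUSH,
    REPL_REMOTE_APPLY
} repl_setting;

/* What a committing backend actually ends up waiting for. */
typedef enum {
    WAIT_NONE,
    WAIT_LOCAL_FLUSH,
    WAIT_REMOTE_WRITE,
    WAIT_REMOTE_FLUSH,
    WAIT_REMOTE_APPLY
} wait_level;

wait_level
effective_wait(commit_setting sc, repl_setting sr, const char *standby_names)
{
    /* An empty synchronous_standby_names means no synchronous standbys. */
    bool have_sync_standbys =
        standby_names != NULL && standby_names[0] != '\0';

    if (sc == COMMIT_OFF)
        return WAIT_NONE;

    /* Without sync standbys, "on" means the same as "local". */
    if (sc == COMMIT_LOCAL || !have_sync_standbys)
        return WAIT_LOCAL_FLUSH;

    /* sc is "on" and syncrep is configured: the second GUC decides. */
    switch (sr) {
        case REPL_REMOTE_WRITE:
            return WAIT_REMOTE_WRITE;
        case REPL_REMOTE_FLUSH:
            return WAIT_REMOTE_FLUSH;
        case REPL_REMOTE_APPLY:
            return WAIT_REMOTE_APPLY;
    }
    return WAIT_REMOTE_FLUSH;   /* not reached for valid input */
}
```

In this sketch a single-node user only ever sees on/local/off, while
a syncrep user changes the remote level without touching
synchronous_commit.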

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company