Re: Synch failover WAS: Support for N synchronous standby servers - take 2

From: Josh Berkus
Subject: Re: Synch failover WAS: Support for N synchronous standby servers - take 2
Date:
Msg-id: 5596C5E9.5010204@agliodbs.com
In response to: Re: Support for N synchronous standby servers - take 2 (Josh Berkus <josh@agliodbs.com>)
Responses: Re: Synch failover WAS: Support for N synchronous standby servers - take 2
List: pgsql-hackers
On 07/03/2015 03:12 AM, Sawada Masahiko wrote:
> Thanks. So we can choose the next master server by checking the
> progress of each server, if hot standby is enabled. Such a procedure
> is needed even with today's replication.
>
> I think the #2 problem Josh pointed out seems to be solved:
> 1. I need to ensure that data is replicated to X places.
> 2. I need to *know* which places data was synchronously replicated
>    to when the master goes down.
> We can address problem #1 using quorum commit.

It's not solved. I still have zero ways of knowing whether a replica was in sync or not at the time the master went down. Now, you and others have argued persuasively that there are valuable use cases for quorum commit even without solving that particular issue, but there's a big difference between "we can work around this problem" and "the problem is solved."

I forked the subject line because I think the inability to identify synch replicas under failover conditions is a serious problem with synch rep *today*, and pretending it doesn't exist doesn't help us even if we don't fix it in 9.6.

Let me give you three cases where the lack of information on the replica side, about whether it thinks it is in sync, causes synch rep to fail to protect data. The first case is one I've actually seen in production; the other two are hypothetical but entirely plausible.

Case #1: Two synchronous replica servers have the application name "synchreplica". An admin uses the wrong Chef template and deploys a server that was supposed to be an async replica with the same recovery.conf template, so it ends up in the "synchreplica" group as well. Due to restarts (pushing out an update release), the new server ends up seizing and keeping sync. Then the master dies. Because the new server wasn't supposed to be a sync replica in the first place, it is not checked; they just fail over to the further ahead of the two original synch replicas, neither of which was actually in sync.

Case #2: A "2 { local, london, nyc }" setup. At 2am, the links between data centers become unreliable, so the on-call sysadmin disables synch rep because commits on the master are intolerably slow. Then, at 10am, the links between data centers fail entirely. The day shift, not knowing that the night shift disabled sync, fails over to London thinking they can do so with zero data loss.

Case #3: A "1 { london, frankfurt }, 1 { sydney, tokyo }" multi-group priority setup. We lose communication with everything but Europe. How can we decide whether to wait to get Sydney back, or to promote London immediately?

I could come up with numerous other situations, but the three entirely reasonable cases above show how knowing what time a replica last thought it was in sync is vital to preventing bad failovers and data loss, and to knowing the quantity of data loss when it can't be prevented.

It's an issue *now* that the only data we have about the state of sync rep lives on the master, and dies with the master. That severely limits the actual utility of our synch rep: people implement synch rep in the first place because the "best effort" of asynch rep isn't good enough for them, and yet when it comes to failover we're just telling them "give it your best effort."
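To make that concrete: the sync-rep state in question lives only in the master's pg_stat_replication view, roughly like this (9.4-era columns), and this knowledge dies with the master:

    -- On the master: sync_state is the only record of which standby
    -- is currently synchronous. No standby holds a copy of this.
    SELECT application_name, client_addr, state, sync_state
    FROM pg_stat_replication;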
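By contrast, all a candidate standby can report is its WAL position, which is the "checking the progress of each server" procedure quoted at the top. A minimal sketch (9.x-era function names, requires hot_standby = on); note that it says nothing about whether this standby was ever a sync member:

    -- Run on each candidate standby. These report WAL positions only;
    -- they cannot tell you if this server was in sync at failure time.
    SELECT pg_last_xlog_receive_location() AS received_lsn,
           pg_last_xlog_replay_location()  AS replayed_lsn;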
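And for Case #2, one plausible (hypothetical) way the night shift might have disabled sync rep, assuming 9.4's ALTER SYSTEM; the point is that nothing on any standby records that this ever happened:

    -- On the master, at 2am. No standby ever learns that synchronous
    -- replication was turned off.
    ALTER SYSTEM SET synchronous_standby_names = '';
    SELECT pg_reload_conf();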
--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com