[HACKERS] Some thoughts about multi-server sync rep configurations - Mailing list pgsql-hackers
| From | Thomas Munro |
|---|---|
| Subject | [HACKERS] Some thoughts about multi-server sync rep configurations |
| Date | |
| Msg-id | CAEepm=1GNCriNvWhPkWCqrsbXWGtWEEpvA-KnovMbht5ryzbmg@mail.gmail.com |
| Responses | Re: [HACKERS] Some thoughts about multi-server sync rep configurations |
| List | pgsql-hackers |
Hi,

Sync rep with multiple standbys allows queries run on standbys to see transactions that haven't been flushed on the configured number of standbys. That means it's susceptible to lost updates or a kind of "dirty read" in certain cluster reconfiguration scenarios. To close that gap, we would need to introduce extra communication so that, in certain circumstances, standbys wait for flushes on other servers before a snapshot can be used. That doesn't sound like a popular performance feature, and I don't have a concrete proposal for it, but I wanted to raise the topic for discussion and see if others have thought about it.

I speculate that there are three things that need to be aligned for multi-server sync rep to be able to make complete guarantees about durability, and to pass the kind of testing that someone like Aphyr of Jepsen[1] would throw at it. Of course you might argue that the guarantees about reconfiguration that I'm assuming in the following are not explicitly made anywhere, and I'd be very interested to hear what others have to say about them.

Suppose you have K servers and you decide that you want to be able to lose N servers without data loss. Then, as far as I can see, the three things are:

1. While automatic replication cluster management is not part of Postgres, you must have a manual or automatic procedure with two important properties in order for synchronous replication to be able to reconfigure without data loss. At cluster reconfiguration time on the loss of the primary, you must be able to contact at least K - N servers, and of those you must promote the server that has the highest LSN; otherwise there is no way to know that the latest successfully committed transaction is present in the new timeline. Furthermore, you must be able to contact more than K / 2 servers (a majority) to avoid split-brain syndrome. (See the sketch after the example below.)

2. The primary must wait for N standby servers to acknowledge flushing before returning from commit, as we do.

3. No server must allow a transaction to be visible that hasn't been flushed on N standby servers. We already prevent that on the primary, but not on standbys. You might see a transaction on a given standby, then lose that standby and the primary, and then a new primary might be elected that doesn't have that transaction.

We don't have this problem if you only run queries on the primary, and we don't have it in single-standby configurations, i.e. K = 2 and N = 1. But as soon as K > 2 and N > 1, we can have the problem on standbys.

Example: You have 2 servers in London, 2 in New York and 2 in Tokyo. You enable synchronous replication with N = 3. A transaction updates a record and commits locally on host "london1", and begins waiting for 3 servers to respond. A network fault prevents messages from London reaching the other two data centres, because the rack is on fire. But "london2" receives and applies the WAL. Now another session sees this transaction on "london2" and reports a fact derived from it to a customer. Finally, failover software or humans determine that it's time to promote a new primary server in Tokyo. The fact reported to the customer has evaporated; that was a kind of "dirty read" that might have consequences for your business.

[1] https://aphyr.com/tags/jepsen

--
Thomas Munro
http://www.enterprisedb.com
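PS: To make the reconfiguration rule in point 1 a bit more concrete, here is a minimal sketch, in Python and purely illustrative (no interface is being proposed), of the decision a failover tool has to make. It assumes the tool knows K and N and can ask each reachable server for its last flushed WAL position; the function names, hostnames and LSN values are made up.

```python
# Purely illustrative sketch (not a Postgres interface) of the promotion rule
# in point 1 above.  k is the total number of servers, n the number of servers
# we want to be able to lose without data loss.  reachable_lsns maps each
# reachable server to its last flushed WAL position as a pg_lsn-style string;
# how those values are obtained is outside the scope of this sketch.

def parse_lsn(lsn):
    """Turn a pg_lsn-style string like '0/3000208' into a comparable integer."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def choose_promotion_target(k, n, reachable_lsns):
    # If fewer than K - N servers answer, the server holding the most recently
    # acknowledged commit may be among the unreachable ones, so promoting now
    # could lose a transaction that was reported as committed.
    if len(reachable_lsns) < k - n:
        raise RuntimeError("fewer than K - N servers reachable; cannot rule out data loss")

    # Without a strict majority of all K servers we risk electing a second
    # primary on the other side of a network partition (split-brain).
    if len(reachable_lsns) * 2 <= k:
        raise RuntimeError("no majority reachable; refusing to promote")

    # Promote the reachable server with the highest flushed LSN, so that every
    # transaction acknowledged as committed is present on the new timeline.
    return max(reachable_lsns, key=lambda name: parse_lsn(reachable_lsns[name]))

# The scenario above: K = 6, N = 3, both London servers unreachable.
print(choose_promotion_target(6, 3, {
    "newyork1": "0/3000140",
    "newyork2": "0/3000140",
    "tokyo1":   "0/3000208",
    "tokyo2":   "0/3000140",
}))  # -> tokyo1
```

Obviously a real tool also has to fence the old primary, handle ties and so on; the point is only that both conditions in point 1 must hold before it is safe to go looking for the highest LSN.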