Re: Issues with two-server Synch Rep - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Issues with two-server Synch Rep |
Date | |
Msg-id | AANLkTi=OeBKQchvHiXru_ZD7Tm31K_ydC7a7Py1TPQPG@mail.gmail.com |
In response to | Issues with two-server Synch Rep (Josh Berkus <josh@agliodbs.com>) |
Responses | Re: Issues with two-server Synch Rep |
List | pgsql-hackers |
On Thu, Oct 7, 2010 at 2:05 PM, Josh Berkus <josh@agliodbs.com> wrote:
> What is the procedure for adding a new synchronous standby in your
> implementation? That is, how do we go from having a standby server with
> an empty PGDATA to having a working synchronous standby?

I'll take a crack at answering these.

I don't think that the procedure for setting up a standby server is
going to change much. The idea is presumably that you set up an async
standby more or less as you do now and then make whatever configuration
changes are necessary to flip it to synchronous.

> During 9.0 development discussion, one of the things we realized we
> needed for synch standby was publication of snapshots back to the master
> in order to prevent query cancel on the standby. Without this, the
> synch standby is useless for running read queries. Does your patch
> implement this? Please describe.

This is a completely separate issue from making replication
synchronous. And, really? Useless for running read queries?

> One of the serious flaws currently in HS/SR is complexity of
> administration. Setting up and configuring even a single master and
> single standby requires editing up to 6 configuration files in Postgres,
> as well as dealing with file permissions. As such, any Synch Rep patch
> must work together with attempts to simplify administration. How does
> your design do this?

This is also completely out of scope for sync rep.

> Synch rep offers severe penalties to availability if a synch standby
> gets behind or goes down. What replication-specific monitoring tools
> and hooks are available to allow administrators to take action before the
> database becomes unavailable?

I don't think there's much hope of allowing administrators to take
action BEFORE the database becomes unavailable. The point of making
replication synchronous rather than asynchronous is that the slave
can't be behind AT ALL, and if it goes down the primary is immediately
stuck. If the synchronous standby vanishes, the master can recover if:

1. We turn off synchronous replication, or
2. TCP keepalives or some other mechanism kills the master-slave
connection after a suitable timeout, and we interpret (or configure)
no connected standbys = stop synchronous replication.

> In the event that the synch rep standby falls too far behind or becomes
> unavailable, or is deliberately taken offline, what are you envisioning
> as the process for the DBA resolving the situation? Is there any
> ability to commit "stuck" transactions?

Again, it can't fall "too far" behind. It can't be behind at all. Any
stuck transactions are necessarily already committed; the commit just
hasn't been acknowledged to the client yet. Presumably, if synchronous
replication is disabled via (1) or (2) above, then any outstanding
committed-but-unacknowledged-to-the-client transactions should notify
the client of the commit and continue on.

> With a standby in "apply" mode, and a master failure at the wrong time,
> there is the possibility that the Standby will apply a transaction at
> the same time that the master crashes, causing the client to never
> receive a commit message. Once the client reconnects to the standby,
> how will it know whether its transaction was committed or not?

If a client loses the connection after issuing a commit but before
receiving the acknowledgment, it can't know whether the commit happened
or not. This is true regardless of whether there is a standby and
regardless of whether that standby is synchronous. Clients that care
need to implement their own mechanisms for resolving this difficulty.

> As a lesser case, a standby in "apply" mode will show the results of
> committed transactions *before* they are visible on the master. Is
> there any need to handle this? If so, how?

It's theoretically impossible for the transaction to become visible
everywhere simultaneously. It's already the case that transactions
become visible to other backends before the backend doing the commit
has received an acknowledgment. Any client relying on any other
behavior is already broken.

> As with XA, synch rep has the potential to be so slow as to be unusable.
> What optimizations do you make in your approach to synch rep to make it
> faster than two-phase commit? What other performance optimizations have
> you added?

Sync rep is going to be slow, period. Every implementation currently
on the table has to fsync on the master, and then send the commit xlog
record to the slave and wait for an acknowledgment from the slave.
Allowing those to happen in parallel is going to be Hard. Also, the
interaction with max_standby_delay is going to be a big problem, I
suspect.

As for the specific optimizations in each patch, I believe the major
thing that differs between them is the exact timing of the
acknowledgments; but perhaps I should let the patch authors speak to
that question, if they wish to do so.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company