Re: Standalone synchronous master - Mailing list pgsql-hackers
From | Josh Berkus |
---|---|
Subject | Re: Standalone synchronous master |
Date | |
Msg-id | 52D2F42F.1070306@agliodbs.com |
In response to | Re: Standalone synchronous master (Bruce Momjian <bruce@momjian.us>) |
Responses | Re: Standalone synchronous master |
List | pgsql-hackers |
All, I'm leading this off with a review of the features offered by the actual patch submitted. My general discussion of the issues of Sync Degrade, which justifies my specific suggestions below, follows that. Rajeev, please be aware that other hackers may have different opinions than me on what needs to change about the patch, so you should collect all opinions before changing code.

=======================

> Add a new parameter :
> synchronous_standalone_master = on | off

I think this is a TERRIBLE name for any such parameter. What does "synchronous standalone" even mean? A better name for the parameter would be "auto_degrade_sync_replication" or "synchronous_timeout_action = error | degrade", or something similar. It would be even better for this to be a mode of synchronous_commit, except that synchronous_commit is heavily overloaded already.

Some issues raised by this log script:

LOG: standby "tx0113" is now the synchronous standby with priority 1
LOG: waiting for standby synchronization
  <-- standby wal receiver on the standby is killed (SIGKILL)
LOG: unexpected EOF on standby connection
LOG: not waiting for standby synchronization
  <-- restart standby so that it connects again
LOG: standby "tx0113" is now the synchronous standby with priority 1
LOG: waiting for standby synchronization
  <-- standby wal receiver is first stopped (SIGSTOP) to make sure

The "not waiting for standby synchronization" message should be marked something stronger than LOG. I'd like ERROR. Second, you have the master resuming sync rep when the standby reconnects. How do you determine when it's safe to do that? You're making the assumption that you have a failing sync standby instead of one which simply can't keep up with the master, or a flakey network connection (see discussion below).

> a. Master_to_standalone_cmd: To be executed before master switches to standalone mode.
>
> b. Master_to_sync_cmd: To be executed before master switches from sync mode to standalone mode.

I'm not at all clear what the difference between these two commands is. When would one be executed, and when would the other be executed? Also, renaming ...

Missing features:

a) we should at least send committing clients a WARNING if they have committed a synchronous transaction and we are in degraded mode. I know others have dismissed this idea as too "talky", but from my perspective, the agreement with the client for each synchronous commit is being violated, so each and every synchronous commit should report failure to sync. Also, having a warning on every commit would make it easier to troubleshoot degraded mode for users who have ignored the other warnings we give them.

b) pg_stat_replication needs to show degraded mode in some way, or we need pg_sync_rep_degraded(), or (ideally) both (see the sketch below).

I'm also wondering if we need a more sophisticated approach to wal_sender_timeout to go with all this.
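As a rough illustration of (b): this monitoring sketch is not part of the patch, it uses only the existing pg_stat_replication view, and the pg_sync_rep_degraded() function named above is hypothetical and does not exist today:

-- Sketch only: does a master that is configured for sync rep currently have
-- a connected synchronous standby?
SELECT current_setting('synchronous_standby_names') <> ''      AS wants_sync_rep,
       count(CASE WHEN sync_state = 'sync' THEN 1 END)         AS connected_sync_standbys
FROM pg_stat_replication;
-- wants_sync_rep = true with connected_sync_standbys = 0 means commits are
-- currently waiting (today's behavior) or the master has silently degraded
-- (with this patch) -- exactly the state an explicit indicator should expose.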
=======================

On 01/11/2014 08:33 PM, Bruce Momjian wrote:
> On Sat, Jan 11, 2014 at 07:18:02PM -0800, Josh Berkus wrote:
>> In other words, if we're going to have auto-degrade, the most
>> intelligent place for it is in
>> RepMgr/HandyRep/OmniPITR/pgPoolII/whatever. It's also the *easiest*
>> place. Anything we do *inside* Postgres is going to have a really,
>> really hard time determining when to degrade.
>
> Well, one goal I was considering is that if a commit is hung waiting for
> slave sync confirmation, and the timeout happens, then the mode is
> changed to degraded and the commit returns success. I am not sure how
> you would do that in an external tool, meaning there is going to be
> period where commits fail, unless you think there is a way that when the
> external tool changes the mode to degrade that all hung commits
> complete. That would be nice.

Realistically, though, that's pretty unavoidable. Any technique which waits a reasonable interval to determine that the replica isn't going to respond is liable to go beyond the application's timeout threshold anyway. There are undoubtedly exceptions to that, but it will be the case a lot of the time -- how many applications are willing to wait *minutes* for a COMMIT?

I also don't see any way to allow the hung transactions to commit without allowing the walsender to make a decision on degrading. As I've outlined elsewhere (and below), the walsender just doesn't have enough information to make a good decision.

On 01/11/2014 08:52 PM, Amit Kapila wrote:
> It is better than async mode in a way such that in async mode it never
> waits for commits to be written to standby, but in this new mode it will
> do so unless it is not possible (all sync standby's goes down).
> Can't we use existing wal_sender_timeout, or even if user expects a
> different timeout because for this new mode, he expects master to wait
> more before it start operating like standalone sync master, we can provide
> a new parameter.

One of the reasons that there's so much disagreement about this feature is that most of the folks strongly in favor of auto-degrade are thinking *only* of the case that the standby is completely down. There are many other reasons for a sync transaction to hang, and the walsender has absolutely no way of knowing which is the case. For example:

* Transient network issues
* Standby can't keep up with the master
* Postgres bug
* Storage/IO issues (think EBS)
* Standby is restarting

You don't want to handle all of those issues the same way as far as sync rep is concerned. For example, if the standby is restarting, you probably want to wait instead of degrading.

There's also the issue that this patch, and necessarily any walsender-level auto-degrade, has IMHO no safe way to resume sync replication. This means that any user who has a network or storage blip once a day (again, think AWS) would be constantly in degraded mode, even though both the master and the replica are up and running -- and it will come as a complete surprise to them when they lose the master and discover that they've lost data.

This is why, as I've said, any auto-degrade patch needs to treat auto-degrade as a major event, and alert users in all ways reasonable. See my concrete proposals at the beginning of this email for what I mean.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com