Re: Standalone synchronous master - Mailing list pgsql-hackers
From | Heikki Linnakangas |
---|---|
Subject | Re: Standalone synchronous master |
Date | |
Msg-id | 52CD1564.6060504@vmware.com |
In response to | Standalone synchronous master (Rajeev rastogi <rajeev.rastogi@huawei.com>) |
Responses |
Re: Standalone synchronous master
Re: Standalone synchronous master |
List | pgsql-hackers |
On 11/13/2013 03:09 PM, Rajeev rastogi wrote:
> This patch implements the following TODO item:
>
> Add a new "eager" synchronous mode that starts out synchronous but reverts to asynchronous after a failure timeout period
> This would require some type of command to be executed to alert administrators of this change.
> http://archives.postgresql.org/pgsql-hackers/2011-12/msg01224.php
>
> This patch implementation is in the same line as it was given in the earlier thread.
> Some Of the additional important changes are:
>
> 1. Have added two GUC variable to take commands from user to be executed
>
> a. Master_to_standalone_cmd: To be executed before master switches to standalone mode.
>
> b. Master_to_sync_cmd: To be executed before master switches from sync mode to standalone mode.
>
> 2. Master mode switch will happen only if the corresponding command executed successfully.
>
> 3. Taken care of replication timeout to decide whether synchronous standby has gone down. i.e. only after expiry of
> wal_sender_timeout, the master will switch from sync mode to standalone mode.
>
> Please provide your opinion or any other expectation out of this patch.

I'm going to say right off the bat that I think the whole notion to automatically disable synchronous replication when the standby goes down is completely bonkers. If you don't need the strong guarantee that your transaction is safe in at least two servers before it's acknowledged to the client, there's no point enabling synchronous replication in the first place. If you do need it, then you shouldn't fall back to a degraded mode, at least not automatically. It's an idea that keeps coming back, but I have not heard a convincing argument why it makes sense. It's been discussed many times before, most recently in that thread you linked to.

Now that I got that out of the way, I concur that some sort of hooks or commands that fire when a standby goes down or comes back up makes sense, for monitoring purposes.
I don't much like this particular design. If you just want to write a log entry when all the standbys are disconnected, running a shell command seems like an awkward interface. It's OK for raising an alarm, but there are many other situations where you might want to raise alarms, so I'd rather have us implement some sort of a generic trap system, instead of adding this one particular extra config option. What do people usually use to monitor replication?

There are two things we're trying to solve here: raising an alarm when something interesting happens, and changing the configuration to temporarily disable synchronous replication. What would be a good API to disable synchronous replication? Editing the config file and SIGHUPing is not very nice. There's been talk of an ALTER command to change the config, but I'm not sure that's a very good API either. Perhaps expose the sync_master_in_standalone_mode variable you have in your patch to new SQL-callable functions. Something like:

    pg_disable_synchronous_replication()
    pg_enable_synchronous_replication()

I'm not sure where that state would be stored. Should it persist restarts? And you probably should get some sort of warnings in the log when synchronous replication is disabled.

In summary, more work is required to design a good user/admin/programming interface. Let's hear a solid proposal for that, before writing patches.

BTW, calling an external command with system(), while holding SyncRepLock in exclusive-mode, seems like a bad idea. For starters, holding a lock will prevent a new WAL sender from starting up and becoming a synchronous standby, and the external command might take a long time to return.

- Heikki
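
The locking concern in the last paragraph can be illustrated with a minimal standalone sketch. It is not PostgreSQL code and not the patch's implementation: a pthread mutex stands in for SyncRepLock (which is really an LWLock), and `standalone_cmd`, `sync_standbys_connected`, `standalone_mode`, and `switch_to_standalone()` are hypothetical names invented for this example. The idea is to copy the command while holding the lock, release the lock before calling system(), and then re-check the state under the lock, since a standby may have reconnected while the command was running.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for shared state; the real SyncRepLock is an LWLock. */
static pthread_mutex_t sync_rep_lock = PTHREAD_MUTEX_INITIALIZER;
static char standalone_cmd[256] = "true";   /* assumed admin-configured command */
static int  sync_standbys_connected = 0;    /* assumed count of sync standbys */
static int  standalone_mode = 0;            /* 1 = sync replication disabled */

/*
 * Returns 0 if the switch happened, 1 if it was skipped because a standby
 * reconnected in the meantime, and -1 if the external command failed.
 */
int switch_to_standalone(void)
{
    char cmd[sizeof(standalone_cmd)];
    int  result;

    /* Short critical section: only copy the command text. */
    pthread_mutex_lock(&sync_rep_lock);
    strncpy(cmd, standalone_cmd, sizeof(cmd) - 1);
    cmd[sizeof(cmd) - 1] = '\0';
    pthread_mutex_unlock(&sync_rep_lock);

    /* The command may be slow; run it without holding the lock. */
    if (system(cmd) != 0)
        return -1;

    /* Re-check under the lock: the world may have changed while we waited. */
    pthread_mutex_lock(&sync_rep_lock);
    if (sync_standbys_connected > 0)
        result = 1;
    else
    {
        standalone_mode = 1;
        result = 0;
    }
    pthread_mutex_unlock(&sync_rep_lock);

    return result;
}
```

With this shape, a new WAL sender that needs the lock is blocked only for the two short critical sections, not for the duration of the external command.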