Re: Sync Rep: First Thoughts on Code - Mailing list pgsql-hackers
From | Aidan Van Dyk |
---|---|
Subject | Re: Sync Rep: First Thoughts on Code |
Date | |
Msg-id | 20081211142712.GW26596@yugib.highrise.ca |
In response to | Re: Sync Rep: First Thoughts on Code (Simon Riggs <simon@2ndQuadrant.com>) |
Responses | Re: Sync Rep: First Thoughts on Code |
List | pgsql-hackers |
* Simon Riggs <simon@2ndQuadrant.com> [081211 05:45]:
>
> On Wed, 2008-12-10 at 15:06 -0500, Aidan Van Dyk wrote:
>
> > Call me thick, but I'm confused... In sync rep, there *can't be* any
> > catching up to do... i.e. if the "slave" isn't accepting the WAL the
> > master "stops" doing *anything*...
>
> In normal/steady state, yes, you are correct. But there is more...
>
> The simplest way to configure standby would be to freeze the primary
> while we setup the standby and then go straight into normal/steady
> state. That could mean hours of downtime for large databases, which is
> unacceptable in a feature aimed at increasing availability. So we need
> to allow the primary to continue working while the standby is setup.
> That then creates a log gap between the LSN of the primary and the LSN
> of the standby, which must be resolved.
>
> So the catchup occurs during the transient initial phase when standby is
> catching up with primary before they continue together in normal/steady
> state.

But "catchup" *has* to be *done* before PostgreSQL can enter "sync rep".

So, if I start PostgreSQL in sync rep mode, without any capable clients
to rep with....

But I'd rather be buggered there than find out tonight at 3am that it
was in sync rep mode but wasn't really doing sync rep, because I'd messed
up something somewhere (firewall, config, password, anything) and there
was no "caught up" client at the time, and I've just lost a day's worth
of my $$$$$ transactions...

> Most of the architectural discussion over last few months has been about
> the need for the initial state and how to handle it. Most of the code
> complexity also.

Well, for me, I'm quite happy with a "restart/stop&start" being a
necessary "downtime" to move to synchronous replication.

This way, I could see a "setup" routine that looks like:
1) Current "production" DB does normal backups/PITR/WAL archiving
2) I set up a new "slave", which involves
   - restore from backup + WAL recovery (pg_standby type)
   - Could take days+++
   - Oh well....
3) Stop production
4) So, now the slave is caught up...
5) Start "production", now in sync rep mode as master
6) Start slave in sync-rep mode as slave...

So downtime would be limited to the time from the old postmaster
shutdown to the time the slave has replayed the last WAL and connected
to the restarted postmaster as a sync rep slave...

Or am I way too naive to think that a small downtime to "switch" from
non-sync-rep to sync-rep is acceptable...

a.

-- 
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.
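For illustration only, here is a toy Python model of the switchover described in the message above. It is not PostgreSQL code: `Primary`, `Standby`, `NoSyncedStandby` and `switch_to_sync_rep` are all hypothetical names, and the "WAL" is just a list. It assumes the fail-closed behaviour argued for above, i.e. a master in sync rep mode refuses to commit unless a fully caught-up slave is attached.

```python
from dataclasses import dataclass, field
from typing import Optional


class NoSyncedStandby(RuntimeError):
    """Commit attempted in sync rep mode with no caught-up slave attached."""


@dataclass
class Standby:
    replayed: int = 0  # number of WAL records replayed so far

    def replay(self, wal):
        # Toy replay: the slave simply catches up to the end of the WAL.
        self.replayed = len(wal)


@dataclass
class Primary:
    sync_rep: bool = False                       # hypothetical mode flag
    running: bool = True
    wal: list = field(default_factory=list)      # toy stand-in for the WAL
    standby: Optional["Standby"] = None

    def commit(self, record):
        if not self.running:
            raise RuntimeError("primary is stopped")
        if self.sync_rep:
            # Fail closed: no attached, fully caught-up slave means no commit.
            if self.standby is None or self.standby.replayed != len(self.wal):
                raise NoSyncedStandby("refusing to commit without a synced slave")
        self.wal.append(record)
        if self.sync_rep:
            # The commit counts as done only once the slave has the record.
            self.standby.replay(self.wal)


def switch_to_sync_rep(primary, standby):
    """Steps 3 to 6 of the routine; downtime spans this whole function."""
    primary.running = False          # 3) stop production
    standby.replay(primary.wal)      # 4) slave replays the last WAL
    primary.sync_rep = True          # 5) restart production in sync rep mode
    primary.running = True
    primary.standby = standby        # 6) slave connects as sync rep slave
```

In this model, between steps 5 and 6 every `commit()` raises `NoSyncedStandby`, so the outage runs from the shutdown until the slave attaches, and a slave that later drops away stops commits rather than letting them go unreplicated.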