
Re: Sync Rep: First Thoughts on Code - Mailing list pgsql-hackers

From	Aidan Van Dyk
Subject	Re: Sync Rep: First Thoughts on Code
Date	December 11, 2008 10:25:47
Msg-id	20081211142712.GW26596@yugib.highrise.ca
In response to	Re: Sync Rep: First Thoughts on Code (Simon Riggs <simon@2ndQuadrant.com>)
Responses	Re: Sync Rep: First Thoughts on Code
List	pgsql-hackers


* Simon Riggs <simon@2ndQuadrant.com> [081211 05:45]:
> 
> On Wed, 2008-12-10 at 15:06 -0500, Aidan Van Dyk wrote:
> 
> > Call me thick, but I'm confused... In sync rep, there *can't be* any
> > catching up to do... i.e. if the "slave" isn't accepting the WAL, the
> > master "stops" doing *anything*...
> 
> In normal/steady state, yes, you are correct. But there is more...
> 
> The simplest way to configure standby would be to freeze the primary
> while we set up the standby and then go straight into normal/steady
> state. That could mean hours of downtime for large databases, which is
> unacceptable in a feature aimed at increasing availability. So we need
> to allow the primary to continue working while the standby is set up.
> That then creates a log gap between the LSN of the primary and the LSN
> of the standby, which must be resolved.
> 
> So the catchup occurs during the transient initial phase when standby is
> catching up with primary before they continue together in normal/steady
> state. 

But "catchup" *has* to be *done* before PostgreSQL can enter "sync rep".

So, if I start PostgreSQL in sync rep mode without any capable clients
to replicate with, nothing commits. But I'd rather be buggered there than
find out at 3am tonight that it was in sync rep mode but wasn't really
doing sync rep, because I'd messed up something somewhere (firewall,
config, password, anything) and there was no "caught up" client at the
time, and I've just lost a day's worth of my $$$$$ transactions...
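A minimal Python sketch of the fail-closed behaviour argued for above, assuming LSNs are modelled as plain integers. The names (`Primary`, `commit`, `SyncRepUnavailable`) are hypothetical and not PostgreSQL code. The point is only that a commit is refused, rather than silently accepted, while no standby is connected or the standby still has an LSN gap to close.

```python
from dataclasses import dataclass
from typing import Optional


class SyncRepUnavailable(Exception):
    """Raised instead of silently committing without a caught-up standby."""


@dataclass
class Primary:
    # Hypothetical model: WAL positions are plain integers.
    flushed_lsn: int = 0
    # Last position the standby has acknowledged; None means no standby connected.
    standby_lsn: Optional[int] = None

    def standby_caught_up(self) -> bool:
        # The catch-up phase is over only when no LSN gap is left.
        return self.standby_lsn is not None and self.standby_lsn >= self.flushed_lsn

    def commit(self, wal_bytes: int) -> int:
        # Fail closed: never report a commit as synchronous while the standby
        # is missing or still behind (firewall, config, password, anything).
        if not self.standby_caught_up():
            raise SyncRepUnavailable(
                f"standby at {self.standby_lsn}, primary at {self.flushed_lsn}"
            )
        self.flushed_lsn += wal_bytes
        # Simulated: a real primary would block here until the standby
        # acknowledges flushed_lsn; this sketch assumes an immediate ack.
        self.standby_lsn = self.flushed_lsn
        return self.flushed_lsn
```

With `standby_lsn=None`, or with the standby at 300 while the primary is at 500, `commit` raises instead of returning; once the standby has reached `flushed_lsn`, commits go through.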

> Most of the architectural discussion over last few months has been about
> the need for the initial state and how to handle it. Most of the code
> complexity also.

Well, for me, I'm quite happy with a "restart/stop&start" being a
necessary "downtime" to move to synchronous replication.  This way, I
could see a "setup" routine that looks like:
1) Current "production" DB does normal backups/PITR/WAL archiving
2) I set up a new "slave", which involves:
   - restore from backup + WAL recovery (pg_standby type)
   - could take days+++
   - oh well....

3) Stop production
4) so, now slave is caught up...
5) Start "production" now in sync rep mode as master
6) start slave in sync-rep mode as slave...

So downtime would be limited to the time from the old postmaster
shutdown to the time the slave has replayed the last WAL and connected
to the restarted postmaster as a sync rep slave...
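The stop-and-start cut-over above can be sketched the same way. Again this is a hypothetical Python model, not PostgreSQL code: `Standby`, `replay` and `cut_over` are made-up names, LSNs are plain integers, and each remaining archived WAL segment is represented only by the LSN at which it ends. The one invariant it enforces is that the master is not restarted in sync-rep mode until the slave has replayed everything up to the LSN at which production was stopped.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Standby:
    # Hypothetical: end of the WAL replayed so far.
    replayed_lsn: int = 0

    def replay(self, segment_end_lsns: Iterable[int]) -> None:
        # Replay archived segments in order (pg_standby-style recovery, simulated).
        for end in sorted(segment_end_lsns):
            self.replayed_lsn = max(self.replayed_lsn, end)


def cut_over(shutdown_lsn: int, standby: Standby, remaining_segments: Iterable[int]) -> str:
    # Step 3 has happened: production is stopped, so shutdown_lsn cannot move.
    # Step 4: let the slave replay the last archived WAL.
    standby.replay(remaining_segments)
    if standby.replayed_lsn < shutdown_lsn:
        # A gap remains; restarting as sync master now would reintroduce catch-up.
        return "wait"
    # Steps 5 and 6: restart the master in sync-rep mode, connect the slave.
    return "start-sync-rep"
```

Because production is stopped before the final replay, `shutdown_lsn` is fixed, so the downtime is just the remaining replay plus the restart and reconnect.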

Or am I way too naive to think that a small downtime to "switch" from
non-sync-rep to sync-rep is acceptable...

a.
-- 
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.
