Re: Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL - Mailing list pgsql-hackers
From | Heikki Linnakangas |
---|---|
Subject | Re: Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL |
Date | |
Msg-id | 4BAB37B3.109@enterprisedb.com Whole thread Raw |
In response to | Re: Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL (Simon Riggs <simon@2ndQuadrant.com>) |
Responses |
Re: Re: [COMMITTERS] pgsql: Make standby server
continuously retry restoring the next WAL
|
List | pgsql-hackers |
(cc'ing docs list) Simon Riggs wrote: > The lack of docs begins to show a lack of coherent high-level design > here. Yeah, I think you're right. It's becoming hard to keep track of how it's supposed to behave. > By now, I've forgotten what this thread was even about. The major > design decision in this that keeps showing up is "remove pg_standby, at > all costs" but no reason has ever been given for that. I do believe > there is a "better way", but we won't find it by trial and error, even > if we had time to do so. This has nothing to do with pg_standby. > Please work on some clear docs for the failure modes in this system. > That way we can all read them and understand them, or point out further > issues. Moving straight to code is not a solution to this, since what we > need now is to all agree on the way forwards. If we ignore this, then > there is considerable risk that streaming rep will have a fatal > operational flaw. > > Please just document/diagram how it works now, highlighting the problems > that still remain to be solved. We're all behind you and I'm helping > wherever I can. Ok, here's my attempt at the docs. Read it as a replacement for the "High Availability, Load Balancing, and Replication" chapter, but of course many of the sections will be unchanged, as indicated below. ------------- Chapter 25. High Availability, Load Balancing, and Replication 25.1 Comparison of different solutions <no changes> 25.2 Log-Shipping Standby servers <overview from current "File-based Log Shipping" section. With small changes so that it applies to the built-in standby mode as well as pg_standby like solutions> A standby server can also be used for read-only queries. This is called Hot Standby mode, see chapter XXX 25.2.1 Planning Set up two servers with identical hardware ... <two first paragraphs of current File-based log-shipping / Planning section> 25.2.3 Standby mode In standby mode, the server continously applies WAL received from the master server. The standby server can receive WAL from a WAL archive (see restore_command) or directly from the master over a TCP connection (streaming replication). The standby server will also attempt to restore any WAL found in the standby's pg_xlog. That typically happens after a server restart, to replay again any WAL streamed from the master before the restart, but you can also manually copy files to pg_xlog at any time to have them replayed. At startup, the standby begins by restoring all WAL available in the archive location, calling restore_command. Once it reaches the end of WAL available there and restore_command fails, it tries to restore any WAL available in the pg_xlog directory (possibly stored there by streaming replication before restart). If that fails, and streaming replication has been configured, the standby tries to connect to the master server and stream WAL from it. If that fails or streaming replication is not configured, or if the connection is disconnected later on, the standby goes back to step 1 and tries to restoring the file from the archive again. This loop of retries from the archive, pg_xlog, and via streaming replication goes on until the server is stopped or failover is triggered by a trigger file. A corrupt or half-finished WAL file in the archive, or streamed from the master, causes a PANIC and immediate shutdown of the standby server. A corrupt WAL file is always a serious event which requires administrator action. If you want to recover a WAL file known to be corrupt as far as it can be, you can copy the file manually into pg_xlog. Standby mode is exited and the server switches to normal operation, when a trigger file is found (trigger_file). Before failover, it will restore any WAL available in the archive or in pg_xlog, but won't try to connect to the master or wait for files to become available in the archive. 25.2.4 Preparing Master for Standby servers Set up continous archiving to a WAL archive on the master, as described in the chapter "Continous Archiving and Point-In-Time_recovery". The archive location should be accessible from the standby even when the master is down, ie. it should reside on the standby server itself or another trusted server, not on the master server. If you want to use streaming replication, set up authentication to allow streaming replication connections. Set max_wal_senders. Take a base backup as described in chapter Continous Archiving and Point-In-Time_recovery / Making a Base Backup. 25.2.4.1 Authentication for streaming replication Ensure that listen_addresses allows connections from the standby server. <current Streaming Replication / Authentication section, describing pg_hba.conf> 25.2.5 Setting up the standby server 1. Take a base backup, and copy it to the standby 2. Create a restore_command to restore files from the WAL archive. 3. Set standby_mode=on 4. If you want to use streaming replicaton, set primary_conninfo You can use restartpoint_command to prune the archive of files no longer needed by the standby. You can have any number of standby servers as long as you set max_wal_senders high enough in the master to allow them to be connected simultaneously. 25.2.6 Archive recovery based log shipping An alternative to the built-in standby mode desribed in the previous sections is to use a restore_command that polls the archive location. This was the only option available in versions 8.4 and below. In this setup, set standby_mode=off, because you are implementing the polling required for a standby server yourself. See contrib/pg_standby for a reference implementation of this. Note that the in this mode, the server will apply WAL one file at a time, so if you use the standby server for queries (see Hot Standby), there is a bigger delay between an action in the master and when the action becomes visible in the standby, corresponding the time it takes to fill up the WAL file. archive_timeout can be used to make that delay shorter. Also note that you can't combine streaming replication with this method. <all the sections from current File-based log shipping that don't apply to built-in standby mode go here> 25.3 Hot Standby <no changes> 25.4 Incrementally Updated backups <no changes> ------------- -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
pgsql-hackers by date: