Thread: how to switch old replication Master to new Standby after promoting old Standby
how to switch old replication Master to new Standby after promoting old Standby
From
John Lumby
Date:
From: johnlumby@hotmail.com To: pgsql-general-owner@postgresql.org Subject: how to switch old replication Master to new Standby after promoting old Standby Date: Mon, 14 Mar 2016 13:23:29 -0400 In a scenario involving replication where no failure occurs but I want to interchange Master <->Standby, *and* want to avoid making another full base backup and rsyncing it across, I have found it easy to promote old Standby to new Master with pg_ctl promote, but very difficult to restart the old Primary as a new Standby *without* performing new base backup. Assume current 9.5.1 and using streaming replication with a named replication slot if relevant Second assumption - I am able to temporarily prevent any relational updates to the database before I start the switchover e.g. the default_transaction_read_only = on setting together with being able to control what transactions do if anything So I reach a point where both systems have postgresql running without any replication, both have identical content in all databases (that I can control), and I am willing to tolerate short restarts if need be and also to scp/rsync the contents of pg_xlog and other small files but *not* the entire cluster directory or any database base directories. What do I do next? Here is what I have found seems to work but I am not sure it is robust: 1. shut down both new Master and intended-to-be-new-Standby 2. on intended-to-be-new-Standby, remove the entire content of pg_xlog and the global/pg_control 3. from new Master , tar + scp the entire content of pg_xlog and the global/pg_control to intended-to-be-new-Standby 4. create intended-to-be-new-Standby's recovery.conf, specifying recovery_target_timeline = 'latest' (but I think it works with this setting omitted) 5. start new Master 6. start new Standby and up it comes in its postgres log(**) I see LOG: 00000: database system was shut down at 2016-03-14 16:41:24 GMT LOCATION: StartupXLOG, xlog.c:5936 LOG: 00000: entering standby mode LOCATION: StartupXLOG, xlog.c:6016 LOG: 00000: consistent recovery state reached at C/480000D0 LOCATION: CheckRecoveryConsistency, xlog.c:7493 LOG: 00000: invalid record length at C/480000D0 LOCATION: ReadRecord, xlog.c:3960 LOG: 00000: started streaming WAL from primary at C/48000000 on timeline 3 LOCATION: WalReceiverMain, walreceiver.c:358 LOG: 00000: redo starts at C/480000D0 LOCATION: StartupXLOG, xlog.c:6700 Is the invalid record length msg anything to worry about? But this method is purely empirical. Is it robust? Anyone have any better recommendations? Cheers, John Lumby (**Note this log was from a pre-release 9.5, 9.5alpha2 I don't have 9.5.1 to hand at present
Re: how to switch old replication Master to new Standby after promoting old Standby
From
"Shulgin, Oleksandr"
Date:
On Mon, Mar 14, 2016 at 6:28 PM, John Lumby <johnlumby@hotmail.com> wrote:
From: johnlumby@hotmail.com
To: pgsql-general-owner@postgresql.org
Subject: how to switch old replication Master to new Standby after promoting old Standby
Date: Mon, 14 Mar 2016 13:23:29 -0400
In a scenario involving replication where no failure occurs but I want to interchange Master <->Standby,
*and* want to avoid making another full base backup and rsyncing it across,
I have found it easy to promote old Standby to new Master with pg_ctl promote,
but very difficult to restart the old Primary as a new Standby *without* performing new base backup.
Assume current 9.5.1 and using streaming replication with a named replication slot if relevant
Second assumption - I am able to temporarily prevent any relational updates to the database before I start the switchover
e.g. the
default_transaction_read_only = on
setting together with being able to control what transactions do if anything
So I reach a point where both systems have postgresql running without any replication,
both have identical content in all databases (that I can control),
and I am willing to tolerate short restarts if need be and also to scp/rsync the contents of pg_xlog
and other small files but *not* the entire cluster directory or any database base directories.
What do I do next?
Here is what I have found seems to work but I am not sure it is robust:
1. shut down both new Master and intended-to-be-new-Standby
2. on intended-to-be-new-Standby, remove the entire content of pg_xlog and the global/pg_control
3. from new Master , tar + scp the entire content of pg_xlog and the global/pg_control to intended-to-be-new-Standby
That does seem like a very risky strategy to me. Have you taken a look at pg_rewind (which is now part of the distribution)?
--
Alex
Re: how to switch old replication Master to new Standby after promoting old Standby
From
John Lumby
Date:
Many thanks Alex, I had not noticed pg_rewind before. The name pg_rewind is not exactly very descriptive of what this utility does. But it is exactly what I need, thank-you. I tried it an eventually got it to work but did hit one strange problem - I ran it like this pg_rewind -D /mnt/bluebild/pgrepl95 --source-server=host=10.19.0.21 port=5432 user=postgres password=postgres -P --debug after which my postgresql server would not start and I discovered that the rewind had actually copied the source's postgresql.conf, replacing the one on the target (i.e. the old primary). And indeed in its debug I found received chunk for file "postgresql.conf", offset 0, size 16482 received chunk for file "postgresql.conf.20160314114055", offset 0, size 16464 And I now see in its description in the Doc that it intends to do this. But why would it do that? Maybe a note about it should be added to the wiki https://wiki.postgresql.org/wiki/Streaming_Replication (not sure if I can) Cheers, and thanks, John ________________________________ > From: oleksandr.shulgin@zalando.de > Date: Mon, 14 Mar 2016 18:54:11 +0100 > Subject: Re: [GENERAL] how to switch old replication Master to new > Standby after promoting old Standby > To: johnlumby@hotmail.com > CC: pgsql-general@postgresql.org > > On Mon, Mar 14, 2016 at 6:28 PM, John Lumby > <johnlumby@hotmail.com<mailto:johnlumby@hotmail.com>> wrote: > > From: johnlumby@hotmail.com<mailto:johnlumby@hotmail.com> > To: > pgsql-general-owner@postgresql.org<mailto:pgsql-general-owner@postgresql.org> > Subject: how to switch old replication Master to new Standby after > promoting old Standby > Date: Mon, 14 Mar 2016 13:23:29 -0400 > > In a scenario involving replication where no failure occurs but I want > to interchange Master <->Standby, > *and* want to avoid making another full base backup and rsyncing it across, > I have found it easy to promote old Standby to new Master with pg_ctl > promote, > but very difficult to restart the old Primary as a new Standby > *without* performing new base backup. > > Assume current 9.5.1 and using streaming replication with a named > replication slot if relevant > > Second assumption - I am able to temporarily prevent any relational > updates to the database before I start the switchover > e.g. the > default_transaction_read_only = on > setting together with being able to control what transactions do if anything > > So I reach a point where both systems have postgresql running without > any replication, > both have identical content in all databases (that I can control), > and I am willing to tolerate short restarts if need be and also to > scp/rsync the contents of pg_xlog > and other small files but *not* the entire cluster directory or any > database base directories. > > What do I do next? > > Here is what I have found seems to work but I am not sure it is robust: > > 1. shut down both new Master and intended-to-be-new-Standby > 2. on intended-to-be-new-Standby, remove the entire content of pg_xlog > and the global/pg_control > 3. from new Master , tar + scp the entire content of pg_xlog and the > global/pg_control to intended-to-be-new-Standby > > That does seem like a very risky strategy to me. Have you taken a look > at pg_rewind (which is now part of the distribution)? > > -- > Alex >
Re: how to switch old replication Master to new Standby after promoting old Standby
From
Michael Paquier
Date:
On Mon, Mar 14, 2016 at 6:54 PM, Shulgin, Oleksandr <oleksandr.shulgin@zalando.de> wrote: > On Mon, Mar 14, 2016 at 6:28 PM, John Lumby <johnlumby@hotmail.com> wrote: >> 1. shut down both new Master and intended-to-be-new-Standby >> 2. on intended-to-be-new-Standby, remove the entire content of pg_xlog and >> the global/pg_control >> 3. from new Master , tar + scp the entire content of pg_xlog and the >> global/pg_control to intended-to-be-new-Standby This is not robust and will corrupt your standby. Just take the case of a relation data block modified on the to-be-new standby, and not replayed since the last checkpoint before WAL forked: data will be corrupted. Inconsistent pg_clog will likely break things. > That does seem like a very risky strategy to me. Have you taken a look at > pg_rewind (which is now part of the distribution)? pg_rewind has been designed for that, and ensures that the soon-to-be-standby has a minimum recovery target sufficient. You had better use it. -- Michael
Re: Re: how to switch old replication Master to new Standby after promoting old Standby
From
Michael Paquier
Date:
On Mon, Mar 14, 2016 at 11:08 PM, John Lumby <johnlumby@hotmail.com> wrote: > And indeed in its debug I found > received chunk for file "postgresql.conf", offset 0, size 16482 > received chunk for file "postgresql.conf.20160314114055", offset 0, size 16464 > > And I now see in its description in the Doc that it intends to do this. > But why would it do that? To make its code more simple. This way there is no need to apply any kind of file-based filters to decide if some files should be copied or not, and it is not that much a big deal to copy the configuration files of the target node before performing the rewind. > Maybe a note about it should be added to the wiki > https://wiki.postgresql.org/wiki/Streaming_Replication > (not sure if I can) With a community account you could edit this page. -- Michael
Re: how to switch old replication Master to new Standby after promoting old Standby
From
John Lumby
Date:
Thank you both for the advice. pg_rewind is a nice utility and not only more robust than what I came up with but also easier to use and avoids need to shut down new Primary. Re editing the wiki, I do have a community account but it seems I need more than that : ==> Editing this wiki now requires "editor" privileges. If anyone who has such privileges would like to edit the page, here is what I would add to it -- feel free to edit/rewrite after the bullet . How to restart streaming replication after failover and before the sub-bullet Repeat the operations from 6th; add this: Starting with the old Standby now running as Unreplicated and the old Primary shut down but servicable, with its databases intact, the task is to put the old Primary into Standby mode as rapidly and unintrusively as possible. This implies not requiring to shut new Primary down and not requiring to make another full base backup. A utility named pg_rewind makes this much simpler and more robust, and it is included in standard postgresql distribution since 9.5. - it is documented under PostgreSQL Server Applications. To use pg_rewind : First and most important, it is essential to have *previously* set the configuration parameter wal_log_hints = on in both the old Primary and old Standby, *before* the failover. An alternative is described in the documentation but setting this parameter is simpler. If you did not set this or the alternative, then , set it for future, and don't use pg_rewind this time. See next. Secondly , note that pg_rewind will potentially update *every* file in the old Primary cluster, including configuration files. It is likely that configuration files may not match exactly on the two systems, so make a copy of postgresql.conf and postgresql.auto.conf for later restore. Thirdly, double-check that old Primary is shut down. Now run pg_rewind on old Primary using the form pg_rewind -D ${pg_cluster_dir} \ --source-server="host=${source_server_ip} port=${source_server_port} user=${replication_user} password=${replication_password}"\ -P You can add --debug if you want a blow-by-blow account of every change it makes. Now restore your configuration files, first perhaps comparing what differences there were. Finally, create the recovery.conf for the new Standby You can now start the new Standby. There are some limitations with pg_rewind described in documentation. If you could not use it or it failed, then treat your old Primary as an empty cluster and commission it from the start as described next Cheers, John Lumby ---------------------------------------- > Date: Mon, 14 Mar 2016 23:46:28 +0100 > Subject: Re: [GENERAL] Re: how to switch old replication Master to new Standby after promoting old Standby > From: michael.paquier@gmail.com > To: johnlumby@hotmail.com > CC: oleksandr.shulgin@zalando.de; pgsql-general@postgresql.org > > On Mon, Mar 14, 2016 at 11:08 PM, John Lumby <johnlumby@hotmail.com> wrote: >> And indeed in its debug I found >> received chunk for file "postgresql.conf", offset 0, size 16482 >> received chunk for file "postgresql.conf.20160314114055", offset 0, size 16464 >> >> And I now see in its description in the Doc that it intends to do this. >> But why would it do that? > > To make its code more simple. This way there is no need to apply any > kind of file-based filters to decide if some files should be copied or > not, and it is not that much a big deal to copy the configuration > files of the target node before performing the rewind. > >> Maybe a note about it should be added to the wiki >> https://wiki.postgresql.org/wiki/Streaming_Replication >> (not sure if I can) > > With a community account you could edit this page. > -- > Michael
Re: Re: how to switch old replication Master to new Standby after promoting old Standby
From
Adrian Klaver
Date:
On 03/15/2016 06:54 AM, John Lumby wrote: > Thank you both for the advice. > pg_rewind is a nice utility and not only more robust than what I came up with > but also easier to use and avoids need to shut down new Primary. > > Re editing the wiki, I do have a community account but it seems I need more than that : > > ==> Editing this wiki now requires "editor" privileges. https://wiki.postgresql.org/wiki/WikiEditing "Editing this wiki now requires "editor" privileges. If you just created a new community account or if your current account used to have "editor" privileges, you can ask on either the PostgreSQL -www Mailinglist or the PostgreSQL IRC Channel for "editor" privileges. Please include your community account name in those requests. This is due, in large part, to recent spam activity. " > > If anyone who has such privileges would like to edit the page, > here is what I would add to it -- feel free to edit/rewrite > > after the bullet > > . How to restart streaming replication after failover > > and before the sub-bullet > > Repeat the operations from 6th; > > add this: > > Starting with the old Standby now running as Unreplicated and the old Primary shut down but servicable, > with its databases intact, the task is to put the old Primary into Standby mode > as rapidly and unintrusively as possible. > This implies not requiring to shut new Primary down and not requiring to make another full base backup. > A utility named pg_rewind makes this much simpler and more robust, and it is included in standard > postgresql distribution since 9.5. - it is documented under PostgreSQL Server Applications. > > To use pg_rewind : > First and most important, it is essential to have *previously* set the configuration parameter > wal_log_hints = on > in both the old Primary and old Standby, *before* the failover. > An alternative is described in the documentation but setting this parameter is simpler. > If you did not set this or the alternative, then , set it for future, > and don't use pg_rewind this time. See next. > Secondly , note that pg_rewind will potentially update *every* file in the old Primary cluster, > including configuration files. It is likely that configuration files may not match exactly on the two systems, > so make a copy of postgresql.conf and postgresql.auto.conf for later restore. > Thirdly, double-check that old Primary is shut down. > Now run pg_rewind on old Primary using the form > > pg_rewind -D ${pg_cluster_dir} \ > --source-server="host=${source_server_ip} port=${source_server_port} user=${replication_user} password=${replication_password}"\ > -P > > You can add --debug if you want a blow-by-blow account of every change it makes. > > Now restore your configuration files, first perhaps comparing what differences there were. > Finally, create the recovery.conf for the new Standby > > You can now start the new Standby. > > There are some limitations with pg_rewind described in documentation. > If you could not use it or it failed, then treat your old Primary as an empty cluster > and commission it from the start as described next > > > Cheers, John Lumby > ---------------------------------------- >> Date: Mon, 14 Mar 2016 23:46:28 +0100 >> Subject: Re: [GENERAL] Re: how to switch old replication Master to new Standby after promoting old Standby >> From: michael.paquier@gmail.com >> To: johnlumby@hotmail.com >> CC: oleksandr.shulgin@zalando.de; pgsql-general@postgresql.org >> >> On Mon, Mar 14, 2016 at 11:08 PM, John Lumby <johnlumby@hotmail.com> wrote: >>> And indeed in its debug I found >>> received chunk for file "postgresql.conf", offset 0, size 16482 >>> received chunk for file "postgresql.conf.20160314114055", offset 0, size 16464 >>> >>> And I now see in its description in the Doc that it intends to do this. >>> But why would it do that? >> >> To make its code more simple. This way there is no need to apply any >> kind of file-based filters to decide if some files should be copied or >> not, and it is not that much a big deal to copy the configuration >> files of the target node before performing the rewind. >> >>> Maybe a note about it should be added to the wiki >>> https://wiki.postgresql.org/wiki/Streaming_Replication >>> (not sure if I can) >> >> With a community account you could edit this page. >> -- >> Michael > > -- Adrian Klaver adrian.klaver@aklaver.com
Re: how to switch old replication Master to new Standby after promoting old Standby - pg_rewind log file missing
From
John Lumby
Date:
I have already hit a different problem with pg_rewind and would like to check my understanding. The problem is, on 9.5.1 , pg_rewind fails to find a log file that it thinks it needs : pg_rewind -D ${pg_cluster_dir} --source-server='host=10.19.0.1 port=5432 xxx' -P --debug _____________________________________________________ connected to server fetched file "global/pg_control", length 8192 fetched file "pg_xlog/00000003.history", length 85 servers diverged at WAL position 0/2B000060 on timeline 2 could not open file "/mnt/septcomn/sysbuild/pgrepl95/pg_xlog/00000002000000000000002A": No such file or directory could not find previous WAL record at 0/2A0034C8 Failure, exiting ______________________________________________________ Well, my postgresql servers have archive off (I don't need archiving, only replication) and that particular file, 00000002000000000000002A , is no longer present on either new Primary or this old Primary. so it has gone for good. But my question is, given that the divergence point was 2B000060, why is it looking for a file earlier than that? Ah well ... By the way, this scenario was on a system which started off life as Standby (running ok), was promoted to new Primary (ok) while the original Primary was recycled as new Standby (ok), so one complete failover completed ok, and just now I tried to flip them back to original roles - current Standby promoted (ok) but now hit failure trying to recycle current Primary back to Standby. I don't know if this is significant - i.e. my second question is, is it more likely that a needed(?) log file goes missing when performing the second failover in a sequence of flips? pg_controldata output if relevant : from current just-promoted Primary pg_control version number: 942 Catalog version number: 201510051 Database system identifier: 6261999204763226337 Database cluster state: in production pg_control last modified: Tue 15 Mar 2016 11:43:26 AM EDT Latest checkpoint location: 0/33000098 Prior checkpoint location: 0/32000098 Latest checkpoint's REDO location: 0/33000060 Latest checkpoint's REDO WAL file: 000000030000000000000033 Latest checkpoint's TimeLineID: 3 Latest checkpoint's PrevTimeLineID: 3 Latest checkpoint's full_page_writes: on Latest checkpoint's NextXID: 0/1624 Latest checkpoint's NextOID: 17820 Latest checkpoint's NextMultiXactId: 1 Latest checkpoint's NextMultiOffset: 0 Latest checkpoint's oldestXID: 616 Latest checkpoint's oldestXID's DB: 1 Latest checkpoint's oldestActiveXID: 1624 Latest checkpoint's oldestMultiXid: 1 Latest checkpoint's oldestMulti's DB: 1 Latest checkpoint's oldestCommitTsXid:0 Latest checkpoint's newestCommitTsXid:0 Time of latest checkpoint: Tue 15 Mar 2016 11:43:26 AM EDT Fake LSN counter for unlogged rels: 0/1 Minimum recovery ending location: 0/0 Min recovery ending loc's timeline: 0 Backup start location: 0/0 Backup end location: 0/0 End-of-backup record required: no wal_level setting: hot_standby wal_log_hints setting: on max_connections setting: 100 max_worker_processes setting: 8 max_prepared_xacts setting: 0 max_locks_per_xact setting: 64 track_commit_timestamp setting: off Maximum data alignment: 8 Database block size: 8192 Blocks per segment of large relation: 131072 WAL block size: 8192 Bytes per WAL segment: 16777216 Maximum length of identifiers: 64 Maximum columns in an index: 32 Maximum size of a TOAST chunk: 1996 Size of a large-object chunk: 2048 Date/time type storage: 64-bit integers Float4 argument passing: by value Float8 argument passing: by value Data page checksum version: 0 from previous Primary I am trying and failing to rewind : pg_control version number: 942 Catalog version number: 201510051 Database system identifier: 6261999204763226337 Database cluster state: shut down pg_control last modified: Tue 15 Mar 2016 11:34:36 AM EDT Latest checkpoint location: 0/32000108 Prior checkpoint location: 0/32000098 Latest checkpoint's REDO location: 0/32000108 Latest checkpoint's REDO WAL file: 000000020000000000000032 Latest checkpoint's TimeLineID: 2 Latest checkpoint's PrevTimeLineID: 2 Latest checkpoint's full_page_writes: on Latest checkpoint's NextXID: 0/1624 Latest checkpoint's NextOID: 17820 Latest checkpoint's NextMultiXactId: 1 Latest checkpoint's NextMultiOffset: 0 Latest checkpoint's oldestXID: 616 Latest checkpoint's oldestXID's DB: 1 Latest checkpoint's oldestActiveXID: 0 Latest checkpoint's oldestMultiXid: 1 Latest checkpoint's oldestMulti's DB: 1 Latest checkpoint's oldestCommitTsXid:0 Latest checkpoint's newestCommitTsXid:0 Time of latest checkpoint: Tue 15 Mar 2016 11:34:36 AM EDT Fake LSN counter for unlogged rels: 0/1 Minimum recovery ending location: 0/0 Min recovery ending loc's timeline: 0 Backup start location: 0/0 Backup end location: 0/0 End-of-backup record required: no wal_level setting: hot_standby wal_log_hints setting: on max_connections setting: 100 max_worker_processes setting: 8 max_prepared_xacts setting: 0 max_locks_per_xact setting: 64 track_commit_timestamp setting: off Maximum data alignment: 8 Database block size: 8192 Blocks per segment of large relation: 131072 WAL block size: 8192 Bytes per WAL segment: 16777216 Maximum length of identifiers: 64 Maximum columns in an index: 32 Maximum size of a TOAST chunk: 1996 Size of a large-object chunk: 2048 Date/time type storage: 64-bit integers Float4 argument passing: by value Float8 argument passing: by value Data page checksum version: 0 Cheers, John ---------------------------------------- > Subject: Re: [GENERAL] Re: how to switch old replication Master to new Standby after promoting old Standby > To: johnlumby@hotmail.com; michael.paquier@gmail.com > CC: oleksandr.shulgin@zalando.de; pgsql-general@postgresql.org > From: adrian.klaver@aklaver.com > Date: Tue, 15 Mar 2016 07:05:43 -0700 > > On 03/15/2016 06:54 AM, John Lumby wrote: >> >> If anyone who has such privileges would like to edit the page, >> here is what I would add to it -- feel free to edit/rewrite >> >> after the bullet >> >> . How to restart streaming replication after failover >> >> and before the sub-bullet >> >> Repeat the operations from 6th; >> >> add this: >> >> Starting with the old Standby now running as Unreplicated and the old Primary shut down but servicable, >> with its databases intact, the task is to put the old Primary into Standby mode >> as rapidly and unintrusively as possible. >> This implies not requiring to shut new Primary down and not requiring to make another full base backup. >> A utility named pg_rewind makes this much simpler and more robust, and it is included in standard >> postgresql distribution since 9.5. - it is documented under PostgreSQL Server Applications. >> >> To use pg_rewind : >> First and most important, it is essential to have *previously* set the configuration parameter >> wal_log_hints = on >> in both the old Primary and old Standby, *before* the failover. >> An alternative is described in the documentation but setting this parameter is simpler. >> If you did not set this or the alternative, then , set it for future, >> and don't use pg_rewind this time. See next. >> Secondly , note that pg_rewind will potentially update *every* file in the old Primary cluster, >> including configuration files. It is likely that configuration files may not match exactly on the two systems, >> so make a copy of postgresql.conf and postgresql.auto.conf for later restore. >> Thirdly, double-check that old Primary is shut down. >> Now run pg_rewind on old Primary using the form >> >> pg_rewind -D ${pg_cluster_dir} \ >> --source-server="host=${source_server_ip} port=${source_server_port} user=${replication_user} password=${replication_password}"\ >> -P >> >> You can add --debug if you want a blow-by-blow account of every change it makes. >> >> Now restore your configuration files, first perhaps comparing what differences there were. >> Finally, create the recovery.conf for the new Standby >> >> You can now start the new Standby. >> >> There are some limitations with pg_rewind described in documentation. >> If you could not use it or it failed, then treat your old Primary as an empty cluster >> and commission it from the start as described next >> >> >> Cheers, John Lumby
Re: how to switch old replication Master to new Standby after promoting old Standby - pg_rewind log file missing
From
Michael Paquier
Date:
On Tue, Mar 15, 2016 at 5:14 PM, John Lumby <johnlumby@hotmail.com> wrote: > But my question is, given that the divergence point was 2B000060, > why is it looking for a file earlier than that? (please do not top-post, this is annoying as it breaks the logic of the thread) pg_rewind begins scanning WAL records not at the point where WAL forked, but from the previous checkpoint record before WAL forked. -- Michael
Re: how to switch old replication Master to new Standby after promoting old Standby - pg_rewind log file missing
From
John Lumby
Date:
> Date: Tue, 15 Mar 2016 17:21:50 +0100 > Subject: Re: how to switch old replication Master to new Standby after promoting old Standby - pg_rewind log file missing > From: michael.paquier@gmail.com > To: johnlumby@hotmail.com > CC: pgsql-general@postgresql.org; oleksandr.shulgin@zalando.de; adrian.klaver@aklaver.com > > On Tue, Mar 15, 2016 at 5:14 PM, John Lumby <johnlumby@hotmail.com> wrote: >> But my question is, given that the divergence point was 2B000060, >> why is it looking for a file earlier than that? > > (please do not top-post, this is annoying as it breaks the logic of the thread) sorry, ok > pg_rewind begins scanning WAL records not at the point where WAL > forked, but from the previous checkpoint record before WAL forked. In that case, I wonder if it would be useful to extend the replication_slots concept to be able to specify to retain all WALs back to "previous checkpoint before most recent timeline switch" in order to guarantee that pg_rewind shall always be able to find the ones it needs in source without requiring archiving and archive-retrieval? > -- > Michael