Re: Switching timeline over streaming replication - Mailing list pgsql-hackers
From | Heikki Linnakangas |
---|---|
Subject | Re: Switching timeline over streaming replication |
Date | |
Msg-id | 50DB5EA9.7010406@vmware.com |
In response to | Re: Switching timeline over streaming replication (Fujii Masao <masao.fujii@gmail.com>) |
List | pgsql-hackers |
On 23.12.2012 16:37, Fujii Masao wrote:
> On Fri, Dec 21, 2012 at 1:48 AM, Fujii Masao<masao.fujii@gmail.com> wrote:
>> On Sat, Dec 15, 2012 at 9:36 AM, Fujii Masao<masao.fujii@gmail.com> wrote:
>>> I found another "requested timeline does not contain minimum recovery point"
>>> error scenario in HEAD:
>>>
>>> 1. Set up the master 'M', one standby 'S1', and one cascade standby 'S2'.
>>> 2. Shutdown the master 'M' and promote the standby 'S1', and wait for 'S2'
>>>    to reconnect to 'S1'.
>>> 3. Set up new cascade standby 'S3' connecting to 'S2'.
>>>    Then 'S3' fails to start the recovery because of the following error:
>>>
>>>     FATAL: requested timeline 2 does not contain minimum recovery point 0/3000000 on timeline 1
>>>     LOG: startup process (PID 33104) exited with exit code 1
>>>     LOG: aborting startup due to startup process failure
>>>
>>> The result of pg_controldata of 'S3' is:
>>>
>>>     Latest checkpoint location:           0/3000088
>>>     Prior checkpoint location:            0/2000060
>>>     Latest checkpoint's REDO location:    0/3000088
>>>     Latest checkpoint's REDO WAL file:    000000020000000000000003
>>>     Latest checkpoint's TimeLineID:       2
>>>     <snip>
>>>     Min recovery ending location:         0/3000000
>>>     Min recovery ending loc's timeline:   1
>>>     Backup start location:                0/0
>>>     Backup end location:                  0/0
>>>
>>> The content of the timeline history file '00000002.history' is:
>>>
>>>     1 0/3000088 no recovery target specified
>>
>> I still could reproduce this problem. Attached is the shell script
>> which reproduces the problem.
>
> This problem happens when new standby starts up from the backup
> taken from another standby and its recovery starts from the shutdown
> checkpoint record which causes timeline switch. In this case,
> the timeline of minimum recovery point can be different from that of
> latest checkpoint (i.e., shutdown checkpoint). But the following check
> in StartupXLOG() assumes that they are always the same wrongly.
> So the problem happens.
>
>     /*
>      * The min recovery point should be part of the requested timeline's
>      * history, too.
>      */
>     if (!XLogRecPtrIsInvalid(ControlFile->minRecoveryPoint) &&
>         tliOfPointInHistory(ControlFile->minRecoveryPoint - 1, expectedTLEs) !=
>             ControlFile->minRecoveryPointTLI)
>         ereport(FATAL,
>                 (errmsg("requested timeline %u does not contain minimum recovery point %X/%X on timeline %u",
>                         recoveryTargetTLI,
>                         (uint32) (ControlFile->minRecoveryPoint >> 32),
>                         (uint32) ControlFile->minRecoveryPoint,
>                         ControlFile->minRecoveryPointTLI)));

No, it doesn't assume that min recovery point is on the same timeline as
the checkpoint record. This is another variant of the "timeline history
files are not included in the backup" problem discussed on the other
thread with subject "pg_basebackup from cascading standby after timeline
switch".

If you remove the min recovery point check above, the test case still
fails, with a different error message:

    LOG: unexpected timeline ID 1 in log segment 000000020000000000000003, offset 0

If you modify the test script to copy the 00000002.history file to the
data-standby3/pg_xlog after running pg_basebackup, the test case works.
(we still need to fix it, of course)

- Heikki
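To illustrate why the missing history file matters for the check quoted above, here is a minimal, simplified model of a timeline lookup. It is a sketch, not PostgreSQL source: the type definitions and the helpers make_lsn(), tli_of_point() and min_recovery_point_ok() are hypothetical names introduced for this example, and it assumes that a server without the 00000002.history file treats the requested timeline as having no parent timelines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for PostgreSQL's types; not the real definitions. */
typedef uint64_t XLogRecPtr;
typedef uint32_t TimeLineID;

#define INVALID_PTR ((XLogRecPtr) 0)

/* A timeline owns the WAL range [begin, end); INVALID_PTR means unbounded. */
typedef struct
{
    TimeLineID tli;
    XLogRecPtr begin;
    XLogRecPtr end;
} TimelineEntry;

/* Build a 64-bit WAL position from the "hi/lo" notation used in the logs. */
XLogRecPtr
make_lsn(uint32_t hi, uint32_t lo)
{
    return ((XLogRecPtr) hi << 32) | lo;
}

/*
 * Return the timeline whose range covers ptr, or 0 if none does.
 * Hypothetical helper modelled on the idea behind tliOfPointInHistory();
 * the real function may differ in detail.
 */
TimeLineID
tli_of_point(XLogRecPtr ptr, const TimelineEntry *history, size_t n)
{
    assert(history != NULL);

    for (size_t i = 0; i < n; i++)
    {
        const TimelineEntry *e = &history[i];

        if ((e->begin == INVALID_PTR || e->begin <= ptr) &&
            (e->end == INVALID_PTR || ptr < e->end))
            return e->tli;
    }
    return 0;
}

/* Nonzero if the minimum recovery point lies on the timeline recorded for it. */
int
min_recovery_point_ok(XLogRecPtr min_recovery_point,
                      TimeLineID min_recovery_tli,
                      const TimelineEntry *history, size_t n)
{
    if (min_recovery_point == INVALID_PTR)
        return 1;               /* nothing to check */
    return tli_of_point(min_recovery_point - 1, history, n) == min_recovery_tli;
}
```

With the history from the report (timeline 1 up to 0/3000088, timeline 2 from there on), min_recovery_point_ok(make_lsn(0, 0x3000000), 1, ...) returns 1. With a history holding only an unbounded entry for timeline 2, which is the assumed situation when the history file was not copied, the same call returns 0, which corresponds to the FATAL error in the report.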