Re: Switching timeline over streaming replication - Mailing list pgsql-hackers
From | Heikki Linnakangas |
---|---|
Subject | Re: Switching timeline over streaming replication |
Date | |
Msg-id | 50BF9F0D.5040608@vmware.com Whole thread Raw |
In response to | Re: Switching timeline over streaming replication (Amit Kapila <amit.kapila@huawei.com>) |
Responses |
Re: Switching timeline over streaming replication
|
List | pgsql-hackers |
On 05.12.2012 14:32, Amit Kapila wrote: > On Tuesday, December 04, 2012 10:01 PM Heikki Linnakangas wrote: >> After some diversions to fix bugs and refactor existing code, I've >> committed a couple of small parts of this patch, which just add some >> sanity checks to notice incorrect PITR scenarios. Here's a new version >> of the main patch based on current HEAD. > > After testing with the new patch, the following problems are observed. > > Defect - 1: > > 1. start primary A > 2. start standby B following A > 3. start cascade standby C following B. > 4. start another standby D following C. > 5. Promote standby B. > 6. After successful time line switch in cascade standby C& D, stop D. > 7. Restart D, Startup is successful and connecting to standby C. > 8. Stop C. > 9. Restart C, startup is failing. Ok, the error I get in that scenario is: C 2012-12-05 19:55:43.840 EET 9283 FATAL: requested timeline 2 does not contain minimum recovery point 0/3023F08 on timeline 1 C 2012-12-05 19:55:43.841 EET 9282 LOG: startup process (PID 9283) exited with exit code 1 C 2012-12-05 19:55:43.841 EET 9282 LOG: aborting startup due to startup process failure It seems that the commits I made to master already: http://archives.postgresql.org/pgsql-committers/2012-12/msg00116.php http://archives.postgresql.org/pgsql-committers/2012-12/msg00111.php were a few bricks shy of a load. The problem is that if recovery stops at a checkpoint record that changes timeline, so that minRecoveryPoint points to the end of the checkpoint record, we still record the old TLI as the TLI of minRecoveryPoint. This is because 1) there's a silly bug in the patch; replayEndTLI is not updated along with replayEndRecPtr. But even after fixing that, we're not good. The problem is that replayEndRecPtr is currently updated *before* calling the redo function. replayEndRecPtr is what becomes minRecoveryPoint when XLogFlush() is called. If the record is a checkpoint record, redoing it will switch recovery to the new timeline, but replayEndTLI will not be updated until the next record. IOW, as far as minRecoveryPoint is concerned, a checkpoint record that switches timeline is considered to be part of the old timeline. But when a server is promoted and a new timeline is created, the checkpoint record is considered to be part of the new timeline; that's what we write in the page header and in the control file. That mismatch causes the error. I'd like to fix this by always treating the checkpoint record to be part of the new timeline. That feels more correct. The most straightforward way to implement that would be to peek at the xlog record before updating replayEndRecPtr and replayEndTLI. If it's a checkpoint record that changes TLI, set replayEndTLI to the new timeline before calling the redo-function. But it's a bit of a modularity violation to peek into the record like that. Or we could just revert the sanity check at beginning of recovery that throws the "requested timeline 2 does not contain minimum recovery point 0/3023F08 on timeline 1" error. The error I added to redo of checkpoint record that says "unexpected timeline ID %u in checkpoint record, before reaching minimum recovery point %X/%X on timeline %u" checks basically the same thing, but at a later stage. However, the way minRecoveryPointTLI is updated still seems wrong to me, so I'd like to fix that. I'm thinking of something like the attached (with some more comments before committing). Thoughts? - Heikki
Attachment
pgsql-hackers by date: