Re: [HACKERS] patch proposal - Mailing list pgsql-hackers
From | David Steele |
---|---|
Subject | Re: [HACKERS] patch proposal |
Date | |
Msg-id | f87fa0a5-d721-a883-3e00-a5d44db7547b@pgmasters.net Whole thread Raw |
In response to | Re: [HACKERS] patch proposal (Venkata B Nagothi <nag1010@gmail.com>) |
Responses |
Re: patch proposal
|
List | pgsql-hackers |
On 3/21/17 8:45 PM, Venkata B Nagothi wrote: > On Tue, Mar 21, 2017 at 8:46 AM, David Steele <david@pgmasters.net > > Unfortunately, I don't think the first patch (recoveryStartPoint) > will work as currently implemented. The problem I see is that the > new function recoveryStartsHere() depends on pg_control containing a > checkpoint right at the end of the backup. There's no guarantee > that this is true, even if pg_control is copied last. That means a > time, lsn, or xid that occurs somewhere in the middle of the backup > can be selected without complaint from this code depending on timing. > > > Yes, that is true. The latest best position, checkpoint position, xid > and timestamp of the restored backup of the backup is shown up in the > pg_controldata, which means, that is the position from which the > recovery would start. Backup recovery starts from the checkpoint in the backup_label, not from the checkpoint in pg_control. The original checkpoint that started the backup is generally overwritten in pg_control by the end of the backup. > Which in-turn means, WALs start getting replayed > from that position towards --> minimum recovery position (which is the > end backup, which again means applying WALs generated between start and > the end backup) all the way through to --> recovery target position. minRecoveryPoint is only used when recovering a backup that was made from a standby. For backups taken on the master, the backup end WAL record is used. > The best start position to check with would the position shown up in the > pg_control file, which is way much better compared to the current > postgresql behaviour. I don't agree, for the reasons given previously. > The tests pass (or rather fail as expected) because they are written > using values before the start of the backup. It's actually the end > of the backup (where the database becomes consistent on recovery) > that defines where PITR can start and this distinction definitely > matters for very long backups. A better test would be to start the > backup, get a time/lsn/xid after writing some data, and then make > sure that time/lsn/xid is flagged as invalid on recovery. > > The current behaviour is that, if the XID shown-up by the pg_controldata > is say 1400 and the specified recovery_target_xid is say 200, then, > postgresql just completes the recovery and promotes at the immediate > consistent recovery target, which means, the DBA has to all the way > start the restoration and recovery process again to promote the database > at the intended position. I'm not saying that the current behavior is good, only that the proposed behavior is incorrect as far as I can tell. > It is also problematic to assume that transaction IDs commit in > order. If checkPoint.latestCompletedXid contains 5 then a recovery > to xid 4 may or may not be successful depending on the commit > order. However, it appears in this case the patch would disallow > recovery to 4. > > If the txid 4 is the recovery target xid, then, the appropriate backup > taken previous to txid 4 must be used or as an alternative recovery > target like recovery_target_timestamp must be used to proceed to the > respective recovery target xid. I'm not sure I follow you here, but I do know that using the last committed xid says little about which xids precede or follow it. > I think this logic would need to be baked into recoveryStopsAfter() > and/or recoveryStopsBefore() in order to work. It's not clear to me > what that would look like, however, or if the xid check is even > practical. > > > The above two functions would determine if the recovery process has to > stop at a particular position or not and the patch has nothing to do > with it. > > To summarise, this patch only determines if the WAL replay has to start > at all. In other words, if the specified recovery target falls prior to > the restored database position, then, the WAL replay should not start, > which prevent the database from getting promoted at an unintended > recovery target. I understand what you are trying to do. My point is that the information you are working from (whatever checkpoint happens to be in pg_control when recovery starts) is not sufficient for the task. This method is inaccurate and would disallow valid recovery scenarios. I suggest doing a thorough read-through of StartupXLOG(), particularly the section where the backup_label is loaded. Thanks, -- -David david@pgmasters.net
pgsql-hackers by date: