Re: BUG #8043: 9.2.4 doesn't open WAL files from archive, only looks in pg_xlog - Mailing list pgsql-bugs
From | Heikki Linnakangas |
---|---|
Subject | Re: BUG #8043: 9.2.4 doesn't open WAL files from archive, only looks in pg_xlog |
Date | |
Msg-id | 515FDBBF.8040207@vmware.com |
In response to | Re: BUG #8043: 9.2.4 doesn't open WAL files from archive, only looks in pg_xlog (Jeff Janes <jeff.janes@gmail.com>) |
Responses |
Re: BUG #8043: 9.2.4 doesn't open WAL files from archive, only looks in pg_xlog
Re: BUG #8043: 9.2.4 doesn't open WAL files from archive, only looks in pg_xlog |
List | pgsql-bugs |
On 06.04.2013 01:02, Jeff Janes wrote:
> On Fri, Apr 5, 2013 at 12:27 PM,<bohmer@visionlink.org> wrote:
>> I use a custom base backup script to call pg_start/stop_backup() and make
>> the backup with rsync.
>>
>> The restore_command in recovery.conf is never called by PG 9.2.4 during
>> startup. I confirmed this by adding a "touch /tmp/restore_command.`date
>> +%H:%M:%S`" line at the beginning of the shell script I use for my
>> restore_command. No such files are created when starting PG 9.2.4.
>>
>> After downgrading back to 9.2.3, archive recovery works using the very same
>> base backup, recovery.conf file, and restore_command. The log indicates that
>> PG 9.2.3 begins recovery by pulling WAL files from the archive instead of
>> pg_xlog:
>
> I can reproduce the behavior you report only if I remove the "backup_label"
> file from the restored data directory before I begin recovery. Of course,
> doing that renders the backup invalid, as without it recovery is very
> likely to begin from the wrong WAL recovery location.

Yeah, if you use pg_start/stop_backup(), there definitely should be a backup_label present.

But there is a point here, if you use an atomic filesystem snapshot instead of pg_start/stop_backup(), or just a plain copy of the data directory while the system is shut down. The problem in that case is that if pg_xlog is empty, we have no idea how far we need to recover until the system is consistent. Actually, if the system was shut down, then the system is consistent immediately and we could allow that, but the problem still remains for an online backup using an atomic filesystem snapshot.

I don't think there's much we can do about that case. We could start up and recover all the WAL from the archive before we declare consistency, but that gets pretty complicated, and it would still not work if you tried to do that in a standby that uses streaming replication without a restore_command.
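
The cases discussed above (backup_label present; no backup_label but WAL left in pg_xlog; no backup_label and an empty pg_xlog) can be sketched as a sanity check to run on a restored data directory before starting recovery. This is only an illustration, not something PostgreSQL ships: the function name `classify_restored_datadir` and its return strings are made up for the example, and it looks at file presence only, not at file contents.

```python
import os


def classify_restored_datadir(datadir):
    """Classify a restored data directory by what recovery has to go on.

    Hypothetical helper for illustration only; not part of PostgreSQL.
    """
    has_label = os.path.isfile(os.path.join(datadir, "backup_label"))

    xlog_dir = os.path.join(datadir, "pg_xlog")
    # Count only regular files as WAL; a missing directory counts as empty.
    has_wal = os.path.isdir(xlog_dir) and any(
        os.path.isfile(os.path.join(xlog_dir, name))
        for name in os.listdir(xlog_dir)
    )

    if has_label:
        # Backup taken with pg_start/stop_backup(): the label records
        # where recovery must begin, so pg_xlog may be empty.
        return "has_backup_label"
    if has_wal:
        # Snapshot or offline copy that kept its WAL.
        return "no_label_with_wal"
    # No label and no WAL: there is no way to tell how far recovery
    # must go before the data is consistent.
    return "no_label_empty_xlog"
```

A restore script could refuse to start the server when this returns `"no_label_empty_xlog"`, since that is the combination described above as unrecoverable in the online-snapshot case.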
So, I think what we need to do is to update the documentation to make it clear that you must not zap pg_xlog if you take a backup without pg_start/stop_backup(). The documentation that talks about filesystem snapshots and offline backups doesn't actually say that you can zap pg_xlog - that is only mentioned in the section on pg_start/stop_backup(). But perhaps that could be made more explicit.

>> Or, must I now include pg_xlog files when taking base backups with 9.2.4,
>> contrary to the documentation?
>
> You do not need to include pg_xlog, but you do need to include
> backup_label. And you always did need to include it--if you were not
> including it in the past, then you were playing with fire and is only due
> to luck that your database survived.

Incidentally, I bumped into another custom backup script just a few weeks back that also excluded backup_label. I don't know what the author was thinking when he wrote that, but it seems to be a surprisingly common mistake. Maybe it's the "label" in the filename that makes people think it's not important. Perhaps we should improve the documentation to make it more explicit that backup_label must be included in the backup. The docs already say that, though, so I suspect that people making this mistake have not read the docs very carefully anyway.

Perhaps a comment in the beginning of backup_label would help:

    # NOTE: This file MUST be included in the backup. Otherwise, the backup
    # is inconsistent, and restoring it may result in a corrupt database.

Jeff B., assuming that you excluded backup_label from the backup for some reason, do you have any thoughts on what would've helped you to avoid that mistake? Would a comment like above have helped - did you look inside backup_label at any point?

- Heikki
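
For anyone writing a custom backup script, the rule from this message (pg_xlog contents may be skipped only when pg_start/stop_backup() is used, and backup_label must always be copied in that case) can be written down as a small file filter. The sketch below is an assumption-laden illustration rather than a complete backup tool: `should_copy` and `check_backup` are hypothetical names, paths are taken to be relative to the data directory with `/` separators, and nothing else a real script has to handle is covered.

```python
def should_copy(relpath, used_start_stop_backup):
    """Decide whether a path inside the data directory goes into the backup.

    Hypothetical helper for illustration only.
    """
    parts = relpath.strip("/").split("/")
    inside_pg_xlog = parts[0] == "pg_xlog" and len(parts) > 1
    if inside_pg_xlog:
        # WAL files may be left out only when pg_start/stop_backup() was
        # used, because backup_label then records where recovery starts.
        return not used_start_stop_backup
    # Everything else is copied, backup_label included.
    return True


def check_backup(copied_paths, used_start_stop_backup):
    """Raise if a pg_start/stop_backup() style backup lacks backup_label."""
    if used_start_stop_backup and "backup_label" not in copied_paths:
        raise ValueError("backup_label missing: backup is not usable")
```

Calling `check_backup` on the list of copied paths at the end of the script would catch the mistake described above before the backup is ever needed for a restore.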