Recovery bug - Mailing list pgsql-bugs
From: Jeff Davis
Subject: Recovery bug
Msg-id: 1287183483.30681.54.camel@jdavis-ux.asterdata.local
List: pgsql-bugs
This occurs on the master branch, but also pretty much everywhere else.

To reproduce: first, it requires a little setup to make sure things happen in the right (wrong?) order. In postgresql.conf I set:

    archive_mode = on
    archive_command = 'echo -n archiving: %f... && while [ ! -f /tmp/a ]; do sleep 1; done && rm /tmp/a && echo done'

The purpose of the archive command is to control which files get archived, and when (they aren't actually archived, but that doesn't matter in this case because I'll never set restore_command). Also, set checkpoint_completion_target high enough (0.5 worked for me).

Session 1:

    create table a(i int);
    insert into a select generate_series(1,1000000);
    select pg_start_backup('b1');
    -- this command takes a while due to checkpoint_completion_target

Session 2 (quickly, before pg_start_backup() returns):

    insert into a select generate_series(1,1000000);

Now, the last insert should have forced a checkpoint (assuming checkpoint_segments=3), which enables pg_start_backup() to complete. Cat backup_label to make sure that the backup's checkpoint spans two WAL segments, i.e. the REDO location is in one segment and the checkpoint record in the next (for me they are 00...5 and 00...6). If not, try again, or somehow arrange for the pg_start_backup checkpoint to be spread across more concurrent activity.

Now do "touch /tmp/a" to step through the archiving process one file at a time, until 00...5 is archived but 00...6 is _not_. Then issue a manual CHECKPOINT. This should leave 00...6 and beyond in pg_xlog, but nothing before it.

Send a SIGQUIT to the postmaster to simulate a crash.

When you bring it back up, it thinks it is recovering from a backup, so it reads backup_label. The checkpoint record for the backup is in 00...6, so it reads that just fine. But then it tries to read WAL starting at the redo location from that checkpoint, which is in 00...5; that segment doesn't exist, so it PANICs.

Ordinarily you might say that this is just confusion over whether it's recovering from a backup or not, and you just need to remove backup_label and try again. But that doesn't work: at this point StartupXLOG has already done two things:

    1. renamed backup_label to backup_label.old
    2. updated the control file

So the state has been permanently changed such that it thinks the redo location is before the earliest WAL you actually have. I see a control file that looks something like:

    Latest checkpoint location:           0/6D17110
    Prior checkpoint location:            0/8D4DEF0
    Latest checkpoint's REDO location:    0/5000020

(and no more backup_label). That makes no sense, of course. If I try to restart the postmaster again, it updates the control file again, to something even more surprising:

    Latest checkpoint location:           0/6D17110
    Prior checkpoint location:            0/6D17110
    Latest checkpoint's REDO location:    0/5000020

and PANICs again. Starting the postmaster a third time causes another PANIC, of course.

There might be an argument that if you crash in the middle of a backup, you should examine the situation before restarting. But there are two big problems with that:

    1. a backend crash will cause an automatic restart, corrupting pg_control automatically as well
    2. after the restart happens, it makes permanent changes that can't be undone (by mere mortals, anyway), removes important information about what actually happened, and generally leaves you with little to work with
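(Aside, for anyone reproducing this: stepping the gated archive_command by hand is fiddly. Something like the sketch below releases one segment at a time and stops before a segment you name; the script itself, its PGDATA default, and the stop-before convention are illustrative, but the .ready/.done status files it watches in pg_xlog/archive_status are what the archiver really uses.)

    #!/bin/sh
    # Release the gated archive_command from the repro above one WAL
    # segment at a time. A pending segment shows up as <seg>.ready in
    # pg_xlog/archive_status; it is renamed to <seg>.done once the
    # archive_command succeeds.
    PGDATA=${PGDATA:-$HOME/pgdata}    # placeholder; point at your cluster
    STOP_BEFORE=$1                    # e.g. 000000010000000000000006

    cd "$PGDATA/pg_xlog/archive_status" || exit 1
    while :; do
        next=$(ls *.ready 2>/dev/null | head -n 1)
        if [ -z "$next" ]; then sleep 1; continue; fi
        seg=${next%.ready}
        if [ "$seg" = "$STOP_BEFORE" ]; then
            echo "stopping: leaving $seg unarchived"
            exit 0
        fi
        echo "releasing $seg"
        touch /tmp/a                  # the archive_command removes this
        while [ ! -f "$seg.done" ]; do sleep 1; done
    done

Invoked with 000000010000000000000006, it archives up through 00...5 and then leaves 00...6 pending, which is exactly the state needed before the manual CHECKPOINT.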
I don't have a fix yet, because I think it requires a little discussion. For instance, it seems dangerous to assume that we're starting up from a backup, with access to the archive, when it might have been a crash of the primary system. That assumption is obviously wrong in the case of an automatic restart, or when no restore_command is set. Fixing this might also let us remove the annoying "If you are not restoring from a backup, try removing..." PANIC message.

Also, in general, we should do more logging during recovery, at least in the first stages: which WAL segment it's looking for to get started, why it thinks it needs that segment (backup label or control data), and so on. Ideally we would verify that the necessary files exist (at least the initial ones) before making any permanent changes. It was pretty painful trying to work backwards on this problem from the final control data (where the latest and prior checkpoints are the same, and redo is before both), a crash, a PANIC, a backup_label.old, and not much else.
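(To make the "verify before making permanent changes" idea concrete, here is a sketch of an external pre-flight check. It assumes 16MB WAL segments and the standard pg_controldata labels; a complete version would also parse the START WAL LOCATION line in backup_label, since that is where the first restart takes its redo pointer from.)

    #!/bin/sh
    # Before restarting a crashed cluster, check that the WAL segment
    # containing the latest checkpoint's REDO location still exists in
    # pg_xlog. With 16MB segments, the segment number is the low 32 bits
    # of the LSN divided by 0x1000000.
    PGDATA=${PGDATA:-$HOME/pgdata}    # placeholder; point at your cluster

    redo=$(pg_controldata "$PGDATA" | sed -n "s/^Latest checkpoint's REDO location: *//p")
    tli=$(pg_controldata "$PGDATA" | sed -n "s/^Latest checkpoint's TimeLineID: *//p")

    logid=${redo%/*}                  # high 32 bits of the LSN, in hex
    xrecoff=${redo#*/}                # low 32 bits of the LSN, in hex
    seg=$(printf '%08X%08X%08X' "$tli" "0x$logid" $(( 0x$xrecoff / 0x1000000 )))

    if [ -f "$PGDATA/pg_xlog/$seg" ]; then
        echo "redo segment $seg is present"
    else
        echo "redo segment $seg is MISSING from pg_xlog" >&2
        exit 1
    fi

Against the broken control file above (redo 0/5000020, timeline 1), this computes 000000010000000000000005 and reports it missing, before the postmaster gets a chance to rewrite pg_control yet again.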
Regards,
	Jeff Davis