BUG #8043: 9.2.4 doesn't open WAL files from archive, only looks in pg_xlog - Mailing list pgsql-bugs
From | bohmer@visionlink.org |
---|---|
Subject | BUG #8043: 9.2.4 doesn't open WAL files from archive, only looks in pg_xlog |
Date | |
Msg-id | E1UOCIL-0004ll-Qt@wrigleys.postgresql.org Whole thread Raw |
Responses |
Re: BUG #8043: 9.2.4 doesn't open WAL files from archive, only
looks in pg_xlog
|
List | pgsql-bugs |
The following bug has been logged on the website: Bug reference: 8043 Logged by: Jeff Bohmer Email address: bohmer@visionlink.org PostgreSQL version: 9.2.4 Operating system: CentOS 5.9 x86_64 kernel 2.6.18-348.3.1.el5 Description: = Hi pgsql-bugs, PG version: 9.2.4, from yum.postgresql.org OS: CentOS 5.9 x86_64 kernel 2.6.18-348.3.1.el5 Upgrading from 9.2.3 to 9.2.4 has broken archive recovery for me. Probably related to this 9.2.4 change: Ensure we do crash recovery before entering archive recovery, if the database was not stopped cleanly and a recovery.conf file is present (Heikki Linnakangas, Kyotaro Horiguchi, Mitsumasa Kondo) When starting the PostgreSQL 9.2.4 service on a base backup, I get this: 2013-04-05 12:49:04 MDT [10302]: [1-1] user=3D,db=3D LOG: database system = was interrupted; last known up at 2013-04-05 10:02:01 MDT 2013-04-05 12:49:04 MDT [10302]: [2-1] user=3D,db=3D LOG: starting archive recovery 2013-04-05 12:49:04 MDT [10302]: [3-1] user=3D,db=3D LOG: could not open f= ile "pg_xlog/000000010000002F0000002D" (log file 47, segment 45): No such file or directory 2013-04-05 12:49:04 MDT [10302]: [4-1] user=3D,db=3D LOG: invalid primary checkpoint record 2013-04-05 12:49:04 MDT [10302]: [5-1] user=3D,db=3D LOG: could not open f= ile "pg_xlog/000000010000002F0000002C" (log file 47, segment 44): No such file or directory 2013-04-05 12:49:04 MDT [10302]: [6-1] user=3D,db=3D LOG: invalid secondary checkpoint record 2013-04-05 12:49:04 MDT [10302]: [7-1] user=3D,db=3D PANIC: could not loca= te a valid checkpoint record 2013-04-05 12:49:04 MDT [10297]: [2-1] user=3D,db=3D LOG: startup process = (PID 10302) was terminated by signal 6: Aborted 2013-04-05 12:49:04 MDT [10297]: [3-1] user=3D,db=3D LOG: aborting startup= due to startup process failure The WAL file 000000010000002F0000002D does exist in my WAL archive, but not in pg_xlog. I exclude pg_xlog files when taking the base backup, per the instructions from <http://www.postgresql.org/docs/9.2/static/continuous-archiving.html>: You can, however, omit from the backup dump the files within the cluster's pg_xlog/ subdirectory. This slight adjustment is worthwhile because it reduces the risk of mistakes when restoring. I use a custom base backup script to call pg_start/stop_backup() and make the backup with rsync. The restore_command in recovery.conf is never called by PG 9.2.4 during startup. I confirmed this by adding a "touch /tmp/restore_command.`date +%H:%M:%S`" line at the beginning of the shell script I use for my restore_command. No such files are created when starting PG 9.2.4. After downgrading back to 9.2.3, archive recovery works using the very same base backup, recovery.conf file, and restore_command. The log indicates that PG 9.2.3 begins recovery by pulling WAL files from the archive instead of pg_xlog: 2013-04-05 13:01:14 MDT [16981]: [1-1] user=3D,db=3D LOG: database system = was interrupted; last known up at 2013-04-05 10:02:01 MDT 2013-04-05 13:01:14 MDT [16981]: [2-1] user=3D,db=3D LOG: starting archive recovery 2013-04-05 13:01:14 MDT [16981]: [3-1] user=3D,db=3D LOG: restored log file "000000010000002F0000002D" from archive 2013-04-05 13:01:14 MDT [16981]: [4-1] user=3D,db=3D LOG: consistent recov= ery state reached at 2F/2D000080 2013-04-05 13:01:14 MDT [16981]: [5-1] user=3D,db=3D LOG: redo starts at 2F/2D000080 2013-04-05 13:01:15 MDT [16981]: [6-1] user=3D,db=3D LOG: restored log file "000000010000002F0000002E" from archive 2013-04-05 13:01:15 MDT [16981]: [7-1] user=3D,db=3D LOG: restored log file "000000010000002F0000002F" from archive <snip> 2013-04-05 13:01:17 MDT [16981]: [25-1] user=3D,db=3D LOG: redo done at 2F/3F07B4D0 2013-04-05 13:01:17 MDT [16981]: [26-1] user=3D,db=3D LOG: last completed transaction was at log time 2013-04-05 12:53:01.069826-06 2013-04-05 13:01:17 MDT [16981]: [27-1] user=3D,db=3D LOG: restored log fi= le "000000010000002F0000003F" from archive 2013-04-05 13:01:17 MDT [16981]: [28-1] user=3D,db=3D LOG: selected new timeline ID: 2 2013-04-05 13:01:17 MDT [16981]: [29-1] user=3D,db=3D LOG: archive recovery complete 2013-04-05 13:01:17 MDT [16991]: [1-1] user=3D,db=3D LOG: checkpoint start= ing: end-of-recovery immediate wait 2013-04-05 13:01:17 MDT [16991]: [2-1] user=3D,db=3D LOG: checkpoint compl= ete: wrote 327 buffers (0.1%); 0 transaction log file(s) ad ded, 0 removed, 0 recycled; write=3D0.015 s, sync=3D0.000 s, total=3D0.035 = s; sync files=3D0, longest=3D0.000 s, average=3D0.000 s 2013-04-05 13:01:17 MDT [16978]: [2-1] user=3D,db=3D LOG: database system = is ready to accept connections So, the behavior has definitely changed between 9.2.3 and 9.2.4. Is this a bug in 9.2.4? Or, must I now include pg_xlog files when taking base backups with 9.2.4, contrary to the documentation? Or, is there a way to make PG 9.2.4 use restore_command for recovery, as 9.2.3 does? Thank you, - Jeff
pgsql-bugs by date: