pitr replica dies on startup - Mailing list pgsql-admin
From | Jeff Frost |
---|---|
Subject | pitr replica dies on startup |
Date | |
Msg-id | Pine.LNX.4.64.0708311523260.11463@discord.home.frostconsultingllc.com Whole thread Raw |
Responses |
Re: pitr replica dies on startup
|
List | pgsql-admin |
Hi guys, I've inherited a PITR continuous recovery master/standby server pair. The continuous recovery and loading of the xlogs seems to work fine, but when I opted to test the replica bring up, it falls over with signal 6. Here's an excerpt from the log with log levels set up to debug5: DEBUG: max_safe_fds = 984, usable_fds = 1000, already_open = 6 LOG: database system was interrupted at 2007-08-31 15:09:40 PDT LOG: starting archive recovery LOG: restore_command = "/data/pg/data/recovery.py /data/pg/wal/%f "%p"" DEBUG: executing restore command "/data/pg/data/recovery.py /data/pg/wal/00000001.history "pg_xlog/RECOVERYHISTORY"" DEBUG: could not restore file "00000001.history" from archive: return code 0 DEBUG: executing restore command "/data/pg/data/recovery.py /data/pg/wal/00000001000000000000004C.00000020.backup "pg_xlog/RECOVERYHISTORY"" LOG: restored log file "00000001000000000000004C.00000020.backup" from archive DEBUG: executing restore command "/data/pg/data/recovery.py /data/pg/wal/00000001000000000000004C "pg_xlog/RECOVERYXLOG"" LOG: restored log file "00000001000000000000004C" from archive LOG: checkpoint record is at 0/4C000020 LOG: redo record is at 0/4C000020; undo record is at 0/0; shutdown FALSE LOG: next transaction ID: 0/1222; next OID: 24576 LOG: next MultiXactId: 1; next MultiXactOffset: 0 LOG: automatic recovery in progress LOG: redo starts at 0/4C000070 DEBUG: executing restore command "/data/pg/data/recovery.py /data/pg/wal/00000001000000000000004D "pg_xlog/RECOVERYXLOG"" LOG: restored log file "00000001000000000000004D" from archive DEBUG: recovery restart point at 0/4D000020 CONTEXT: xlog redo checkpoint: redo 0/4D000020; undo 0/0; tli 1; xid 0/1232; oid 24576; multi 1; offset 0; online DEBUG: executing restore command "/data/pg/data/recovery.py /data/pg/wal/00000001000000000000004E "pg_xlog/RECOVERYXLOG"" LOG: restored log file "00000001000000000000004E" from archive DEBUG: recovery restart point at 0/4E000020 CONTEXT: xlog redo checkpoint: redo 0/4E000020; undo 0/0; tli 1; xid 0/1242; oid 24576; multi 1; offset 0; online DEBUG: executing restore command "/data/pg/data/recovery.py /data/pg/wal/00000001000000000000004F "pg_xlog/RECOVERYXLOG"" LOG: restored log file "00000001000000000000004F" from archive DEBUG: recovery restart point at 0/4F000020 CONTEXT: xlog redo checkpoint: redo 0/4F000020; undo 0/0; tli 1; xid 0/1252; oid 24576; multi 1; offset 0; online DEBUG: executing restore command "/data/pg/data/recovery.py /data/pg/wal/000000010000000000000050 "pg_xlog/RECOVERYXLOG"" DEBUG: could not restore file "000000010000000000000050" from archive: return code 256 LOG: could not open file "pg_xlog/000000010000000000000050" (log file 0, segment 80): No such file or directory LOG: redo done at 0/4F000070 DEBUG: executing restore command "/data/pg/data/recovery.py /data/pg/wal/00000001000000000000004F "pg_xlog/RECOVERYXLOG"" DEBUG: could not restore file "00000001000000000000004F" from archive: return code 256 PANIC: could not open file "pg_xlog/00000001000000000000004F" (log file 0, segment 79): No such file or directory DEBUG: reaping dead processes LOG: startup process (PID 31211) was terminated by signal 6 LOG: aborting startup due to startup process failure DEBUG: proc_exit(1) DEBUG: shmem_exit(1) DEBUG: exit(1) You can see the rest of the log file here: http://pgsql.privatepaste.com/841mOR5gzm It seems like everything is happy, except it seems to ask for the 4F log file more than once. Directory permissions look good: total 45 -rw------- 1 postgres postgres 4 Aug 31 11:13 PG_VERSION -rw------- 1 postgres postgres 167 Aug 31 15:09 backup_label.old drwx------ 5 postgres postgres 120 Aug 31 14:59 base drwx------ 2 postgres postgres 768 Aug 31 15:08 global drwx------ 2 postgres postgres 72 Aug 31 11:13 pg_clog -rw------- 1 postgres postgres 3279 Aug 31 11:16 pg_hba.conf -rw------- 1 postgres postgres 1460 Aug 31 11:14 pg_ident.conf drwx------ 4 postgres postgres 96 Aug 31 11:13 pg_multixact drwx------ 2 postgres postgres 72 Aug 31 11:13 pg_subtrans drwx------ 2 postgres postgres 48 Aug 31 11:13 pg_tblspc drwx------ 2 postgres postgres 48 Aug 31 11:13 pg_twophase drwx------ 3 postgres postgres 112 Aug 31 15:24 pg_xlog -rw-r--r-- 1 postgres postgres 15370 Aug 31 14:57 postgresql.conf -rw------- 1 postgres postgres 35 Aug 31 15:09 postmaster.opts -rw-r--r-- 1 postgres postgres 67 Aug 31 13:40 recovery.conf -rwxr-xr-x 1 postgres postgres 1384 Aug 31 13:40 recovery.py I verified that the files are really being copied into pg_xlog/RECOVERYXLOG. This is 8.2.4 on Gentoo, but I observed the same behavior with 8.2.3 on Gentoo. We've md5sum'd the postgres binary, diffed the output of pg_config and everything is the same. I tried starting up postgres in a newly initdb'd directory and it starts up fine. I also tried using the conf files from the 'broken' directory in this new directory and it starts up fine, but if I try to replicate into this directory, postgres exits with signal 6 just the same. Conversely, using a default postgresql.conf file in the 'broken' directory leaves it just as broken. The master can start/stop just fine with no problems. What else to look at? I do have a core file, back trace can be found here: http://pgsql.privatepaste.com/8f0WDAAzz9 Any suggestions much appreciated! -- Jeff Frost, Owner <jeff@frostconsultingllc.com> Frost Consulting, LLC http://www.frostconsultingllc.com/ Phone: 650-780-7908 FAX: 650-649-1954
pgsql-admin by date: