Thread: Unable to start postgres in recovery mode.
I am trying to put my database in recovery mode and I get the following error:

===========
LOG: starting archive recovery
LOG: restore_command = "/myrestore/pg_restore.sh %f %p"
00000001.history pg_xlog/RECOVERYHISTORY
[Main]: Server requested for 00000001.history to be copied to pg_xlog/RECOVERYHISTORY
[pg_restore::isWALFileReady]: 1 Available WAL Files
[Main]: Request to copy 00000001.history to pg_xlog/RECOVERYHISTORY
[pg_restore]::copyWALFile: Moving /mybackup/000000010000000000000001.009352E8.backup to pg_xlog/RECOVERYHISTORY
LOG: restored log file "00000001.history" from archive
FATAL: syntax error in history file: START WAL LOCATION: 0/19352E8 (file 000000010000000000000001)
HINT: Expected a numeric timeline ID.
LOG: startup process (PID 12323) exited with exit code 1
LOG: aborting startup due to startup process failure
===========

In the log above, the lines prefixed with [Main] or [pg_restore] come from my script, which is invoked through the restore_command in recovery.conf. The postgres server is asking for the 00000001.history file, and I do not have that file. All I have is the 0*10*1.009352E8.backup file and other WAL files starting from 0*10*1. In the case above, I move 0*10*1.009352E8.backup to pg_xlog/RECOVERYHISTORY. Note that my backup is in a staging area, so I can safely move files out of it.

What am I doing wrong?
If I indicate that I do not have the requested file by returning error code 1, I get the following error in the log:

============
LOG: starting archive recovery
LOG: restore_command = "/myrestore/pg_restore.sh %f %p"
00000001.history pg_xlog/RECOVERYHISTORY
[Main]: Server requested for 00000001.history to be copied to pg_xlog/RECOVERYHISTORY
00000001000000000000004F pg_xlog/RECOVERYXLOG
[Main]: Server requested for 00000001000000000000004F to be copied to pg_xlog/RECOVERYXLOG
LOG: could not open file "pg_xlog/00000001000000000000004F" (log file 0, segment 79): No such file or directory
LOG: invalid primary checkpoint record
00000001000000000000004F pg_xlog/RECOVERYXLOG
[Main]: Server requested for 00000001000000000000004F to be copied to pg_xlog/RECOVERYXLOG
LOG: could not open file "pg_xlog/00000001000000000000004F" (log file 0, segment 79): No such file or directory
LOG: invalid secondary checkpoint record
PANIC: could not locate a valid checkpoint record
LOG: startup process (PID 12222) was terminated by signal 6
LOG: aborting startup due to startup process failure
LOG: database system was shut down at 2007-03-19 03:33:05 PDT
==============

So what am I doing wrong here? Any help with this is greatly appreciated.

Regards
Dhaval
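For readers matching the "(log file N, segment M)" hints in the logs above against the files in an archive, here is a minimal Python sketch of how the 24-character segment names appear to be built: three 8-digit uppercase hexadecimal fields for timeline, log file and segment. The function name is hypothetical and not part of PostgreSQL; the layout is inferred from the names and numbers shown in this thread.

```python
def wal_segment_name(timeline: int, log_id: int, segment: int) -> str:
    """Build a WAL segment file name from its three numeric parts.

    Each part is written as 8 uppercase hex digits, which matches the
    names in the logs above (e.g. segment 79 is 0x4F).
    """
    return f"{timeline:08X}{log_id:08X}{segment:08X}"


# Timeline 1, log file 0, segment 79 is the file the server could not open.
print(wal_segment_name(1, 0, 79))  # 00000001000000000000004F
```

This makes it easy to check whether the segment named in a "could not open file" message is actually present among the archived WAL files.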
"Dhaval Shah" <dhaval.shah.m@gmail.com> writes:
> What am I doing wrong?

Lying to the server. If you don't have the requested file, return failure, don't invent something. There are a number of cases where the recovery process asks for files that are quite likely not to exist.

> If I indicate that I do not have the concerned file by returning error
> code 1, I get the following error in the log:

This may indicate that you have an incomplete backup :-(. It's hard to tell from this much info though. What is in pg_control (use pg_controldata to dump) and what is in the backup_label file (that's plain text)? What WAL segment files do you actually have?

regards, tom lane
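The advice above (hand over the requested file only if you really have it, otherwise return failure) could be sketched roughly as follows. This is an illustration in Python, not the poster's script: the function name `restore` and the `ARCHIVE_DIR` default are assumptions, with /mybackup taken from the log in the first message. It copies rather than moves, so the archive is left intact.

```python
import shutil
from pathlib import Path

# Assumed archive location; the first message's log shows files under /mybackup.
ARCHIVE_DIR = Path("/mybackup")


def restore(requested: str, destination: str, archive_dir: Path = ARCHIVE_DIR) -> int:
    """Copy the file the server asked for, or report that it is missing.

    Returns 0 after a successful copy and 1 when the archive holds no file
    with exactly the requested name. A different file is never substituted.
    """
    source = archive_dir / requested
    if not source.is_file():
        # e.g. 00000001.history was requested but does not exist: say so.
        return 1
    shutil.copy2(source, destination)
    return 0
```

A wrapper used as restore_command would pass %f and %p as `requested` and `destination` and exit with the returned value, so that a missing file reaches the server as a non-zero exit status.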
Thanks for the email. It helped: after going through the email and the docs, I realized that the "backup" file had the wrong information, or rather that I had the wrong backup files. That would explain the errors I was seeing.

However, I do have one question. I am setting this up as part of the HA process. The standby is a "hot" standby. Now, if the primary fails, how do I tell the secondary to come out of recovery mode, move recovery.conf to recovery.done and start the db? I mean, what error code shall I return? If I return a non-zero error code, I get the following result [from serverlog]:

====
00000001000000000000001B pg_xlog/RECOVERYXLOG
LOG: restored log file "00000001000000000000001B" from archive
00000001000000000000001C pg_xlog/RECOVERYXLOG
[Main: Triggering Recovery!!!] <---- My script detected that it needs to trigger recovery...
LOG: could not open file "pg_xlog/00000001000000000000001C" (log file 0, segment 28): No such file or directory
LOG: redo done at 0/1B000070
00000001000000000000001B pg_xlog/RECOVERYXLOG
Main: Triggering Recovery!!! <--- My script is called again and the script says trigger recovery
PANIC: could not open file "pg_xlog/00000001000000000000001B" (log file 0, segment 27): No such file or directory
LOG: startup process (PID 32167) was terminated by signal 6
LOG: aborting startup due to startup process failure
====

This is what my script is doing:

    if ( triggerRecovery() ) {
        print "Main: Triggering Recovery!!! \n";
        return 1;
    }

So the question is: on detecting that the primary is down and that recovery should be triggered, what error code shall I return? Or do I have to move recovery.conf to recovery.done myself and restart the db?

Regards
Dhaval

On 3/20/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Dhaval Shah" <dhaval.shah.m@gmail.com> writes:
> > What am I doing wrong?
>
> Lying to the server. If you don't have the requested file, return
> failure, don't invent something.
> There are a number of cases where
> the recovery process asks for files that are quite likely not to exist.
>
> > If I indicate that I do not have the concerned file by returning error
> > code 1, I get the following error in the log:
>
> This may indicate that you have an incomplete backup :-(. It's hard to
> tell from this much info though. What is in pg_control (use
> pg_controldata to dump) and what is in the backup_label file (that's
> plain text)? What WAL segment files do you actually have?
>
> regards, tom lane
>

--
Dhaval Shah
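One possible reading of the failover log in this message: after "redo done", the server asks for 00000001000000000000001B a second time, and the script refuses because the trigger is already set, even though that segment had just been restored. The Python sketch below illustrates the alternative ordering, where availability is checked before the trigger. It assumes the archive still holds segments after they have been restored (copy, not move). The trigger file, the function name and all parameters are hypothetical stand-ins for whatever `triggerRecovery()` checks.

```python
import shutil
import time
from pathlib import Path
from typing import Callable


def restore_or_fail_over(
    requested: str,
    destination: str,
    archive_dir: Path,
    trigger_file: Path,  # hypothetical failover signal
    poll_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Wait for a WAL segment; give up only if it is absent and failover is requested.

    Returns 0 after copying the file and 1 when it cannot be supplied.
    """
    source = archive_dir / requested
    while True:
        if source.is_file():
            # Still in the archive: hand it over, even after the trigger.
            shutil.copy2(source, destination)
            return 0
        if requested.endswith(".history"):
            # History files may legitimately not exist; do not wait for them.
            return 1
        if trigger_file.exists():
            # Segment is genuinely missing and failover was requested.
            return 1
        sleep(poll_seconds)
```

With this ordering, a request for 00000001000000000000001C after the trigger would fail, while the repeated request for 00000001000000000000001B would succeed, provided that file is still in the archive.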