
Problem with PITR Past Particular WAL File - Mailing list pgsql-admin

From	Craig McElroy
Subject	Problem with PITR Past Particular WAL File
Date	October 24, 2007 04:20:35
Msg-id	299A6DFE-38B3-443F-A505-40151A4B74F4@contegix.com
Responses	Re: Problem with PITR Past Particular WAL File
List	pgsql-admin


Greetings:

I am running into a problem during failover recovery of a particular 8.2.4 database running on a SunOS 5.10 box. In the interest of full disclosure, I am also using the pg_standby utility from the 8.3 contribs to handle the replay of the logs on the standby server.

What I am finding is that if I only allow it to replay up to a particular WAL file (specifically, 00000001000000180000008A), I am able to trigger the system to change out of recovery mode, and it successfully comes online shortly afterwards, as expected. I also tried stopping it at each of a few prior WAL files and got the same results. Relevant log lines are as follows:

Oct 23 22:40:44 db01b postgres[15894]: [ID 748848 local0.info] [5699-1] LOG: restored log file "000000010000001800000088" from archive
Oct 23 22:40:44 db01b postgres[15894]: [ID 748848 local0.info] [5700-1] LOG: restored log file "000000010000001800000089" from archive
Oct 23 22:40:44 db01b postgres[15894]: [ID 748848 local0.info] [5701-1] LOG: restored log file "00000001000000180000008A" from archive
Oct 23 22:45:50 db01b postgres[15894]: [ID 748848 local0.info] [5702-1] LOG: could not open file "pg_xlog/00000001000000180000008B" (log file 24, segment 139): No such file or directory
Oct 23 22:45:50 db01b postgres[15894]: [ID 748848 local0.info] [5703-1] LOG: redo done at 18/8A0C3BC8
Oct 23 22:45:50 db01b postgres[15894]: [ID 748848 local0.info] [5704-1] LOG: restored log file "00000001000000180000008A" from archive
Oct 23 22:45:50 db01b postgres[15894]: [ID 748848 local0.info] [5705-1] LOG: archive recovery complete
Oct 23 22:46:16 db01b postgres[15894]: [ID 748848 local0.info] [5706-1] LOG: database system is ready
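The log excerpts identify the same WAL segment in three notations: the 24-hex-digit file name, the "(log file 24, segment 139)" pair, and the "redo done at xlogid/xrecoff" location. The sketch below is a small decoder, not part of PostgreSQL; the function names are hypothetical, and it assumes the default 16 MB WAL segment size. Running it against these excerpts shows that "redo done at 18/8A0C3BC8" falls inside 00000001000000180000008A, the last file restored before the trigger.

```python
"""Decode WAL segment file names and "redo done at" locations.

Hypothetical helper for reading the logs above. Assumes the default
16 MB WAL segment size; a server built with a different segment size
would need SEG_SIZE changed.
"""
from typing import NamedTuple

SEG_SIZE = 16 * 1024 * 1024  # assumed default WAL segment size, in bytes


class WalSegment(NamedTuple):
    timeline: int
    log_id: int
    segment: int


def parse_wal_name(name: str) -> WalSegment:
    """Split a 24-hex-digit WAL file name into timeline, log id, segment."""
    if len(name) != 24:
        raise ValueError(f"expected 24 hex digits, got {name!r}")
    tli, log_id, seg = (int(name[i:i + 8], 16) for i in (0, 8, 16))
    return WalSegment(tli, log_id, seg)


def segment_of_location(loc: str, timeline: int = 1) -> tuple[WalSegment, int]:
    """Map an 'xlogid/xrecoff' location to its segment and byte offset."""
    log_hex, off_hex = loc.split("/")
    log_id, rec_off = int(log_hex, 16), int(off_hex, 16)
    return WalSegment(timeline, log_id, rec_off // SEG_SIZE), rec_off % SEG_SIZE


def wal_name(seg: WalSegment) -> str:
    """Rebuild the 24-hex-digit file name for a segment."""
    return f"{seg.timeline:08X}{seg.log_id:08X}{seg.segment:08X}"


if __name__ == "__main__":
    s = parse_wal_name("00000001000000180000008B")
    print(s.log_id, s.segment)  # 24 139, as in "(log file 24, segment 139)"
    seg, off = segment_of_location("18/8A0C3BC8")
    print(wal_name(seg), hex(off))  # 00000001000000180000008A 0xc3bc8
```

The same decoding applied to the second run's "redo done at 18/8B2174D0" gives segment 8B, again the last file restored before the trigger.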

Now, if I include one more WAL file in the recovery, the additional WAL file appears to be restored successfully, but when I trigger the system to come out of recovery mode, it fails to come fully online and shuts down a few minutes later. I also tried stopping it after each of a few additional WAL files and got the same results. Relevant log lines are as follows:

Oct 23 22:20:04 db01b postgres[92]: [ID 748848 local0.info] [5699-1] LOG: restored log file "000000010000001800000088" from archive
Oct 23 22:20:04 db01b postgres[92]: [ID 748848 local0.info] [5700-1] LOG: restored log file "000000010000001800000089" from archive
Oct 23 22:20:04 db01b postgres[92]: [ID 748848 local0.info] [5701-1] LOG: restored log file "00000001000000180000008A" from archive
Oct 23 22:20:04 db01b postgres[92]: [ID 748848 local0.info] [5702-1] LOG: restored log file "00000001000000180000008B" from archive
Oct 23 22:22:29 db01b postgres[92]: [ID 748848 local0.info] [5703-1] LOG: could not open file "pg_xlog/00000001000000180000008C" (log file 24, segment 140): No such file or directory
Oct 23 22:22:29 db01b postgres[92]: [ID 748848 local0.info] [5704-1] LOG: redo done at 18/8B2174D0
Oct 23 22:22:29 db01b postgres[92]: [ID 748848 local0.info] [5705-1] LOG: restored log file "00000001000000180000008B" from archive
Oct 23 22:22:29 db01b postgres[92]: [ID 748848 local0.info] [5706-1] LOG: archive recovery complete
Oct 23 22:27:06 db01b postgres[91]: [ID 748848 local0.info] [1-1] LOG: startup process (PID 92) was terminated by signal 11
Oct 23 22:27:06 db01b postgres[91]: [ID 748848 local0.info] [2-1] LOG: aborting startup due to startup process failure

I checked the original server logs around the times that these WAL files were originally archived, but could find no problems being reported. Note that for the sake of absolute consistency, all of my tests were done against a pristine restored base backup.

If any of this doesn't make sense, please let me know and I will do my best to explain myself better. I have been banging my head against this for many hours, so it is quite possible that I am, without realizing it, a bit incoherent at this point.

Any suggestions? Thanks.

Cheers,

-craig

---

Craig A. McElroy

Contegix

Beyond Managed Hosting(r) for Your Enterprise
