Hot Standby has PANIC: WAL contains references to invalid pages - Mailing list pgsql-general
From | Michael Harris |
---|---|
Subject | Hot Standby has PANIC: WAL contains references to invalid pages |
Date | |
Msg-id | 30BC62DC16C7B842A8446ED8EB2F0439067D1C@ESGSCMB105.ericsson.se |
Responses | Re: Hot Standby has PANIC: WAL contains references to invalid pages |
List | pgsql-general |
Hi All,

We are having a thorny problem I'm hoping someone will be able to help with.

We have a pair of machines set up as an active / hot SB pair. The database they contain is quite large - approx. 9TB. They were working fine on 9.1, and we recently upgraded the active DB to 9.2.1.

After upgrading the active DB, we re-mirrored the standby (using pg_basebackup) and started it up. It began replaying the WAL files as expected.

After a few hours this happened:

    WARNING: page 1 of relation pg_tblspc/16408/PG_9.2_201204301/16409/1123460086 is uninitialized
    CONTEXT: xlog redo vacuum: rel 16408/16409/1123460086; blk 4411, lastBlockVacuumed 0
    PANIC: WAL contains references to invalid pages
    CONTEXT: xlog redo vacuum: rel 16408/16409/1123460086; blk 4411, lastBlockVacuumed 0
    LOG: startup process (PID 24195) was terminated by signal 6: Aborted
    LOG: terminating any other active server processes

We tried starting it up again, the same thing happened.

After some googling and re-reading the release notes, we noticed the mention in the 9.2.1 release notes about the potential for corrupted visibility maps, so as per the recommendation we did a full VACUUM of the whole database (with vacuum_freeze_table_age set to zero), then re-mirrored the standby again.

After re-mirroring was completed we started the standby again. Strangely it reached consistency after only 33 WAL files - since the base backup took 5 days to complete this does not seem right to me.
Anyway, WAL recovery continued, with occasional warnings like this:

    [2013-02-04 10:30:51 EST] 13546@ WARNING: xlog min recovery request 1A13A/9BC425A0 is past current point 19F1E/725043E8
    [2013-02-04 10:30:51 EST] 13546@ CONTEXT: writing block 0 of relation pg_tblspc/16408/PG_9.2_201204301/16409/12525_vm

After a few hours, this happened:

    [2013-02-04 13:43:24 EST] 13538@ WARNING: page 1248 of relation pg_tblspc/16408/PG_9.2_201204301/16409/1128746393 does not exist
    [2013-02-04 13:43:24 EST] 13538@ CONTEXT: xlog redo visible: rel 16408/16409/1128746393; blk 1248
    [2013-02-04 13:43:24 EST] 13538@ PANIC: WAL contains references to invalid pages
    [2013-02-04 13:43:24 EST] 13538@ CONTEXT: xlog redo visible: rel 16408/16409/1128746393; blk 1248
    [2013-02-04 13:43:25 EST] 13532@ LOG: startup process (PID 13538) was terminated by signal 6: Aborted
    [2013-02-04 13:43:25 EST] 13532@ LOG: terminating any other active server processes

Looks similar to the first case, but a different context. We thought that perhaps an index had become corrupted (apparently also a possibility with the bug mentioned above) however the file mentioned belongs to a normal table, not an index. And 'redo visible' sounds like it might be to do with the visibility map?

We restarted it again with debugging cranked up. It didn't reveal anything more interesting. We then upgraded the standby to 9.2.2 and started it again. Again no dice. In each case it fails at exactly the same point with the same error.

Any ideas for a next troubleshooting step?

Regards // Mike
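
The "xlog min recovery request ... is past current point" warning quoted above compares two WAL positions, and it can help to see how far apart they are. The following is a minimal sketch, not something from the original message: the helper names `parse_lsn` and `lsn_gap_bytes` are hypothetical, and the code treats each position as a flat 64-bit number. As far as I know, 9.2 still stores the position as an (xlogid, xrecoff) pair and leaves the last segment of each xlogid unused, so the flat subtraction should be read as a slight overestimate of the real WAL volume.

```python
def parse_lsn(text):
    """Split a WAL position like '1A13A/9BC425A0' into two integers.

    The part before the slash is the high 32 bits, the part after it
    the low 32 bits, both written in hexadecimal.
    """
    hi, lo = text.split("/")
    return int(hi, 16), int(lo, 16)


def lsn_gap_bytes(ahead, behind):
    """Approximate byte distance between two WAL positions.

    Assumption: positions are treated as flat 64-bit values. On 9.2
    this slightly overstates the distance (see the note above).
    """
    a_hi, a_lo = parse_lsn(ahead)
    b_hi, b_lo = parse_lsn(behind)
    return ((a_hi << 32) + a_lo) - ((b_hi << 32) + b_lo)


# The two positions from the warning in the log excerpt.
requested = "1A13A/9BC425A0"   # "min recovery request"
current = "19F1E/725043E8"     # "current point"

gap = lsn_gap_bytes(requested, current)
print(f"requested position is ahead by about {gap / 2**40:.2f} TiB")
```

For the two positions in the warning, the high halves differ by 0x21C (540), and each step of the high half covers close to 4 GiB of WAL, so the requested position is on the order of 2 TiB ahead of the replay point. That is only a way to read the warning, not a diagnosis of the PANIC.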