Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages - Mailing list pgsql-bugs

From Heikki Linnakangas
Subject Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
Date
Msg-id 52CABF3E.2050004@vmware.com
In response to Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages  (Andres Freund <andres@2ndquadrant.com>)
List pgsql-bugs
On 01/06/2014 03:48 PM, Andres Freund wrote:
> Hi,
>
> On 2013-12-19 14:37:04 -0800, Sergey Konoplev wrote:
>> 2013-12-19 20:51:22 MSK 19938 @ from  [vxid:1/0 txid:0] [] WARNING:
>> page 14833 of relation base/16436/3321003988 is uninitialized
>> 2013-12-19 20:51:22 MSK 19938 @ from  [vxid:1/0 txid:0] [] CONTEXT:
>> xlog redo vacuum: rel 1663/16436/3321003988; blk 38538,
>> lastBlockVacuumed 0
>> 2013-12-19 20:51:22 MSK 19938 @ from  [vxid:1/0 txid:0] [] PANIC:  WAL
>> contains references to invalid pages
>> 2013-12-19 20:51:22 MSK 19938 @ from  [vxid:1/0 txid:0] [] CONTEXT:
>> xlog redo vacuum: rel 1663/16436/3321003988; blk 38538,
>> lastBlockVacuumed 0
>> 2013-12-19 20:51:22 MSK 19935 @ from  [vxid: txid:0] [] LOG:  startup
>> process (PID 19938) was terminated by signal 6: Aborted
>> 2013-12-19 20:51:22 MSK 19935 @ from  [vxid: txid:0] [] LOG:
>> terminating any other active server processes
>
> There just was another case of this reported on IRC by MatheusOl and for
> some reason in his case I noticed the pertinent details and it quickly
> clicked:
> * page 14833 is the one with the error
> * we're actually vacuuming page 38538
> * lastBlockVacuumed is 0
>
> In btree_xlog_vacuum() we scan all the pages between lastBlockVacuumed
> and the page vacuumed and acquire a cleanup lock on it. But there isn't
> any guarantee that the intermediate pages are valid, filled pages,
> afaics.

Hmm. So the problem arises if there's an uninitialized page in the
middle of the b-tree relation for some reason. It's unusual for an
uninitialized page to be left in the middle of the relation, but it's
certainly possible, e.g. if you crash just after extending the relation.
In a heap, vacuum will initialize such pages and emit a WARNING like
"page %u is uninitialized --- fixing", but we don't do that for b-tree.

> ISTM we can just use RBM_ZERO_ON_ERROR instead of RBM_NORMAL.

That'd be horrendously dangerous. It would silently zap any page with
any error on it. But we could add a new ReadBufferMode that returns
InvalidBuffer on error, without zeroing the page.
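
To make the difference between the three behaviours concrete, here is a
toy model in C. It is only a sketch under stated assumptions, not
PostgreSQL code: the ToyPage struct, the toy_* functions and the mode
name TOY_RBM_NULL_ON_ERROR are all hypothetical, and a single "valid"
flag stands in for whatever checks the real buffer manager performs.
The loop bounds mirror the scan described above (every block between
lastBlockVacuumed and the block being vacuumed).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 16            /* toy page size, not BLCKSZ */

/* Hypothetical stand-ins for ReadBufferMode values. */
typedef enum {
    TOY_RBM_NORMAL,                 /* invalid page is a hard error */
    TOY_RBM_ZERO_ON_ERROR,          /* invalid page is overwritten with zeros */
    TOY_RBM_NULL_ON_ERROR           /* proposed: report failure, leave page alone */
} ToyReadMode;

typedef enum {
    TOY_READ_OK,
    TOY_READ_ZEROED,
    TOY_READ_SKIPPED,               /* models returning InvalidBuffer */
    TOY_READ_ERROR                  /* models the PANIC during redo */
} ToyReadResult;

typedef struct {
    bool valid;                     /* assumption: one flag models all page checks */
    unsigned char data[TOY_PAGE_SIZE];
} ToyPage;

/* Read one page according to the requested mode. */
ToyReadResult toy_read_page(ToyPage *page, ToyReadMode mode)
{
    if (page->valid)
        return TOY_READ_OK;

    switch (mode) {
    case TOY_RBM_ZERO_ON_ERROR:
        /* Destroys whatever was on the page, whatever the error was. */
        memset(page->data, 0, TOY_PAGE_SIZE);
        return TOY_READ_ZEROED;
    case TOY_RBM_NULL_ON_ERROR:
        /* Page contents are left untouched; the caller decides what to do. */
        return TOY_READ_SKIPPED;
    case TOY_RBM_NORMAL:
    default:
        return TOY_READ_ERROR;
    }
}

/*
 * Model of the redo scan: visit every block strictly between
 * last_vacuumed and blkno. Returns the number of pages that could be
 * locked, or -1 if replay would have failed.
 */
int toy_replay_vacuum(ToyPage *rel, unsigned last_vacuumed,
                      unsigned blkno, ToyReadMode mode)
{
    int locked = 0;

    assert(rel != NULL);
    for (unsigned blk = last_vacuumed + 1; blk < blkno; blk++) {
        ToyReadResult r = toy_read_page(&rel[blk], mode);

        if (r == TOY_READ_ERROR)
            return -1;              /* one bad intermediate page aborts replay */
        if (r == TOY_READ_SKIPPED)
            continue;               /* nothing to lock, move on */
        locked++;
    }
    return locked;
}
```

With one invalid page in the scanned range, the NORMAL mode aborts the
scan, the ZERO_ON_ERROR mode completes it but wipes the page, and the
hypothetical NULL_ON_ERROR mode completes it while leaving the page's
bytes as they were.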

- Heikki
