Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages - Mailing list pgsql-bugs
From | Andres Freund |
---|---|
Subject | Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages |
Date | |
Msg-id | 20140113210245.GD14861@awork2.anarazel.de Whole thread Raw |
In response to | Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages (Heikki Linnakangas <hlinnakangas@vmware.com>) |
Responses |
Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages |
List | pgsql-bugs |
On 2014-01-13 22:40:32 +0200, Heikki Linnakangas wrote: > On 01/13/2014 10:26 PM, Andres Freund wrote: > >On 2014-01-13 15:19:06 -0500, Tom Lane wrote: > >>I concur that the right fix requires a > >>new operating mode for XLogReadBufferExtended, perhaps RBM_NORMAL_ZERO_OK. > >>I think the spec for this should be that if the page doesn't exist or > >>contains zeroes, we return InvalidBuffer without logging the page number > >>as invalid. The doesn't-exist case is justified by the expectation that > >>there will be a later RBM_NORMAL call for a larger page number, which will > >>result in a suitable complaint if the page range isn't there. > > I think it's more natural to return the empty page to the caller, rather > than InvalidBuffer. Hm. I don't see the advantage - and it makes it impossible to recognize that case at the caller level. > >That's a sensible way to go, yes. But I wonder if we wouldn't end up > >with less complicated code if we added a variant of ReadBuffer that only > >returns a buffer from cache if already present in s_b. > >I started to prototype something like RBM_NORMAL_ZERO_OK shortly after > >Heikki's message and it seems to quickly turn a bit ugly because the > >surrounding code isn't really ready to deal with not returning a > >buffer. And for the purpose of that redo routine, that'd actually be > >better. > > If it's trivial to add such a mode to buffer cache, then sure, let's do that > by all means. But I seriously doubt it's really simple enough to be > back-patchable. Isn't it (untested, written in the editor) just something like: /* only works for pages in shared buffers */ Assert(!SmgrIsTemp(smgr)); /* Make sure we will have room to remember the buffer pin */ ResourceOwnerEnlargeBuffers(CurrentResourceOwner); /* create a tag so we can lookup the buffer */ INIT_BUFFERTAG(newTag, smgr->smgr_rnode.node, forkNum, blockNum); newHash = BufTableHashCode(&newTag); newPartitionLock = BufMappingPartitionLock(newHash); /* now, check if the block is in the buffer pool already */ LWLockAcquire(newPartitionLock, LW_SHARED); buf_id = BufTableLookup(&newTag, newHash); if (buf_id >= 0) { /* * Found it. Now, pin the buffer so no one can steal it from the * buffer pool, and check to see if the correct data has been loaded * into the buffer. */ buf = &BufferDescriptors[buf_id]; valid = PinBuffer(buf, NULL); /* Can release the mapping lock as soon as we've pinned it */ LWLockRelease(newPartitionLock); if (!valid) { /* * We can only get here if (a) someone else is still reading in * the page, or (b) a previous read attempt failed. We have to * wait for any active read attempt to finish, and then set up our * own read attempt if the page is still not BM_VALID. * StartBufferIO does it all. */ if (StartBufferIO(buf, true)) { /* * If we get here, previous attempts to read the buffer must * have failed ... */ return InvalidBuffer; } } return buf; } /* * Didn't find it in the buffer pool, done. */ LWLockRelease(newPartitionLock); return InvalidBuffer; Now, that's not trivial, but also not too complicated. > With RBM_NORMAL_ZERO_OK, AFAICS we're talking about a tiny patch to > XLogReadBufferExtended. bufmgr.c doesn't need to do anything about the new > mode, as it's XLogReadBuffer that does the the check for zero pages. Per > attached patch (for demonstration purposes only, you also need to add the > new mode to the header file and adjust comments). I thought about that approach at first as well, but I am not so sure it's sufficient. Isn't it quite possible that we'd end up reading a page that was *partially* written during a crash and due to that has a corrupted checksum? There won't be any protection due to WAL replay/full page writes against that case here. Now, you could argue that that shouldn't be the case because we're only entering that codepath once STANDBY_SNAPSHOT_READY and you might be right... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
pgsql-bugs by date: