Home > mailing lists

Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages - Mailing list pgsql-bugs

From	Andres Freund
Subject	Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
Date	January 13, 2014 21:03:12
Msg-id	20140113210245.GD14861@awork2.anarazel.de Whole thread Raw
In response to	Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages (Heikki Linnakangas <hlinnakangas@vmware.com>)
Responses	Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
List	pgsql-bugs

Tree view

On 2014-01-13 22:40:32 +0200, Heikki Linnakangas wrote:
> On 01/13/2014 10:26 PM, Andres Freund wrote:
> >On 2014-01-13 15:19:06 -0500, Tom Lane wrote:
> >>I concur that the right fix requires a
> >>new operating mode for XLogReadBufferExtended, perhaps RBM_NORMAL_ZERO_OK.
> >>I think the spec for this should be that if the page doesn't exist or
> >>contains zeroes, we return InvalidBuffer without logging the page number
> >>as invalid.  The doesn't-exist case is justified by the expectation that
> >>there will be a later RBM_NORMAL call for a larger page number, which will
> >>result in a suitable complaint if the page range isn't there.
>
> I think it's more natural to return the empty page to the caller, rather
> than InvalidBuffer.

Hm. I don't see the advantage - and it makes it impossible to recognize
that case at the caller level.

> >That's a sensible way to go, yes. But I wonder if we wouldn't end up
> >with less complicated code if we added a variant of ReadBuffer that only
> >returns a buffer from cache if already present in s_b.
> >I started to prototype something like RBM_NORMAL_ZERO_OK shortly after
> >Heikki's message and it seems to quickly turn a bit ugly because the
> >surrounding code isn't really ready to deal with not returning a
> >buffer. And for the purpose of that redo routine, that'd actually be
> >better.
>
> If it's trivial to add such a mode to buffer cache, then sure, let's do that
> by all means. But I seriously doubt it's really simple enough to be
> back-patchable.

Isn't it (untested, written in the editor) just something like:

    /* only works for pages in shared buffers */
    Assert(!SmgrIsTemp(smgr));

    /* Make sure we will have room to remember the buffer pin */
    ResourceOwnerEnlargeBuffers(CurrentResourceOwner);

    /* create a tag so we can lookup the buffer */
    INIT_BUFFERTAG(newTag, smgr->smgr_rnode.node, forkNum, blockNum);

    newHash = BufTableHashCode(&newTag);
    newPartitionLock = BufMappingPartitionLock(newHash);

    /* now, check if the block is in the buffer pool already */
    LWLockAcquire(newPartitionLock, LW_SHARED);

    buf_id = BufTableLookup(&newTag, newHash);
    if (buf_id >= 0)
    {
        /*
         * Found it.  Now, pin the buffer so no one can steal it from the
         * buffer pool, and check to see if the correct data has been loaded
         * into the buffer.
         */
        buf = &BufferDescriptors[buf_id];

        valid = PinBuffer(buf, NULL);

        /* Can release the mapping lock as soon as we've pinned it */
        LWLockRelease(newPartitionLock);

        if (!valid)
        {
            /*
             * We can only get here if (a) someone else is still reading in
             * the page, or (b) a previous read attempt failed.  We have to
             * wait for any active read attempt to finish, and then set up our
             * own read attempt if the page is still not BM_VALID.
             * StartBufferIO does it all.
             */
            if (StartBufferIO(buf, true))
            {
                /*
                 * If we get here, previous attempts to read the buffer must
                 * have failed ...
                 */
                return InvalidBuffer;
            }
        }

        return buf;
    }

    /*
     * Didn't find it in the buffer pool, done.
     */
    LWLockRelease(newPartitionLock);

    return InvalidBuffer;


Now, that's not trivial, but also not too complicated.

> With RBM_NORMAL_ZERO_OK, AFAICS we're talking about a tiny patch to
> XLogReadBufferExtended. bufmgr.c doesn't need to do anything about the new
> mode, as it's XLogReadBuffer that does the the check for zero pages. Per
> attached patch (for demonstration purposes only, you also need to add the
> new mode to the header file and adjust comments).

I thought about that approach at first as well, but I am not so sure
it's sufficient. Isn't it quite possible that we'd end up reading a page
that was *partially* written during a crash and due to that has a
corrupted checksum? There won't be any protection due to WAL replay/full
page writes against that case here.
Now, you could argue that that shouldn't be the case because we're only
entering that codepath once STANDBY_SNAPSHOT_READY and you might be
right...

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

pgsql-bugs by date:

From: Tom Lane
Date: 13 January 2014, 20:56:49
Subject: Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages

From: Tom Lane
Date: 13 January 2014, 21:26:17
Subject: Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages

Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages - Mailing list pgsql-bugs

Previous

Next