VM corruption on standby - Mailing list pgsql-hackers

From Andrey Borodin
Subject VM corruption on standby
Date
Msg-id B3C69B86-7F82-4111-B97F-0005497BB745@yandex-team.ru
Responses Re: VM corruption on standby
List pgsql-hackers
Hi hackers!

I was reviewing the patch that removes xl_heap_visible and found the VM/WAL machinery very interesting.
At Yandex we have had several incidents with a corrupted VM, and at pgconf.dev colleagues from AWS confirmed that they
had seen something similar too.
So I toyed around and accidentally wrote a test that reproduces $subj.

I think the corruption happens as follows:
0. we create a table with one frozen tuple
1. the next heap_insert() clears the VM bit and hangs immediately; nothing has been logged yet
2. the VM buffer is flushed to disk by the checkpointer or bgwriter
3. the primary is killed with -9
now we have a page that is ALL_VISIBLE/ALL_FROZEN on the standby, but has clear VM bits on the primary
4. a subsequent insert does not set XLH_LOCK_ALL_FROZEN_CLEARED in its WAL record
5. pg_visibility detects the corruption

Interestingly, in an off-list conversation Melanie explained to me how ALL_VISIBLE is protected from this: WAL-logging
depends on the PD_ALL_VISIBLE heap page bit, not on the state of the VM. But for ALL_FROZEN this is not the case:

    /* Clear only the all-frozen bit on visibility map if needed */
    if (PageIsAllVisible(page) &&
        visibilitymap_clear(relation, block, vmbuffer,
            VISIBILITYMAP_ALL_FROZEN))
        cleared_all_frozen = true; // this won't happen due to flushed VM buffer before a crash

Anyway, the test reproduces corruption of both bits. It also reproduces selecting deleted data on the standby.

The test is not intended to be committed when we fix the problem, so some waits are simulated with sleep(1) and the test
is placed in modules/test_slru, where it was easier to write. But if we ever want something like this, I can design a
less hacky version. And, probably, a more generic one.

Thanks!


Best regards, Andrey Borodin.



