Home > mailing lists

Re: [HACKERS] Index corruption with CREATE INDEX CONCURRENTLY - Mailing list pgsql-hackers

From	Amit Kapila
Subject	Re: [HACKERS] Index corruption with CREATE INDEX CONCURRENTLY
Date	February 4, 2017 09:40:14
Msg-id	CAA4eK1Lh8bQVizws36sD=r1whxtvuaRc2YhUbqV1aEwwrqiD8Q@mail.gmail.com Whole thread Raw
In response to	Re: [HACKERS] Index corruption with CREATE INDEX CONCURRENTLY (Alvaro Herrera <alvherre@2ndquadrant.com>)
Responses	Re: [HACKERS] Index corruption with CREATE INDEX CONCURRENTLY
List	pgsql-hackers

Tree view

On Sat, Feb 4, 2017 at 12:12 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Pavan Deolasee wrote:
>
>> Looking at the history and some past discussions, it seems Tomas reported
>> somewhat similar problem and Andres proposed a patch here
>> https://www.postgresql.org/message-id/20140514155204.GE23943@awork2.anarazel.de
>> which got committed via b23b0f5588d2e2. Not exactly the same issue, but
>> related to relcache flush happening in index_open().
>>
>> I wonder if we should just add a recheck logic
>> in RelationGetIndexAttrBitmap() such that if rd_indexvalid is seen as 0 at
>> the end, we go back and start again from RelationGetIndexList(). Or is
>> there any code path that relies on finding the old information?
>
> Good find on that old report.  It occured to me to re-run Tomas' test
> case with this second patch you proposed (the "ddl" test takes 11
> minutes in my laptop), and lo and behold -- your "recheck" code enters
> an infinite loop.  Not good.  I tried some more tricks that came to
> mind, but I didn't get any good results.  Pavan and I discussed it
> further and he came up with the idea of returning the bitmapset we
> compute, but if an invalidation occurs in index_open, simply do not
> cache the bitmapsets so that next time this is called they are all
> computed again.  Patch attached.
>
> This appears to have appeased both test_decoding's ddl test under
> CACHE_CLOBBER_ALWAYS, as well as the CREATE INDEX CONCURRENTLY bug.
>
> I intend to commit this soon to all branches, to ensure it gets into the
> next set of minors.
>

Thanks for detailed explanation, using steps mentioned in your mail
and Pavan's previous mail steps, I could reproduce the problem.

Regarding your fix:

+ * Now save copies of the bitmaps in the relcache entry, but only if no
+ * invalidation occured in the meantime.  We intentionally set rd_indexattr
+ * last, because that's the one that signals validity of the values; if we
+ * run out of memory before making that copy, we won't leave the relcache
+ * entry looking like the other ones are valid but empty. */
- oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
- relation->rd_keyattr = bms_copy(uindexattrs);
- relation->rd_pkattr = bms_copy(pkindexattrs);
- relation->rd_idattr = bms_copy(idindexattrs);
- relation->rd_indexattr = bms_copy(indexattrs);
- MemoryContextSwitchTo(oldcxt);
+ if (relation->rd_indexvalid != 0)
+ {
+ MemoryContext oldcxt;
+
+ oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
+ relation->rd_keyattr = bms_copy(uindexattrs);
+ relation->rd_pkattr = bms_copy(pkindexattrs);
+ relation->rd_idattr = bms_copy(idindexattrs);
+ relation->rd_indexattr = bms_copy(indexattrs);
+ MemoryContextSwitchTo(oldcxt);
+ }

If we do above, then I think primary key attrs won't be returned
because for those we are using relation copy rather than an original
working copy of attrs. See code below:

switch (attrKind)
{
..
case INDEX_ATTR_BITMAP_PRIMARY_KEY:
return bms_copy(relation->rd_pkattr);


Apart from above, I think after the proposed patch, it will be quite
possible that in heap_update(), hot_attrs, key_attrs and id_attrs are
computed using different index lists (consider relcache flush happens
after computation of hot_attrs or key_attrs) which was previously not
possible.  I think in the later code we assume that all the three
should be computed using the same indexlist.  For ex. see the below
code:

HeapSatisfiesHOTandKeyUpdate()
{
..
Assert(bms_is_subset(id_attrs, key_attrs));
Assert(bms_is_subset(key_attrs, hot_attrs));
..
}

Also below comments in heap_update indicate that we shouldn't worry
about relcache flush between the computation of hot_attrs, key_attrs
and id_attrs.

heap_update()
{
..

* Note that we get a copy here, so we need not worry about relcache flush
* happening midway through.
*/

hot_attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_ALL);
key_attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_KEY);
id_attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_IDENTITY_KEY);
..
}


Typo.
/occured/occurred


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

pgsql-hackers by date:

From: "Shinoda, Noriyoshi"
Date: 04 February 2017, 08:50:07
Subject: [HACKERS] pg_sequences bug ?

From: Konstantin Knizhnik
Date: 04 February 2017, 10:19:33
Subject: Re: [HACKERS] logical decoding of two-phase transactions

Re: [HACKERS] Index corruption with CREATE INDEX CONCURRENTLY - Mailing list pgsql-hackers

Previous

Next