Re: [HACKERS] Index corruption with CREATE INDEX CONCURRENTLY - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: [HACKERS] Index corruption with CREATE INDEX CONCURRENTLY |
Date | |
Msg-id | CAA4eK1Lh8bQVizws36sD=r1whxtvuaRc2YhUbqV1aEwwrqiD8Q@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] Index corruption with CREATE INDEX CONCURRENTLY (Alvaro Herrera <alvherre@2ndquadrant.com>) |
Responses |
Re: [HACKERS] Index corruption with CREATE INDEX CONCURRENTLY
|
List | pgsql-hackers |
On Sat, Feb 4, 2017 at 12:12 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Pavan Deolasee wrote: > >> Looking at the history and some past discussions, it seems Tomas reported >> somewhat similar problem and Andres proposed a patch here >> https://www.postgresql.org/message-id/20140514155204.GE23943@awork2.anarazel.de >> which got committed via b23b0f5588d2e2. Not exactly the same issue, but >> related to relcache flush happening in index_open(). >> >> I wonder if we should just add a recheck logic >> in RelationGetIndexAttrBitmap() such that if rd_indexvalid is seen as 0 at >> the end, we go back and start again from RelationGetIndexList(). Or is >> there any code path that relies on finding the old information? > > Good find on that old report. It occured to me to re-run Tomas' test > case with this second patch you proposed (the "ddl" test takes 11 > minutes in my laptop), and lo and behold -- your "recheck" code enters > an infinite loop. Not good. I tried some more tricks that came to > mind, but I didn't get any good results. Pavan and I discussed it > further and he came up with the idea of returning the bitmapset we > compute, but if an invalidation occurs in index_open, simply do not > cache the bitmapsets so that next time this is called they are all > computed again. Patch attached. > > This appears to have appeased both test_decoding's ddl test under > CACHE_CLOBBER_ALWAYS, as well as the CREATE INDEX CONCURRENTLY bug. > > I intend to commit this soon to all branches, to ensure it gets into the > next set of minors. > Thanks for detailed explanation, using steps mentioned in your mail and Pavan's previous mail steps, I could reproduce the problem. Regarding your fix: + * Now save copies of the bitmaps in the relcache entry, but only if no + * invalidation occured in the meantime. We intentionally set rd_indexattr + * last, because that's the one that signals validity of the values; if we + * run out of memory before making that copy, we won't leave the relcache + * entry looking like the other ones are valid but empty. */ - oldcxt = MemoryContextSwitchTo(CacheMemoryContext); - relation->rd_keyattr = bms_copy(uindexattrs); - relation->rd_pkattr = bms_copy(pkindexattrs); - relation->rd_idattr = bms_copy(idindexattrs); - relation->rd_indexattr = bms_copy(indexattrs); - MemoryContextSwitchTo(oldcxt); + if (relation->rd_indexvalid != 0) + { + MemoryContext oldcxt; + + oldcxt = MemoryContextSwitchTo(CacheMemoryContext); + relation->rd_keyattr = bms_copy(uindexattrs); + relation->rd_pkattr = bms_copy(pkindexattrs); + relation->rd_idattr = bms_copy(idindexattrs); + relation->rd_indexattr = bms_copy(indexattrs); + MemoryContextSwitchTo(oldcxt); + } If we do above, then I think primary key attrs won't be returned because for those we are using relation copy rather than an original working copy of attrs. See code below: switch (attrKind) { .. case INDEX_ATTR_BITMAP_PRIMARY_KEY: return bms_copy(relation->rd_pkattr); Apart from above, I think after the proposed patch, it will be quite possible that in heap_update(), hot_attrs, key_attrs and id_attrs are computed using different index lists (consider relcache flush happens after computation of hot_attrs or key_attrs) which was previously not possible. I think in the later code we assume that all the three should be computed using the same indexlist. For ex. see the below code: HeapSatisfiesHOTandKeyUpdate() { .. Assert(bms_is_subset(id_attrs, key_attrs)); Assert(bms_is_subset(key_attrs, hot_attrs)); .. } Also below comments in heap_update indicate that we shouldn't worry about relcache flush between the computation of hot_attrs, key_attrs and id_attrs. heap_update() { .. * Note that we get a copy here, so we need not worry about relcache flush * happening midway through. */ hot_attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_ALL); key_attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_KEY); id_attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_IDENTITY_KEY); .. } Typo. /occured/occurred -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
pgsql-hackers by date: