We need to rethink relation cache entry rebuild - Mailing list pgsql-hackers
From: Tom Lane
Subject: We need to rethink relation cache entry rebuild
Msg-id: 24663.1262998895@sss.pgh.pa.us
Responses:
  Re: We need to rethink relation cache entry rebuild
  Build farm tweaks Re: We need to rethink relation cache entry rebuild
  Re: We need to rethink relation cache entry rebuild
  Re: We need to rethink relation cache entry rebuild
List: pgsql-hackers
I mentioned earlier that buildfarm member jaguar (that's the one that builds with CLOBBER_CACHE_ALWAYS) was showing suspicious intermittent failures. There's another one today:

http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jaguar&dt=2010-01-08%2004:00:02

and I also managed to reproduce a similar problem locally after running the parallel regression tests with CLOBBER_CACHE_ALWAYS for about a day.

I'm not entirely sure yet that I understand everything that is going wrong, but one thing is quite clear: honoring a SIGINT or SIGTERM during relcache rebuild leads to an Assert failure, or worse. The stack trace at the bottom of the above-mentioned page shows the problem pretty clearly. What appears to have happened is that a DROP DATABASE was issued, causing a cancel to be sent to an autovac worker in that database, and the worker happened to be in the middle of a relcache entry rebuild when it accepted the signal. (CLOBBER_CACHE_ALWAYS makes this hugely more probable, but it could happen in ordinary builds too.) The problem is that the relcache entry being rebuilt is referenced, so it has a positive reference count that is tracked in a ResourceOwner ... but the refcount field has been temporarily zeroed while RelationBuildDesc runs, so when transaction abort tries to release the refcount we hit the Assert in RelationDecrementReferenceCount. In a non-Assert build you wouldn't notice an immediate problem, but that doesn't mean things are okay in the field.

The problem can be stated more generally: relcache rebuild temporarily messes up a relcache entry that other places hold pointers to. If an error occurs during the rebuild, those other places might still try to use their pointers, expecting to reference a valid relcache entry. This could happen, for example, if the error occurred inside a subtransaction, and we trapped the exception and allowed the outer transaction to resume.
Code in the outer transaction would naturally still expect its pointer to a valid, reference-count-incremented relcache entry to be good.

RelationClearRelation does take the step of de-linking the entry it's going to rebuild from the hash table. When that code was designed, before resource owners or subtransactions, I think it was actually safe --- an error could result in a leaked entry in CacheMemoryContext, but there could not be any live pointers to the partially-rebuilt entry after we did transaction abort cleanup. Now, however, it's not safe at all, and only the rather low probability of failure has prevented us from spotting the problem before.

Basically I think we have to fix this by ensuring that an error escape can't occur while a relcache entry is in a partially rebuilt state. What I have in mind is to rewrite RelationClearRelation so that it does a rebuild this way:

1. Build a new relcache entry from scratch. This entry won't be linked from anywhere. A failure here results in nothing worse than some leaked memory.

2. Swap the contents of the old and new relcache entries. Do this in straight-line code that has no possibility of CHECK_FOR_INTERRUPTS. But don't swap the refcount fields, nor certain infrastructure pointers that we try to preserve intact if possible --- for example, don't swap the tupledesc pointers if the old and new tupledescs are equal.

3. Free the new relcache entry along with the (possibly swapped) infrastructure for it.

One slightly tricky spot is that the index infrastructure has to be swapped all or nothing, because it all lives in a single subsidiary memory context. I think this is all right, but if it isn't we'll need to expend more code on cleaning that infrastructure up properly instead of just destroying a context.

Comments?

			regards, tom lane