Recursive use of syscaches (was: relation ### modified while in use) - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Recursive use of syscaches (was: relation ### modified while in use) |
Date | |
Msg-id | 15452.973795877@sss.pgh.pa.us |
In response to | RE: relation ### modified while in use ("Hiroshi Inoue" <Inoue@tpf.co.jp>) |
Responses | Re: Recursive use of syscaches (was: relation ### modified while in use); Re: Recursive use of syscaches (was: relation ### modified while in use) |
List | pgsql-hackers |
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
>> Does this occur after a prior error message? I have been suspicious
>> because there isn't a mechanism to clear the syscache-busy flags during
>> xact abort.

> I don't know if I've seen the cases you pointed out.
> I have the following gdb back trace. Obviously it calls
> SearchSysCache() for cacheId 10 twice. I was able
> to get another gdb back trace but discarded it by
> mistake. Though I've added pause() just after detecting
> recursive use of cache, backends continue the execution
> in most cases unfortunately.
> I've not examined the backtrace yet. But don't we have
> to nail system relation descriptors more than now?

I don't think that's the solution; nailing more descriptors than we absolutely must is not a pretty approach, and I don't think it solves this problem anyway.

Your example demonstrates that recursive use of a syscache is perfectly possible when a cache inval message arrives just as we are about to search for a syscache entry. Consider the following path:

1. We are doing index_open and the ensuing relcache entry load for some user index. In the middle of this, we need to fetch a not-currently-cached pg_amop entry that is referenced by the index.
2. As we open pg_amop, we receive an SI message for some other user index that is referenced in the current query and so currently has positive refcnt. We therefore attempt to rebuild that index's relcache entry.
3. At this point we have a recursive invocation of relcache load, which may well lead to a recursive attempt to fetch the very same pg_amop entry that the outer relcache load is trying to fetch.

Therefore, the current error test, which checks for re-entrant lookups in the same syscache, is bogus. It would still be bogus even if we refined it to notice whether the exact same entry is being sought.

On top of that, we still have the issue I was concerned about: there is no mechanism for clearing the cache-busy flags during xact abort.
Rather than trying to fix this stuff, I propose that we simply remove the test for recursive use of a syscache. AFAICS it will never catch any real bugs in production. It might catch bugs in development (i.e., someone messes up the startup sequence in a way that causes a truly circular cache lookup), but I think a stack overflow crash is a perfectly OK result then.

regards, tom lane