Re: "SMgrRelation hashtable corrupted" failure identified - Mailing list pgsql-hackers
From | Marc G. Fournier |
---|---|
Subject | Re: "SMgrRelation hashtable corrupted" failure identified |
Date | |
Msg-id | 20050110123636.W51884@ganymede.hub.org |
In response to | "SMgrRelation hashtable corrupted" failure identified (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: "SMgrRelation hashtable corrupted" failure identified
|
List | pgsql-hackers |
On Mon, 10 Jan 2005, Tom Lane wrote:

> We've seen a few reports of the above-mentioned error message from
> PG 8.0 testers, but up till now no one had come up with a reproducible
> test case.  I've now found a trivial example:
>
> session 1:  create table a1 (f1 varchar(128));
> session 2:  insert into a1 values('abc');
> session 1:  alter table a1 alter column f1 type varchar(256);
> session 2:  insert into a1 values('abcd');
> session 2 fails with ERROR:  SMgrRelation hashtable corrupted
> continued use of session 2 leads to a crash
>
> Many if not all scenarios involving a rewriting ALTER TABLE on a
> table in active use by other backends will fail like this.
> I believe there are probably similar failures involving CLUSTER,
> though a quick try didn't show it.  This seems clearly to be a
> "must fix for 8.0" bug.
>
> The basic problem is that when ALTER TABLE tries to swap the physical
> files associated with the original table and the temp version of the
> table, it sends out relcache inval events for all four combinations
> of table OID and relfilenode.  Because inval.c is a bit cavalier about
> the ordering of inval events, the one that session 2 sees first is the
> one for <temp table OID, old relfilenode>.  It does not find a relcache
> entry for the temp table OID, but it does find an smgr table entry for
> the relfilenode, which it proceeds to drop.  Now there is a dangling
> smgr reference in its relcache, so when it next gets hit with a
> relcache clear event for the original table OID, boom!
>
> I fooled around with trying to patch this by enforcing the "right"
> processing order of inval events, but that doesn't work (it just moves
> the failure into the sending backend, which it turns out would need
> a different processing order to avoid crashing).  It would be a horribly
> fragile solution anyway.
>
> I now think that the only reasonable fix is to directly attack the
> problem of dangling relcache references to smgr table entries.  What we
> can do is add a concept of an "owning pointer" to an smgr entry, that
> is an "SMgrRelation *myowner" field, and have smgrclose do
> something like
>     if (reln->myowner)
>         *(reln->myowner) = NULL;
> For smgr table entries associated with a relcache entry, the relcache
> code would set this field as a back link to its rel->rd_smgr pointer.
> With this setup, an smgr-level clear would correctly unhook from the
> relcache even if the clear did not come directly through the relcache.
> This would simplify RelationCacheInvalidateEntry and
> LocalExecuteInvalidationMessage, which could then treat relcache clear
> and smgr clear as independent operations.
>
> Comments?

Only: Josh, put a hold on those press releases, looks like an RC5 is
forthcoming ...

----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org           Yahoo!: yscrappy              ICQ: 7615664