Re: failure with pg_dump - Mailing list pgsql-novice
From | Mija Lee |
---|---|
Subject | Re: failure with pg_dump |
Date | |
Msg-id | 475EBBFD.1020005@scharp.org |
In response to | Re: failure with pg_dump (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: failure with pg_dump |
List | pgsql-novice |
We've had a number of odd things going on that I can't really explain, and that don't seem to result in log entries. Here's some info:

- this is running 8.2.4 on a Solaris 10 machine
- I reran the dump after posting and these problems did not reoccur
- We have a number of replicated schemas and tables on this server. There were other problems with the replication that happened earlier in the evening.
- we have been having some very odd problems where our replication scripts hang intermittently. For the life of me I can't figure out why, but when this happens, I look for processes that are idle in transaction that are more than one day old and kill them. That seems to allow the replication to finish. I have a few users that use a variety of products to view and manipulate the data in these tables (tableau, access, excel, ems, phppgadmin, dbvisualizer) and it seems like some connections/transactions never terminate, but I can't figure out which ones or why.

I've been struggling with this problem for some time, but have never had an issue with the stalled replication affecting the dump. I was actually hoping that this error would help shed light on the replication problem.

Mija

Tom Lane wrote:
> Mija Lee <mija@scharp.org> writes:
>> I have a script that I use to do regular dumps of my database. Over the
>> weekend it failed, and produced the following error message. I'm not
>> sure why this would have happened, how I would find out which index is
>> referenced by 136451098, or where this select came from.
>
> It sounds like system catalog corruption, which is not good :-(.
>
>> pg_dump.sqlhost: Error message from server: ERROR: cache lookup failed
>> for index 136451098
>> pg_dump.sqlhost: The command was: SELECT t.tableoid, t.oid, t.relname as
>> indexname, pg_catalog.pg_get_indexdef(i.indexrelid) as indexdef,
>> t.relnatts as indnkeys, i.indkey, i.indisclustered, c.contype,
>> c.conname, c.tableoid as contableoid, c.oid as conoid, (SELECT spcname
>> FROM pg_catalog.pg_tablespace s WHERE s.oid = t.reltablespace) as
>> tablespace, array_to_string(t.reloptions, ', ') as options FROM
>> pg_catalog.pg_index i JOIN pg_catalog.pg_class t ON (t.oid =
>> i.indexrelid) LEFT JOIN pg_catalog.pg_depend d ON (d.classid =
>> t.tableoid AND d.objid = t.oid AND d.deptype = 'i') LEFT JOIN
>> pg_catalog.pg_constraint c ON (d.refclassid = c.tableoid AND d.refobjid
>> = c.oid) WHERE i.indrelid = '136451090'::pg_catalog.oid ORDER BY indexname
>
> That looks like pg_dump's query to get information about the indexes of
> a particular table. So apparently the problem index is one of the ones
> for the table with OID 136451090. The easiest way to find out which one
> that is is
>     select '136451090'::regclass;
> Trying \d on each of that table's indexes in succession would tell you
> which one is trashed.
>
> As for fixing it, the $64 question is how extensive is the catalog
> corruption. I see no very good reason to hope that only this one index
> is affected :-(. What you probably want to do is try to get a clean
> pg_dump then initdb and reload --- at least that's how I'd approach it,
> rather than hoping that there's no lurking problems remaining after you
> hack your way around the one you can see.
>
> What I'd try first is a REINDEX on pg_class. If that doesn't help,
> try to delete the pg_index row linking 136451098 and 136451090.
>
> What PG version is this, anyway, and did anything weird happen on your
> system that might explain data corruption?
>
> regards, tom lane
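
For readers following along, the checks discussed in this thread can be sketched roughly as the SQL below. This is only an illustration under stated assumptions, not a tested recovery procedure: the OIDs are the ones from the error message, the `LEFT JOIN` listing is an added convenience that does not appear in either message, the statements need a superuser session, and the `pg_stat_activity` column names are the ones used by the 8.2 series (later releases renamed them). The `'<IDLE> in transaction'` marker only shows up if `stats_command_string` is enabled.

```sql
-- 1. Name the table that owns the problem index (from Tom Lane's reply).
SELECT '136451090'::regclass;

-- 2. Assumption, not from the thread: list the pg_index rows for that table
--    next to their pg_class entries. A row with a NULL relname would be a
--    pg_index entry with no matching pg_class row.
SELECT i.indexrelid, c.relname, i.indisunique, i.indisprimary
FROM pg_catalog.pg_index i
LEFT JOIN pg_catalog.pg_class c ON c.oid = i.indexrelid
WHERE i.indrelid = 136451090
ORDER BY i.indexrelid;

-- 3. First repair attempt suggested in the reply.
REINDEX TABLE pg_catalog.pg_class;

-- 4. Last resort suggested in the reply. Left commented out on purpose:
--    editing system catalogs by hand can make things worse, so take a
--    backup first and prefer a clean dump, initdb, and reload.
-- DELETE FROM pg_catalog.pg_index
-- WHERE indexrelid = 136451098 AND indrelid = 136451090;

-- 5. Sessions idle in transaction whose last statement started more than a
--    day ago (8.2-era column names: procpid, current_query, query_start).
SELECT procpid, usename, client_addr, backend_start, query_start
FROM pg_stat_activity
WHERE current_query = '<IDLE> in transaction'
  AND query_start < now() - interval '1 day'
ORDER BY query_start;
```

The `client_addr` and `usename` columns in the last query may help narrow down which client tool is leaving transactions open, which is the part of the replication problem that was still unexplained.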