Thread: error: could not find pg_class tuple for index 2662
My client has been seeing regular instances of the following sort of problem:

...
03:06:09.453 exec_simple_query, postgres.c:900
03:06:12.042 XX000: could not find pg_class tuple for index 2662 at character 13
03:06:12.042 RelationReloadIndexInfo, relcache.c:1740
03:06:12.042 INSERT INTO zzz_k(k) SELECT ...
03:06:12.045 00000: statement: ABORT
03:06:12.045 exec_simple_query, postgres.c:900
03:06:12.045 00000: duration: 0.100 ms
03:06:12.045 exec_simple_query, postgres.c:1128
03:06:12.046 00000: statement: INSERT INTO temp_807 VALUES (...)
03:06:12.046 exec_simple_query, postgres.c:900
03:06:12.046 XX000: could not find pg_class tuple for index 2662 at character 13
03:06:12.046 RelationReloadIndexInfo, relcache.c:1740
03:06:12.046 INSERT INTO temp_807 VALUES (...)
03:06:12.096 08P01: unexpected EOF on client connection
03:06:12.096 SocketBackend, postgres.c:348
03:06:12.096 XX000: could not find pg_class tuple for index 2662
03:06:12.096 RelationReloadIndexInfo, relcache.c:1740
03:06:12.121 00000: disconnection: session time: 0:06:08.537 user=ZZZ database=ZZZ_01
03:06:12.121 log_disconnections, postgres.c:4339

The above happens regularly (but not completely predictably), coinciding with a daily cronjob that checks the catalogs for bloat and does vacuum full and/or reindex as needed. Since some of the applications make very heavy use of temp tables, this usually means pg_class and pg_index get a vacuum full and reindex.

Sometimes queries will fail due to being unable to open a table's backing file. On investigation the file is absent from both the catalogs and the filesystem, so I don't know what table it refers to:

20:41:19.063 ERROR: could not open file "pg_tblspc/16401/PG_9.0_201008051/16413/1049145092": No such file or directory
20:41:19.063 STATEMENT: insert into r_ar__30 select aid, mid, pid, sum(wdata) as wdata, ...
20:41:19.430 ERROR: could not open file "pg_tblspc/16401/PG_9.0_201008051/16413/1049145092": No such file or directory
20:41:19.430 STATEMENT: SELECT nextval('j_id_seq')

Finally, I have seen several instances of failure to read data by vacuum full itself:

03:05:45.699 00000: statement: vacuum full pg_catalog.pg_index;
03:05:45.699 exec_simple_query, postgres.c:900
03:05:46.142 XX001: could not read block 65 in file "pg_tblspc/16401/PG_9.0_201008051/16416/1049146489": read only 0 of 8192 bytes
03:05:46.142 mdread, md.c:656
03:05:46.142 vacuum full pg_catalog.pg_index;

This occurs on postgresql 9.0.4 on 32 core 512GB Dell boxes. We have identical systems still running 8.4.8 that do not have this issue, so I'm assuming it is related to the vacuum full work done for 9.0. Oddly, we don't see this on the smaller hosts (8 core, 64GB, slower cpus) running 9.0.4, so it may be timing related.

This seems possibly related to the issues in:

Bizarre buildfarm failure on baiji: can't find pg_class_oid_index
http://archives.postgresql.org/pgsql-hackers/2010-02/msg02038.php

Broken HOT chains in system catalogs
http://archives.postgresql.org/pgsql-hackers/2011-04/msg00777.php

As far as I can tell from the logs I have, once a session sees one of these errors any subsequent query will hit it again until the session exits. However, it does not seem to harm other sessions or leave any persistent damage (crossing fingers and hoping here).

I'm ready to do any testing/investigation/instrumented builds etc that may be helpful in resolving this.

Regards

-dg

--
David Gould       daveg@sonic.net      510 536 1443    510 282 0869
If simplicity worked, the world would be overrun with insects.
On Wed, Jul 27, 2011 at 8:28 PM, daveg <daveg@sonic.net> wrote: > My client has been seeing regular instances of the following sort of problem: On what version of PostgreSQL? > If simplicity worked, the world would be overrun with insects. I thought it was... :-) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Jul 28, 2011 at 09:46:41AM -0400, Robert Haas wrote: > On Wed, Jul 27, 2011 at 8:28 PM, daveg <daveg@sonic.net> wrote: > > My client has been seeing regular instances of the following sort of problem: > On what version of PostgreSQL? 9.0.4. I previously said: > > This occurs on postgresql 9.0.4. on 32 core 512GB Dell boxes. We have > > identical systems still running 8.4.8 that do not have this issue, so I'm > > assuming it is related to the vacuum full work done for 9.0. Oddly, we don't > > see this on the smaller hosts (8 core, 64GB, slower cpus) running 9.0.4, > > so it may be timing related. -dg -- David Gould daveg@sonic.net 510 536 1443 510 282 0869 If simplicity worked, the world would be overrun with insects.
On Thu, Jul 28, 2011 at 5:46 PM, daveg <daveg@sonic.net> wrote: > On Thu, Jul 28, 2011 at 09:46:41AM -0400, Robert Haas wrote: >> On Wed, Jul 27, 2011 at 8:28 PM, daveg <daveg@sonic.net> wrote: >> > My client has been seeing regular instances of the following sort of problem: >> On what version of PostgreSQL? > > 9.0.4. > > I previously said: >> > This occurs on postgresql 9.0.4. on 32 core 512GB Dell boxes. We have >> > identical systems still running 8.4.8 that do not have this issue, so I'm >> > assuming it is related to the vacuum full work done for 9.0. Oddly, we don't >> > see this on the smaller hosts (8 core, 64GB, slower cpus) running 9.0.4, >> > so it may be timing related. Ah, OK, sorry. Well, in 9.0, VACUUM FULL is basically CLUSTER, which means that a REINDEX is happening as part of the same operation. In 9.0, there's no point in doing VACUUM FULL immediately followed by REINDEX. My guess is that this is happening either right around the time the VACUUM FULL commits or right around the time the REINDEX commits. It'd be helpful to know which, if you can figure it out. If there's not a hardware problem causing those read errors, maybe a backend is somehow ending up with a stale or invalid relcache entry. I'm not sure exactly how that could be happening, though... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Jul 28, 2011 at 07:45:01PM -0400, Robert Haas wrote: > On Thu, Jul 28, 2011 at 5:46 PM, daveg <daveg@sonic.net> wrote: > > On Thu, Jul 28, 2011 at 09:46:41AM -0400, Robert Haas wrote: > >> On Wed, Jul 27, 2011 at 8:28 PM, daveg <daveg@sonic.net> wrote: > >> > My client has been seeing regular instances of the following sort of problem: > >> On what version of PostgreSQL? > > > > 9.0.4. > > > > I previously said: > >> > This occurs on postgresql 9.0.4. on 32 core 512GB Dell boxes. We have > >> > identical systems still running 8.4.8 that do not have this issue, so I'm > >> > assuming it is related to the vacuum full work done for 9.0. Oddly, we don't > >> > see this on the smaller hosts (8 core, 64GB, slower cpus) running 9.0.4, > >> > so it may be timing related. > > Ah, OK, sorry. Well, in 9.0, VACUUM FULL is basically CLUSTER, which > means that a REINDEX is happening as part of the same operation. In > 9.0, there's no point in doing VACUUM FULL immediately followed by > REINDEX. My guess is that this is happening either right around the > time the VACUUM FULL commits or right around the time the REINDEX > commits. It'd be helpful to know which, if you can figure it out. I'll update my vacuum script to skip reindexes after vacuum full for 9.0 servers and see if that makes the problem go away. Thanks for reminding me that they are not needed. However, I suspect it is the vacuum, not the reindex causing the problem. I'll update when I know. > If there's not a hardware problem causing those read errors, maybe a > backend is somehow ending up with a stale or invalid relcache entry. > I'm not sure exactly how that could be happening, though... It does not appear to be a hardware problem. I also suspect it is a stale relcache. -dg -- David Gould daveg@sonic.net 510 536 1443 510 282 0869 If simplicity worked, the world would be overrun with insects.
daveg <daveg@sonic.net> writes: > On Thu, Jul 28, 2011 at 07:45:01PM -0400, Robert Haas wrote: >> Ah, OK, sorry. Well, in 9.0, VACUUM FULL is basically CLUSTER, which >> means that a REINDEX is happening as part of the same operation. In >> 9.0, there's no point in doing VACUUM FULL immediately followed by >> REINDEX. My guess is that this is happening either right around the >> time the VACUUM FULL commits or right around the time the REINDEX >> commits. It'd be helpful to know which, if you can figure it out. > I'll update my vacuum script to skip reindexes after vacuum full for 9.0 > servers and see if that makes the problem go away. The thing that was bizarre about the one instance in the buildfarm was that the error was persistent, ie, once a session had failed all its subsequent attempts to access pg_class failed too. I gather from Dave's description that it's working that way for him too. I can think of ways that there might be a transient race condition against a REINDEX, but it's very unclear why the failure would persist across multiple attempts. The best idea I can come up with is that the session has somehow cached a wrong commit status for the reindexing transaction, causing it to believe that both old and new copies of the index's pg_class row are dead ... but how could that happen? The underlying state in the catalog is not wrong, because no concurrent sessions are upset (at least not in the buildfarm case ... Dave, do you see more than one session doing this at a time?). regards, tom lane
On Fri, Jul 29, 2011 at 9:55 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > daveg <daveg@sonic.net> writes: >> On Thu, Jul 28, 2011 at 07:45:01PM -0400, Robert Haas wrote: >>> Ah, OK, sorry. Well, in 9.0, VACUUM FULL is basically CLUSTER, which >>> means that a REINDEX is happening as part of the same operation. In >>> 9.0, there's no point in doing VACUUM FULL immediately followed by >>> REINDEX. My guess is that this is happening either right around the >>> time the VACUUM FULL commits or right around the time the REINDEX >>> commits. It'd be helpful to know which, if you can figure it out. > >> I'll update my vacuum script to skip reindexes after vacuum full for 9.0 >> servers and see if that makes the problem go away. > > The thing that was bizarre about the one instance in the buildfarm was > that the error was persistent, ie, once a session had failed all its > subsequent attempts to access pg_class failed too. I gather from Dave's > description that it's working that way for him too. I can think of ways > that there might be a transient race condition against a REINDEX, but > it's very unclear why the failure would persist across multiple > attempts. The best idea I can come up with is that the session has > somehow cached a wrong commit status for the reindexing transaction, > causing it to believe that both old and new copies of the index's > pg_class row are dead ... but how could that happen? The underlying > state in the catalog is not wrong, because no concurrent sessions are > upset (at least not in the buildfarm case ... Dave, do you see more than > one session doing this at a time?). I was thinking more along the lines of a failure while processing a sinval message emitted by the REINDEX. The sinval message doesn't get fully processed and therefore we get confused about what the relfilenode is for pg_class. If that happened for any other relation, we could recover by scanning pg_class. 
But if it happens for pg_class or pg_class_oid_index, we're toast, because we can't scan them without knowing what relfilenode to open. Now that can't be quite right, because of course those are mapped relations... and I'm pretty sure there are some other holes in my line of reasoning too. Just thinking out loud... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
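Robert's lost-sinval hypothesis can be sketched with a toy model (Python, purely illustrative; none of these names or mechanics are PostgreSQL internals, and as he notes above the real story must be subtler since pg_class is a mapped relation). The sketch only shows why pg_class is special: a stale cache entry for any other relation can be repaired by rescanning pg_class, but repairing pg_class's own entry requires the very mapping that went stale, so the session stays wedged.

```python
# Toy model of a relcache whose filenode mapping goes stale when an
# invalidation ("sinval") message is lost.  Illustrative only.

PG_CLASS = 1259  # OID of the pg_class catalog itself

class ToyBackend:
    def __init__(self, catalog_filenodes):
        self.disk = dict(catalog_filenodes)      # authoritative relid -> filenode
        self.relcache = dict(catalog_filenodes)  # this backend's cached copy

    def apply_sinval(self, relid):
        # a *processed* invalidation refreshes the cached filenode
        self.relcache[relid] = self.disk[relid]

    def lookup_filenode(self, relid):
        if self.relcache[relid] == self.disk[relid]:
            return self.relcache[relid]
        # Stale entry: recover by "rescanning pg_class" -- but that scan
        # itself needs pg_class's filenode from the cache.  If *that*
        # entry is the stale one, recovery is impossible.
        if self.relcache[PG_CLASS] != self.disk[PG_CLASS]:
            raise RuntimeError("could not find pg_class tuple")  # wedged
        self.relcache[relid] = self.disk[relid]
        return self.relcache[relid]

backend = ToyBackend({PG_CLASS: 100, 16413: 200})

# VACUUM FULL rewrites an ordinary table; the sinval message is lost.
backend.disk[16413] = 201
assert backend.lookup_filenode(16413) == 201  # recovered via pg_class scan

# VACUUM FULL rewrites pg_class itself; the sinval message is lost.
backend.disk[PG_CLASS] = 101
try:
    backend.lookup_filenode(PG_CLASS)
except RuntimeError as e:
    print(e)  # every subsequent lookup fails the same way
```

Nothing in the model ever refreshes the wedged entry, which matches the reported symptom that the error persists until the session exits.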
Robert Haas <robertmhaas@gmail.com> writes: > On Fri, Jul 29, 2011 at 9:55 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> The thing that was bizarre about the one instance in the buildfarm was >> that the error was persistent, ie, once a session had failed all its >> subsequent attempts to access pg_class failed too. > I was thinking more along the lines of a failure while processing a > sinval message emitted by the REINDEX. The sinval message doesn't get > fully processed and therefore we get confused about what the > relfilenode is for pg_class. If that happened for any other relation, > we could recover by scanning pg_class. But if it happens for pg_class > or pg_class_oid_index, we're toast, because we can't scan them without > knowing what relfilenode to open. Well, no, because the ScanPgRelation call is not failing internally. It's performing a seqscan of pg_class and not finding a matching tuple. You could hypothesize about maybe an sinval message got missed leading us to scan the old (pre-VAC-FULL) copy of pg_class, but that still wouldn't explain how come we can't find a valid-looking entry for pg_class_oid_index in it. Tis a puzzlement. regards, tom lane
On Fri, Jul 29, 2011 at 11:27 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Fri, Jul 29, 2011 at 9:55 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> The thing that was bizarre about the one instance in the buildfarm was >>> that the error was persistent, ie, once a session had failed all its >>> subsequent attempts to access pg_class failed too. > >> I was thinking more along the lines of a failure while processing a >> sinval message emitted by the REINDEX. The sinval message doesn't get >> fully processed and therefore we get confused about what the >> relfilenode is for pg_class. If that happened for any other relation, >> we could recover by scanning pg_class. But if it happens for pg_class >> or pg_class_oid_index, we're toast, because we can't scan them without >> knowing what relfilenode to open. > > Well, no, because the ScanPgRelation call is not failing internally. > It's performing a seqscan of pg_class and not finding a matching tuple. SnapshotNow race? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Fri, Jul 29, 2011 at 11:27 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Well, no, because the ScanPgRelation call is not failing internally. >> It's performing a seqscan of pg_class and not finding a matching tuple. > SnapshotNow race? That's what I would have guessed to start with, except that then it wouldn't be a persistent failure. You'd get one, but then the next query would succeed. regards, tom lane
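The transience Tom describes can be sketched with a toy version of the classic SnapshotNow hazard (Python, illustrative only; the visibility rules are simplified stand-ins). Because each row version's visibility is judged at the instant it is visited rather than against a fixed snapshot, one scan can miss both the old and the new version of a row being updated, but only during the update window; a retry afterwards finds the new version, which is why a persistent failure is hard to explain this way.

```python
# Toy SnapshotNow hazard: per-tuple, visit-time visibility checks.

committed = set()

def visible(version):
    # xmin must be committed, xmax (if any) must not be -- evaluated
    # at visit time, independently for each tuple.
    if version["xmin"] not in committed:
        return False
    xmax = version.get("xmax")
    return xmax is None or xmax not in committed

def scan(heap, events=None):
    found = []
    for i, version in enumerate(heap):
        if events and i in events:
            events[i]()            # concurrent activity between visits
        if visible(version):
            found.append(version)
    return found

committed.add(1)
old = {"xmin": 1, "xmax": 2, "val": "old"}
new = {"xmin": 2, "val": "new"}
heap = [new, old]   # updated version happens to come first in scan order

# The scan visits `new` before xact 2 commits (invisible), then xact 2
# commits between visits, making `old` invisible too: the row "vanishes".
race = scan(heap, events={1: lambda: committed.add(2)})
assert race == []

# A fresh scan after the commit sees the new version: the failure is
# transient, unlike the persistent errors reported in this thread.
assert scan(heap) == [new]
```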
On Fri, Jul 29, 2011 at 09:55:46AM -0400, Tom Lane wrote:
> The thing that was bizarre about the one instance in the buildfarm was
> that the error was persistent, ie, once a session had failed all its
> subsequent attempts to access pg_class failed too. I gather from Dave's
> description that it's working that way for him too. I can think of ways
> that there might be a transient race condition against a REINDEX, but
> it's very unclear why the failure would persist across multiple
> attempts. The best idea I can come up with is that the session has
> somehow cached a wrong commit status for the reindexing transaction,
> causing it to believe that both old and new copies of the index's
> pg_class row are dead ... but how could that happen? The underlying

It is definitely persistent. Once triggered, the error occurs for any new statement until the session exits.

> state in the catalog is not wrong, because no concurrent sessions are
> upset (at least not in the buildfarm case ... Dave, do you see more than
> one session doing this at a time?).

It looks like it happens to multiple sessions, so far as one can tell from the timestamps of the errors:

timestamp     sessionid      error
------------  -------------  ----------------------------------------------
03:05:37.434  4e26a861.4a6d  could not find pg_class tuple for index 2662
03:05:37.434  4e26a861.4a6f  could not find pg_class tuple for index 2662
03:06:12.041  4e26a731.438e  could not find pg_class tuple for index 2662
03:06:12.042  4e21b6a3.629b  could not find pg_class tuple for index 2662
03:06:12.042  4e26a723.42ec  could not find pg_class tuple for index 2662 at character 13

-dg

--
David Gould       daveg@sonic.net      510 536 1443    510 282 0869
If simplicity worked, the world would be overrun with insects.
On Thu, Jul 28, 2011 at 11:31:31PM -0700, daveg wrote:
> On Thu, Jul 28, 2011 at 07:45:01PM -0400, Robert Haas wrote:
> > REINDEX. My guess is that this is happening either right around the
> > time the VACUUM FULL commits or right around the time the REINDEX
> > commits. It'd be helpful to know which, if you can figure it out.
>
> I'll update my vacuum script to skip reindexes after vacuum full for 9.0
> servers and see if that makes the problem go away. Thanks for reminding
> me that they are not needed. However, I suspect it is the vacuum, not the
> reindex causing the problem. I'll update when I know.

Here is the update: the problem happens with vacuum full alone; no reindex is needed to trigger it. I updated the script to avoid reindexing after vacuum. Over the past two days there are still many occurrences of this error coincident with the vacuum.

-dg

--
David Gould       daveg@sonic.net      510 536 1443    510 282 0869
If simplicity worked, the world would be overrun with insects.
daveg <daveg@sonic.net> writes:
> Here is the update: the problem happens with vacuum full alone, no reindex
> is needed to trigger it. I updated the script to avoid reindexing after
> vacuum. Over the past two days there are still many occurrences of this
> error coincident with the vacuum.

Well, that jives with the assumption that the one case we saw in the buildfarm was the same thing, because the regression tests were certainly only doing a VACUUM FULL and not a REINDEX of pg_class. But it doesn't get us much closer to understanding what's happening. In particular, it seems to knock out most ideas associated with race conditions, because the VAC FULL should hold exclusive lock on pg_class until it's completely done (including index rebuilds).

I think we need to start adding some instrumentation so we can get a better handle on what's going on in your database. If I were to send you a source-code patch for the server that adds some more logging printout when this happens, would you be willing/able to run a patched build on your machine?

(BTW, just to be perfectly clear ... the "could not find pg_class tuple" errors always mention index 2662, right, never any other number?)

regards, tom lane
On Sun, Jul 31, 2011 at 11:44:39AM -0400, Tom Lane wrote:
> daveg <daveg@sonic.net> writes:
> > Here is the update: the problem happens with vacuum full alone, no reindex
> > is needed to trigger it. I updated the script to avoid reindexing after
> > vacuum. Over the past two days there are still many occurrences of this
> > error coincident with the vacuum.
>
> Well, that jives with the assumption that the one case we saw in
> the buildfarm was the same thing, because the regression tests were
> certainly only doing a VACUUM FULL and not a REINDEX of pg_class.
> But it doesn't get us much closer to understanding what's happening.
> In particular, it seems to knock out most ideas associated with race
> conditions, because the VAC FULL should hold exclusive lock on pg_class
> until it's completely done (including index rebuilds).
>
> I think we need to start adding some instrumentation so we can get a
> better handle on what's going on in your database. If I were to send
> you a source-code patch for the server that adds some more logging
> printout when this happens, would you be willing/able to run a patched
> build on your machine?

Yes, we can run an instrumented server so long as the instrumentation does not interfere with normal operation. However, scheduling downtime to switch binaries is difficult and generally needs to happen on a weekend, but sometimes can be expedited. I'll look into that.

> (BTW, just to be perfectly clear ... the "could not find pg_class tuple"
> errors always mention index 2662, right, never any other number?)

Yes, only index 2662, never any other.

I'm attaching a somewhat redacted log for two different databases on the same instance around the time of the vacuum full of pg_class in each database. My observations so far are:

- the error occurs at commit of vacuum full of pg_class
- in these cases the error hits autovacuum after it waited for a lock on pg_class
- in these two cases there was a new process startup while the vacuum was running; don't know if this is relevant
- while these hit autovacuum, the error does hit other processes (just not in these sessions); unknown if autovacuum is a required component

-dg

--
David Gould       daveg@sonic.net      510 536 1443    510 282 0869
If simplicity worked, the world would be overrun with insects.
Attachment
daveg <daveg@sonic.net> writes: > On Sun, Jul 31, 2011 at 11:44:39AM -0400, Tom Lane wrote: >> I think we need to start adding some instrumentation so we can get a >> better handle on what's going on in your database. If I were to send >> you a source-code patch for the server that adds some more logging >> printout when this happens, would you be willing/able to run a patched >> build on your machine? > Yes we can run an instrumented server so long as the instrumentation does > not interfere with normal operation. However, scheduling downtime to switch > binaries is difficult, and generally needs to be happen on a weekend, but > sometimes can be expedited. I'll look into that. OK, attached is a patch against 9.0 branch that will re-scan pg_class after a failure of this sort occurs, and log what it sees in the tuple header fields for each tuple for the target index. This should give us some useful information. It might be worthwhile for you to also log the results of select relname,pg_relation_filenode(oid) from pg_class where relname like 'pg_class%'; in your script that does VACUUM FULL, just before and after each time it vacuums pg_class. That will help in interpreting the relfilenodes in the log output. > My observations so far are: > - the error occurs at commit of vacuum full of pg_class > - in these cases error hits autovacuum after it waited for a lock on pg_class > - in these two cases there was a new process startup while the vacuum was > running. Don't know if this is relevant. Interesting. We'll want to know whether that happens every time. > - while these hit autovacuum, the error does hit other processs (just not in > these sessions). Unknown if autovacuum is a required component. Good question. Please consider setting log_autovacuum_min_duration = 0 so that the log also traces all autovacuum activity. 
regards, tom lane

diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 81cea8b60406dffa7b5d278697ab5ad6cef6a3d8..d7957ba4e0055a4f9f7721e06b72cbeb0ddb350e 100644
*** a/src/backend/utils/cache/relcache.c
--- b/src/backend/utils/cache/relcache.c
***************
*** 32,37 ****
--- 32,38 ----
  #include "access/genam.h"
  #include "access/reloptions.h"
+ #include "access/relscan.h"
  #include "access/sysattr.h"
  #include "access/transam.h"
  #include "access/xact.h"
***************
*** 64,69 ****
--- 65,71 ----
  #include "optimizer/var.h"
  #include "rewrite/rewriteDefine.h"
  #include "storage/fd.h"
+ #include "storage/bufmgr.h"
  #include "storage/lmgr.h"
  #include "storage/smgr.h"
  #include "utils/array.h"
*************** ScanPgRelation(Oid targetRelId, bool ind
*** 310,315 ****
--- 312,379 ----
  }
  
+ /*
+  * ScanPgRelationDetailed
+  *
+  * Try to figure out why we failed to locate row for relation.
+  */
+ static HeapTuple
+ ScanPgRelationDetailed(Oid targetRelId)
+ {
+     HeapTuple   pg_class_tuple;
+     Relation    pg_class_desc;
+     SysScanDesc pg_class_scan;
+     ScanKeyData key[1];
+ 
+     /*
+      * form a scan key
+      */
+     ScanKeyInit(&key[0],
+                 ObjectIdAttributeNumber,
+                 BTEqualStrategyNumber, F_OIDEQ,
+                 ObjectIdGetDatum(targetRelId));
+ 
+     /*
+      * Open pg_class and fetch tuples, forcing heap scan and disabling
+      * visibility checks.
+      */
+     pg_class_desc = heap_open(RelationRelationId, AccessShareLock);
+     pg_class_scan = systable_beginscan(pg_class_desc, ClassOidIndexId,
+                                        false,
+                                        SnapshotAny,
+                                        1, key);
+ 
+     while (HeapTupleIsValid((pg_class_tuple = systable_getnext(pg_class_scan))))
+     {
+         Buffer      buf = pg_class_scan->scan->rs_cbuf;
+         bool        valid;
+ 
+         /* need buffer lock to call HeapTupleSatisfiesVisibility */
+         LockBuffer(buf, BUFFER_LOCK_SHARE);
+         valid = HeapTupleSatisfiesVisibility(pg_class_tuple,
+                                              SnapshotNow,
+                                              buf);
+         LockBuffer(buf, BUFFER_LOCK_UNLOCK);
+ 
+         elog(LOG, "searching %u for pg_class tuple for index %u: found ctid (%u,%u), xmin %u, xmax %u, flags 0x%4x 0x%4x, valid %d",
+              pg_class_desc->rd_node.relNode,
+              targetRelId,
+              ItemPointerGetBlockNumber(&(pg_class_tuple->t_self)),
+              ItemPointerGetOffsetNumber(&(pg_class_tuple->t_self)),
+              HeapTupleHeaderGetXmin(pg_class_tuple->t_data),
+              HeapTupleHeaderGetXmax(pg_class_tuple->t_data),
+              pg_class_tuple->t_data->t_infomask,
+              pg_class_tuple->t_data->t_infomask2,
+              valid);
+     }
+ 
+     /* all done */
+     systable_endscan(pg_class_scan);
+     heap_close(pg_class_desc, AccessShareLock);
+ 
+     return NULL;
+ }
+ 
  /*
   * AllocateRelationDesc
   *
   * This is used to allocate memory for a new relation descriptor
*************** RelationReloadIndexInfo(Relation relatio
*** 1737,1744 ****
      indexOK = (RelationGetRelid(relation) != ClassOidIndexId);
      pg_class_tuple = ScanPgRelation(RelationGetRelid(relation), indexOK);
      if (!HeapTupleIsValid(pg_class_tuple))
!         elog(ERROR, "could not find pg_class tuple for index %u",
!              RelationGetRelid(relation));
      relp = (Form_pg_class) GETSTRUCT(pg_class_tuple);
      memcpy(relation->rd_rel, relp, CLASS_TUPLE_SIZE);
      /* Reload reloptions in case they changed */
--- 1801,1815 ----
      indexOK = (RelationGetRelid(relation) != ClassOidIndexId);
      pg_class_tuple = ScanPgRelation(RelationGetRelid(relation), indexOK);
      if (!HeapTupleIsValid(pg_class_tuple))
!     {
!         pg_class_tuple = ScanPgRelationDetailed(RelationGetRelid(relation));
!         if (!HeapTupleIsValid(pg_class_tuple))
!             elog(ERROR, "could not find pg_class tuple for index %u",
!                  RelationGetRelid(relation));
!         else
!             elog(LOG, "could not find pg_class tuple for index %u, but succeeded on second try",
!                  RelationGetRelid(relation));
!     }
      relp = (Form_pg_class) GETSTRUCT(pg_class_tuple);
      memcpy(relation->rd_rel, relp, CLASS_TUPLE_SIZE);
      /* Reload reloptions in case they changed */
On Mon, Aug 01, 2011 at 01:23:49PM -0400, Tom Lane wrote: > daveg <daveg@sonic.net> writes: > > On Sun, Jul 31, 2011 at 11:44:39AM -0400, Tom Lane wrote: > >> I think we need to start adding some instrumentation so we can get a > >> better handle on what's going on in your database. If I were to send > >> you a source-code patch for the server that adds some more logging > >> printout when this happens, would you be willing/able to run a patched > >> build on your machine? > > > Yes we can run an instrumented server so long as the instrumentation does > > not interfere with normal operation. However, scheduling downtime to switch > > binaries is difficult, and generally needs to be happen on a weekend, but > > sometimes can be expedited. I'll look into that. > > OK, attached is a patch against 9.0 branch that will re-scan pg_class > after a failure of this sort occurs, and log what it sees in the tuple > header fields for each tuple for the target index. This should give us > some useful information. It might be worthwhile for you to also log the > results of > > select relname,pg_relation_filenode(oid) from pg_class > where relname like 'pg_class%'; > > in your script that does VACUUM FULL, just before and after each time it > vacuums pg_class. That will help in interpreting the relfilenodes in > the log output. We have installed the patch and have encountered the error as usual. However there is no additional output from the patch. I'm speculating that the pg_class scan in ScanPgRelationDetailed() fails to return tuples somehow. I have also been trying to trace it further by reading the code, but have not got any solid hypothesis yet. In the absence of any debugging output I've been trying to deduce the call tree leading to the original failure. 
So far it looks like this:

RelationReloadIndexInfo(Relation)                  // Relation is 2662 and !rd_isvalid
  pg_class_tuple = ScanPgRelation(2662, indexOK=false)   // returns NULL
    pg_class_desc = heap_open(1259, ACC_SHARE)
      r = relation_open(1259, ACC_SHARE)           // locks oid, ensures RelationIsValid(r)
        r = RelationIdGetRelation(1259)
          r = RelationIdCacheLookup(1259)          // assume success
          if !rd_isvalid:
            RelationClearRelation(r, true)
              RelationInitPhysicalAddr(r)          // r is pg_class relcache

-dg

--
David Gould       daveg@sonic.net      510 536 1443    510 282 0869
If simplicity worked, the world would be overrun with insects.
daveg <daveg@sonic.net> writes:
> We have installed the patch and have encountered the error as usual.
> However there is no additional output from the patch. I'm speculating
> that the pg_class scan in ScanPgRelationDetailed() fails to return
> tuples somehow.

Evidently not, if it's not logging anything, but now the question is why. One possibility is that for some reason RelationGetNumberOfBlocks is persistently lying about the file size. (We've seen kernel bugs before that resulted in transiently wrong values, so this isn't totally beyond the realm of possibility.) Please try the attached patch, which extends the previous one to add a summary line including the number of blocks physically scanned by the seqscan.

regards, tom lane

diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 81cea8b60406dffa7b5d278697ab5ad6cef6a3d8..589513a2d35987b2202ce1162b326db2d358b6fa 100644
*** a/src/backend/utils/cache/relcache.c
--- b/src/backend/utils/cache/relcache.c
***************
*** 32,37 ****
--- 32,38 ----
  #include "access/genam.h"
  #include "access/reloptions.h"
+ #include "access/relscan.h"
  #include "access/sysattr.h"
  #include "access/transam.h"
  #include "access/xact.h"
***************
*** 64,69 ****
--- 65,71 ----
  #include "optimizer/var.h"
  #include "rewrite/rewriteDefine.h"
  #include "storage/fd.h"
+ #include "storage/bufmgr.h"
  #include "storage/lmgr.h"
  #include "storage/smgr.h"
  #include "utils/array.h"
*************** ScanPgRelation(Oid targetRelId, bool ind
*** 310,315 ****
--- 312,388 ----
  }
  
+ /*
+  * ScanPgRelationDetailed
+  *
+  * Try to figure out why we failed to locate row for relation.
+  */
+ static HeapTuple
+ ScanPgRelationDetailed(Oid targetRelId)
+ {
+     HeapTuple   pg_class_tuple;
+     Relation    pg_class_desc;
+     SysScanDesc pg_class_scan;
+     ScanKeyData key[1];
+     int         count = 0;
+ 
+     /*
+      * form a scan key
+      */
+     ScanKeyInit(&key[0],
+                 ObjectIdAttributeNumber,
+                 BTEqualStrategyNumber, F_OIDEQ,
+                 ObjectIdGetDatum(targetRelId));
+ 
+     /*
+      * Open pg_class and fetch tuples, forcing heap scan and disabling
+      * visibility checks.
+      */
+     pg_class_desc = heap_open(RelationRelationId, AccessShareLock);
+     pg_class_scan = systable_beginscan(pg_class_desc, ClassOidIndexId,
+                                        false,
+                                        SnapshotAny,
+                                        1, key);
+ 
+     while (HeapTupleIsValid((pg_class_tuple = systable_getnext(pg_class_scan))))
+     {
+         Buffer      buf = pg_class_scan->scan->rs_cbuf;
+         bool        valid;
+ 
+         count++;
+ 
+         /* need buffer lock to call HeapTupleSatisfiesVisibility */
+         LockBuffer(buf, BUFFER_LOCK_SHARE);
+         valid = HeapTupleSatisfiesVisibility(pg_class_tuple,
+                                              SnapshotNow,
+                                              buf);
+         LockBuffer(buf, BUFFER_LOCK_UNLOCK);
+ 
+         elog(LOG, "searching %u for pg_class tuple for index %u: found ctid (%u,%u), xmin %u, xmax %u, flags 0x%4x 0x%4x, valid %d",
+              pg_class_desc->rd_node.relNode,
+              targetRelId,
+              ItemPointerGetBlockNumber(&(pg_class_tuple->t_self)),
+              ItemPointerGetOffsetNumber(&(pg_class_tuple->t_self)),
+              HeapTupleHeaderGetXmin(pg_class_tuple->t_data),
+              HeapTupleHeaderGetXmax(pg_class_tuple->t_data),
+              pg_class_tuple->t_data->t_infomask,
+              pg_class_tuple->t_data->t_infomask2,
+              valid);
+     }
+ 
+     elog(LOG, "ScanPgRelationDetailed: found %d tuples with OID %u in %u blocks of filenode %u",
+          count,
+          targetRelId,
+          pg_class_scan->scan->rs_nblocks,
+          pg_class_desc->rd_node.relNode);
+ 
+     /* all done */
+     systable_endscan(pg_class_scan);
+     heap_close(pg_class_desc, AccessShareLock);
+ 
+     return NULL;
+ }
+ 
  /*
   * AllocateRelationDesc
   *
   * This is used to allocate memory for a new relation descriptor
*************** RelationReloadIndexInfo(Relation relatio
*** 1737,1744 ****
      indexOK = (RelationGetRelid(relation) != ClassOidIndexId);
      pg_class_tuple = ScanPgRelation(RelationGetRelid(relation), indexOK);
      if (!HeapTupleIsValid(pg_class_tuple))
!         elog(ERROR, "could not find pg_class tuple for index %u",
!              RelationGetRelid(relation));
      relp = (Form_pg_class) GETSTRUCT(pg_class_tuple);
      memcpy(relation->rd_rel, relp, CLASS_TUPLE_SIZE);
      /* Reload reloptions in case they changed */
--- 1810,1824 ----
      indexOK = (RelationGetRelid(relation) != ClassOidIndexId);
      pg_class_tuple = ScanPgRelation(RelationGetRelid(relation), indexOK);
      if (!HeapTupleIsValid(pg_class_tuple))
!     {
!         pg_class_tuple = ScanPgRelationDetailed(RelationGetRelid(relation));
!         if (!HeapTupleIsValid(pg_class_tuple))
!             elog(ERROR, "could not find pg_class tuple for index %u",
!                  RelationGetRelid(relation));
!         else
!             elog(LOG, "could not find pg_class tuple for index %u, but succeeded on second try",
!                  RelationGetRelid(relation));
!     }
      relp = (Form_pg_class) GETSTRUCT(pg_class_tuple);
      memcpy(relation->rd_rel, relp, CLASS_TUPLE_SIZE);
      /* Reload reloptions in case they changed */
On Wed, Aug 03, 2011 at 11:18:20AM -0400, Tom Lane wrote:
> Evidently not, if it's not logging anything, but now the question is
> why. One possibility is that for some reason RelationGetNumberOfBlocks
> is persistently lying about the file size. (We've seen kernel bugs
> before that resulted in transiently wrong values, so this isn't totally
> beyond the realm of possibility.) Please try the attached patch, which
> extends the previous one to add a summary line including the number of
> blocks physically scanned by the seqscan.

Ok, I have results from the latest patch and have attached a redacted
server log with the select relfilenode output added inline. This is the
shorter of the logs and shows the sequence pretty clearly. I have
additional logs if wanted.

Summary: the failing process reads 0 rows from 0 blocks from the OLD
relfilenode.

-dg

--
David Gould       daveg@sonic.net       510 536 1443    510 282 0869
If simplicity worked, the world would be overrun with insects.
daveg <daveg@sonic.net> writes:
> Summary: the failing process reads 0 rows from 0 blocks from the OLD
> relfilenode.

Hmm. This seems to mean that we're somehow missing a relation mapping
invalidation message, or perhaps not processing it soon enough during
some complex set of invalidations. I did some testing with that in mind
but couldn't reproduce the failure. It'd be awfully nice to get a look
at the call stack when this happens for you ... what OS are you running?

			regards, tom lane
On Thu, Aug 04, 2011 at 12:28:31PM -0400, Tom Lane wrote:
> daveg <daveg@sonic.net> writes:
> > Summary: the failing process reads 0 rows from 0 blocks from the OLD
> > relfilenode.
>
> Hmm. This seems to mean that we're somehow missing a relation mapping
> invalidation message, or perhaps not processing it soon enough during
> some complex set of invalidations. I did some testing with that in mind
> but couldn't reproduce the failure. It'd be awfully nice to get a look
> at the call stack when this happens for you ... what OS are you running?

cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.5 (Tikanga)

Linux version 2.6.18-194.el5

I can use gdb as well if we can get a core or stop the correct process.
Perhaps a long sleep when it hits this? Or perhaps we could log
invalidate processing for pg_class?

-dg

--
David Gould       daveg@sonic.net       510 536 1443    510 282 0869
If simplicity worked, the world would be overrun with insects.
On Thu, Aug 04, 2011 at 12:28:31PM -0400, Tom Lane wrote:
> daveg <daveg@sonic.net> writes:
> > Summary: the failing process reads 0 rows from 0 blocks from the OLD
> > relfilenode.
>
> Hmm. This seems to mean that we're somehow missing a relation mapping
> invalidation message, or perhaps not processing it soon enough during
> some complex set of invalidations. I did some testing with that in mind
> but couldn't reproduce the failure. It'd be awfully nice to get a look
> at the call stack when this happens for you ... what OS are you running?

To recap, a few observations:

- When it happens the victim has recently been waiting on a lock for
  several seconds.
- We create a lot of temp tables, hundreds of thousands a day.
- There are catalog vacuum fulls and reindexes running on 30 odd other
  databases at the same time. The script estimates the amount of bloat
  on each table and index and chooses either reindex on specific
  indexes or vacuum full as needed.
- This is a 32 core (64 with hyperthreading) 512GB host with several
  hundred connections.
- We are seeing "cannot read" and "cannot open" errors too that would
  be consistent with trying to use a vanished file.

-dg

--
David Gould       daveg@sonic.net       510 536 1443    510 282 0869
If simplicity worked, the world would be overrun with insects.
daveg <daveg@sonic.net> writes:
> We are seeing "cannot read" and "cannot open" errors too that would be
> consistent with trying to use a vanished file.

Yeah, these all seem consistent with the idea that the failing backend
somehow missed an update for the relation mapping file. You would get
the "could not find pg_class tuple" syndrome if the process was holding
an open file descriptor for the now-deleted file, and otherwise cannot
open/cannot read type errors. And unless it later received another
sinval message for the relation mapping file, the errors would persist.

If this theory is correct then all of the file-related errors ought to
match up to recently-vacuumed mapped catalogs or indexes (those are the
ones with relfilenode = 0 in pg_class). Do you want to expand your
logging of the VACUUM FULL actions and see if you can confirm that idea?

Since the machine is running RHEL, I think we can use glibc's
backtrace() function to get simple stack traces without too much effort.
I'll write and test a patch and send it along in a bit.

			regards, tom lane
On Thu, Aug 04, 2011 at 04:16:08PM -0400, Tom Lane wrote:
> daveg <daveg@sonic.net> writes:
> > We are seeing "cannot read" and "cannot open" errors too that would be
> > consistent with trying to use a vanished file.
>
> Yeah, these all seem consistent with the idea that the failing backend
> somehow missed an update for the relation mapping file. You would get
> the "could not find pg_class tuple" syndrome if the process was holding
> an open file descriptor for the now-deleted file, and otherwise cannot
> open/cannot read type errors. And unless it later received another
> sinval message for the relation mapping file, the errors would persist.
>
> If this theory is correct then all of the file-related errors ought to
> match up to recently-vacuumed mapped catalogs or indexes (those are the
> ones with relfilenode = 0 in pg_class). Do you want to expand your
> logging of the VACUUM FULL actions and see if you can confirm that idea?

At your service, what would you like to see?

> Since the machine is running RHEL, I think we can use glibc's
> backtrace() function to get simple stack traces without too much effort.
> I'll write and test a patch and send it along in a bit.

Great. Any point to try to capture SI events somehow?

-dg

--
David Gould       daveg@sonic.net       510 536 1443    510 282 0869
If simplicity worked, the world would be overrun with insects.
daveg <daveg@sonic.net> writes:
> On Thu, Aug 04, 2011 at 04:16:08PM -0400, Tom Lane wrote:
>> If this theory is correct then all of the file-related errors ought to
>> match up to recently-vacuumed mapped catalogs or indexes (those are the
>> ones with relfilenode = 0 in pg_class). Do you want to expand your
>> logging of the VACUUM FULL actions and see if you can confirm that idea?

> At your service, what would you like to see?

I was thinking log the before-and-after filenode values each time you do
a VACUUM FULL, and then go through the logs to see if all the
file-related complaints refer to recently obsoleted filenodes.

			regards, tom lane
Ahh ... you know what, never mind about stack traces, let's just see if
the attached patch doesn't fix it.

I still haven't reproduced the behavior here, but I think I see what
must be happening: we are getting an sinval reset while attempting to
open pg_class_oid_index. The latter condition causes its refcount to be
above 1, which will cause RelationClearRelation to directly call
RelationReloadIndexInfo, which enables the following sequence of calls:
RelationCacheInvalidate -> RelationClearRelation ->
RelationReloadIndexInfo. And the problem is that
RelationCacheInvalidate intentionally forces pg_class_oid_index to be
updated first. That was okay when the code was written, because the
relcache entry for pg_class itself never really needed any updates.
Now it does, so we have to make sure pg_class gets updated first, *then*
pg_class_oid_index (which might result in a seqscan of pg_class), then
everything else (for which we'll try to use pg_class_oid_index to
locate their pg_class tuples).

			regards, tom lane

diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 81cea8b..0e4b17c 100644
*** a/src/backend/utils/cache/relcache.c
--- b/src/backend/utils/cache/relcache.c
*************** RelationCacheInvalidate(void)
*** 2185,2204 ****
  		{
  			/*
  			 * Add this entry to list of stuff to rebuild in second pass.
! 			 * pg_class_oid_index goes on the front of rebuildFirstList, other
! 			 * nailed indexes on the back, and everything else into
! 			 * rebuildList (in no particular order).
  			 */
! 			if (relation->rd_isnailed &&
! 				relation->rd_rel->relkind == RELKIND_INDEX)
  			{
  				if (RelationGetRelid(relation) == ClassOidIndexId)
- 					rebuildFirstList = lcons(relation, rebuildFirstList);
- 				else
  					rebuildFirstList = lappend(rebuildFirstList, relation);
  			}
  			else
! 				rebuildList = lcons(relation, rebuildList);
  		}
  	}
--- 2185,2207 ----
  		{
  			/*
  			 * Add this entry to list of stuff to rebuild in second pass.
! 			 * pg_class goes on the front of rebuildFirstList,
! 			 * pg_class_oid_index goes to the back of rebuildFirstList, other
! 			 * nailed indexes go on the front of rebuildList, and everything
! 			 * else goes to the back of rebuildList.
  			 */
! 			if (RelationGetRelid(relation) == RelationRelationId)
! 				rebuildFirstList = lcons(relation, rebuildFirstList);
! 			else if (relation->rd_isnailed &&
! 					 relation->rd_rel->relkind == RELKIND_INDEX)
  			{
  				if (RelationGetRelid(relation) == ClassOidIndexId)
  					rebuildFirstList = lappend(rebuildFirstList, relation);
+ 				else
+ 					rebuildList = lcons(relation, rebuildList);
  			}
  			else
! 				rebuildList = lappend(rebuildList, relation);
  		}
  	}
I wrote:
> Ahh ... you know what, never mind about stack traces, let's just see if
> the attached patch doesn't fix it.

On reflection, that patch would only fix the issue for pg_class, and
that's not the only catalog that gets consulted during relcache reloads.
I think we'd better do it as attached, instead.

			regards, tom lane

diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 81cea8b..337300f 100644
*** a/src/backend/utils/cache/relcache.c
--- b/src/backend/utils/cache/relcache.c
*************** RelationCacheInvalidate(void)
*** 2159,2164 ****
--- 2159,2169 ----
  	List	   *rebuildList = NIL;
  	ListCell   *l;
  
+ 	/*
+ 	 * Reload relation mapping data before starting to reconstruct cache.
+ 	 */
+ 	RelationMapInvalidateAll();
+ 
  	/* Phase 1 */
  	hash_seq_init(&status, RelationIdCache);
*************** RelationCacheInvalidate(void)
*** 2184,2189 ****
--- 2189,2204 ----
  		else
  		{
  			/*
+ 			 * If it's a mapped relation, immediately update its rd_node in
+ 			 * case its relfilenode changed. We must do this during phase 1
+ 			 * in case the relation is consulted during rebuild of other
+ 			 * relcache entries in phase 2. It's safe since consulting the
+ 			 * map doesn't involve any access to relcache entries.
+ 			 */
+ 			if (RelationIsMapped(relation))
+ 				RelationInitPhysicalAddr(relation);
+ 
+ 			/*
  			 * Add this entry to list of stuff to rebuild in second pass.
  			 * pg_class_oid_index goes on the front of rebuildFirstList, other
  			 * nailed indexes on the back, and everything else into
*************** RelationCacheInvalidate(void)
*** 2209,2219 ****
  	 */
  	smgrcloseall();
  
- 	/*
- 	 * Reload relation mapping data before starting to reconstruct cache.
- 	 */
- 	RelationMapInvalidateAll();
- 
  	/* Phase 2: rebuild the items found to need rebuild in phase 1 */
  	foreach(l, rebuildFirstList)
  	{
--- 2224,2229 ----
On Fri, Aug 05, 2011 at 12:10:31PM -0400, Tom Lane wrote:
> I wrote:
> > Ahh ... you know what, never mind about stack traces, let's just see if
> > the attached patch doesn't fix it.
>
> On reflection, that patch would only fix the issue for pg_class, and
> that's not the only catalog that gets consulted during relcache reloads.
> I think we'd better do it as attached, instead.
>
> regards, tom lane

Should this be applied in addition to the earlier patch, or to replace it?

-dg

--
David Gould       daveg@sonic.net       510 536 1443    510 282 0869
If simplicity worked, the world would be overrun with insects.
daveg <daveg@sonic.net> writes:
> Should this be applied in addition to the earlier patch, or to replace it?

Apply it instead of the earlier one.

			regards, tom lane
I wrote:
> I still haven't reproduced the behavior here, but I think I see what
> must be happening: we are getting an sinval reset while attempting to
> open pg_class_oid_index.

After a number of false starts, I've managed to reproduce this behavior
locally. The above theory turns out to be wrong, or at least
incomplete. In order to be opening pg_class_oid_index, we must already
have opened and locked pg_class, and we would have absorbed the
relmapping update for pg_class when we did that, so an sinval reset
during a subsequent relation open is too late to provoke the bug.
Rather, the key to the problem is that the sinval reset has to happen
*instead of* seeing the plain relcache inval on pg_class that was
emitted by the "VACUUM FULL pg_class" command. In this case, we enter
RelationCacheInvalidate with a stale value for pg_class's relmapping,
and since that routine is not sufficiently careful about the order in
which it revalidates the nailed-relation cache entries, we may try to
read something from pg_class before we've updated pg_class's
relmapping.

So the reason Dave is seeing the problem a lot must be that he has very
high sinval traffic, leading to lots of resets, and that increases the
probability of a reset replacing just the wrong sinval message.

I can reproduce the problem fairly conveniently with this crude hack:

diff --git a/src/backend/storage/ipc/sinval.c b/src/backend/storage/ipc/sinval.c
index 8499615..5ad2aee 100644
*** a/src/backend/storage/ipc/sinval.c
--- b/src/backend/storage/ipc/sinval.c
*************** ReceiveSharedInvalidMessages(
*** 106,112 ****
  		/* Try to get some more messages */
  		getResult = SIGetDataEntries(messages, MAXINVALMSGS);
  
! 		if (getResult < 0)
  		{
  			/* got a reset message */
  			elog(DEBUG4, "cache state reset");
--- 106,112 ----
  		/* Try to get some more messages */
  		getResult = SIGetDataEntries(messages, MAXINVALMSGS);
  
! 		if (getResult != 0)
  		{
  			/* got a reset message */
  			elog(DEBUG4, "cache state reset");

which forces every occurrence of an incoming sinval message to be
treated as a reset. The serial regression tests still work with that,
but they fall over almost immediately if you run something like this in
parallel:

	while psql -c "vacuum full pg_class" regression; do usleep 100000; done

and the parallel regression tests tend to fall over without any outside
help because of the one occurrence of "vacuum full pg_class" in them.

I'm inclined to think that a robust solution requires both of the ideas
I proposed last week: we should update all the relmapping entries during
pass 1 of RelationCacheInvalidate, *and* make pass 2 do things in a more
robust order. I think pg_class, pg_class_oid_index, other nailed
relations, and then everything else ought to do it.

Anyway, that's easily fixed now that we have a reproducible case.
What's bothering me at the moment is that the CLOBBER_CACHE_ALWAYS hack,
which was meant to expose exactly this sort of problem, failed to do so
--- buildfarm member jaguar has been running with that flag for ages and
never showed this problem. I'm thinking that we should take out the
hack in AcceptInvalidationMessages and instead put in #ifdeffed code
that causes ReceiveSharedInvalidMessages to forcibly always call the
reset function. Any thoughts about that?

			regards, tom lane
On Thu, Aug 11, 2011 at 5:09 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I can reproduce the problem fairly conveniently with this crude hack:
>
> diff --git a/src/backend/storage/ipc/sinval.c b/src/backend/storage/ipc/sinval.c
> index 8499615..5ad2aee 100644
> *** a/src/backend/storage/ipc/sinval.c
> --- b/src/backend/storage/ipc/sinval.c
> *************** ReceiveSharedInvalidMessages(
> *** 106,112 ****
>   		/* Try to get some more messages */
>   		getResult = SIGetDataEntries(messages, MAXINVALMSGS);
>
> ! 		if (getResult < 0)
>   		{
>   			/* got a reset message */
>   			elog(DEBUG4, "cache state reset");
> --- 106,112 ----
>   		/* Try to get some more messages */
>   		getResult = SIGetDataEntries(messages, MAXINVALMSGS);
>
> ! 		if (getResult != 0)
>   		{
>   			/* got a reset message */
>   			elog(DEBUG4, "cache state reset");
>
> which forces every occurrence of an incoming sinval message to be treated
> as a reset. The serial regression tests still work with that, but they
> fall over almost immediately if you run something like this in parallel:
>
> while psql -c "vacuum full pg_class" regression; do usleep 100000; done
>
> and the parallel regression tests tend to fall over without any outside
> help because of the one occurrence of "vacuum full pg_class" in them.

Nice detective work.

> I'm inclined to think that a robust solution requires both of the ideas
> I proposed last week: we should update all the relmapping entries during
> pass 1 of RelationCacheInvalidate, *and* make pass 2 do things in a more
> robust order. I think pg_class, pg_class_oid_index, other nailed
> relations, and then everything else ought to do it.
>
> Anyway, that's easily fixed now that we have a reproducible case.
> What's bothering me at the moment is that the CLOBBER_CACHE_ALWAYS hack,
> which was meant to expose exactly this sort of problem, failed to do so
> --- buildfarm member jaguar has been running with that flag for ages and
> never showed this problem. I'm thinking that we should take out the
> hack in AcceptInvalidationMessages and instead put in #ifdeffed code
> that causes ReceiveSharedInvalidMessages to forcibly always call the
> reset function. Any thoughts about that?

I'm OK with that, but while we're whacking this around, is there any
chance that we could reduce the number of layers of function calls that
are happening in here? Right now, we call AcceptInvalidationMessages()
...which basically does nothing but call ReceiveSharedInvalidMessages()
...which in turn calls SIGetDataEntries() ...and then, by way of a
function pointer, LocalExecuteInvalidMessage() or perhaps
InvalidateSystemCaches().

It would be nice to move the short-circuit test I recently inserted at
the top of SIGetDataEntries() somewhere higher up in the call stack, but
right now the layers of abstraction are so thick that it's not exactly
clear how to do that. In my ideal world, I'd like to turn
AcceptInvalidationMessages into something like this:

#define AcceptInvalidationMessages() \
	do { if (*flagvar) ReallyDoIt(); } while (0)

...where flagvar is a pointer to myStateP->hasMessages. This isn't
really a huge hotspot any more, but a cycle saved is a cycle earned, and
we call this frequently enough that it seems awfully wasteful to push
and pop three call stack frames every time we check and find no
messages.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
[adding back hackers so the thread shows the resolution]

On Sun, Aug 14, 2011 at 07:02:55PM -0400, Tom Lane wrote:
> Sounds good. Based on my own testing so far, I think that patch will
> probably make things measurably better for you, though it won't resolve
> every corner case.

The most recent catalog vacuums did vacuum full pg_class in 34 databases
on that instance with no new failures. So it looks like the patch fixes
it. This is not conclusive, but it looks very good so far. I'll send an
update if I see any new errors during the week.

Thanks for your help on this. It looks like it has sent you on a merry
search through all the catcache related program activities. I'm assuming
this patch or some improvement on it will show up in a point release.
Meanwhile, if this works as is for a couple more days we will resume
upgrading the rest of the hosts to 9.0 using this patch.

-dg

--
David Gould       daveg@sonic.net       510 536 1443    510 282 0869
If simplicity worked, the world would be overrun with insects.
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Aug 11, 2011 at 5:09 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> What's bothering me at the moment is that the CLOBBER_CACHE_ALWAYS hack,
>> which was meant to expose exactly this sort of problem, failed to do so
>> --- buildfarm member jaguar has been running with that flag for ages and
>> never showed this problem. I'm thinking that we should take out the
>> hack in AcceptInvalidationMessages and instead put in #ifdeffed code
>> that causes ReceiveSharedInvalidMessages to forcibly always call the
>> reset function. Any thoughts about that?

> I'm OK with that, but while we're whacking this around, is there any
> chance that we could reduce the number of layers of function calls
> that are happening in here?

I'm not excited about that. We have got three layers that each have
significant complexities of their own to deal with. The only way to
reduce the layer count is to make some piece of code deal with more of
those complexities at once. I don't want to go there, especially not in
the name of very marginal cycle-shaving.

> It would be nice to move the short-circuit test I recently inserted at
> the top of SIGetDataEntries() somewhere higher up in the call stack,
> but right now the layers of abstraction are so thick that it's not
> exactly clear how to do that.

Glad you didn't try, because it would be wrong. The fact that there are
no outstanding messages in sinvaladt.c doesn't prove there are none yet
unprocessed inside ReceiveSharedInvalidMessages.

Anyway, to get back to the original point: I'm getting less excited
about redoing the CLOBBER_CACHE_ALWAYS code at a different level. The
only thing that can really promise is that we find places outside the
cache code that are mistakenly holding onto cache entry pointers. It
can't do very much for the sort of problems I've been finding over the
past week, first because you need some other process actively changing
the underlying information (else the use of a stale cache entry isn't a
problem), and second because when you're looking for bugs involving use
of stale cache entries, over-enthusiastic flushing of the cache can
actually mask the bugs.

I'd still like to think of a better test methodology, but I don't think
"force clobbers inside ReceiveSharedInvalidMessages" is it.

			regards, tom lane
On Tue, Aug 16, 2011 at 3:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> It would be nice to move the short-circuit test I recently inserted at
>> the top of SIGetDataEntries() somewhere higher up in the call stack,
>> but right now the layers of abstraction are so thick that it's not
>> exactly clear how to do that.
>
> Glad you didn't try, because it would be wrong. The fact that there
> are no outstanding messages in sinvaladt.c doesn't prove there are none
> yet unprocessed inside ReceiveSharedInvalidMessages.

Oh. Good point.

> Anyway, to get back to the original point: I'm getting less excited
> about redoing the CLOBBER_CACHE_ALWAYS code at a different level.
> The only thing that can really promise is that we find places outside
> the cache code that are mistakenly holding onto cache entry pointers.
> It can't do very much for the sort of problems I've been finding over
> the past week, first because you need some other process actively
> changing the underlying information (else the use of a stale cache entry
> isn't a problem), and second because when you're looking for bugs
> involving use of stale cache entries, over-enthusiastic flushing of the
> cache can actually mask the bugs.
>
> I'd still like to think of a better test methodology, but I don't think
> "force clobbers inside ReceiveSharedInvalidMessages" is it.

Makes sense.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company