Re: production server down - Mailing list pgsql-hackers
From | Joe Conway |
---|---|
Subject | Re: production server down |
Date | |
Msg-id | 41BFD08A.5000501@joeconway.com |
In response to | Re: production server down (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: production server down |
List | pgsql-hackers |
Tom Lane wrote:
>>...
>>pg_control last modified: Tue Dec 14 15:39:26 2004
>>...
>>Time of latest checkpoint: Tue Nov 2 17:05:32 2004
>
> [ blink... ] That seems like an unreasonable gap between checkpoints,
> especially for a production server. Can you see an explanation?

Hmmm, this is even more scary. We have two database clusters on this server, one on /replica/pgdata, and one on /production/pgdata (ignore the names -- /replica is actually the "production" instance at the moment).

    # pg_controldata /replica/pgdata
    pg_control version number: 72
    Catalog version number: 200310211
    Database cluster state: shutting down
    pg_control last modified: Tue Dec 14 15:39:26 2004
    Current log file ID: 0
    Next log file segment: 1
    Latest checkpoint location: 0/9B0B8C
    Prior checkpoint location: 0/9AA1B4
    Latest checkpoint's REDO location: 0/9B0B8C
    Latest checkpoint's UNDO location: 0/0
    Latest checkpoint's StartUpID: 12
    Latest checkpoint's NextXID: 536
    Latest checkpoint's NextOID: 17142
    Time of latest checkpoint: Tue Nov 2 17:05:32 2004
    Database block size: 8192
    Blocks per segment of large relation: 131072
    Maximum length of identifiers: 64
    Maximum number of function arguments: 32
    Date/time type storage: 64-bit integers
    Maximum length of locale name: 128
    LC_COLLATE: C
    LC_CTYPE: C

    # pg_controldata /production/pgdata
    pg_control version number: 72
    Catalog version number: 200310211
    Database cluster state: shutting down
    pg_control last modified: Tue Nov 2 21:57:49 2004
    Current log file ID: 0
    Next log file segment: 1
    Latest checkpoint location: 0/9B0B8C
    Prior checkpoint location: 0/9AA1B4
    Latest checkpoint's REDO location: 0/9B0B8C
    Latest checkpoint's UNDO location: 0/0
    Latest checkpoint's StartUpID: 12
    Latest checkpoint's NextXID: 536
    Latest checkpoint's NextOID: 17142
    Time of latest checkpoint: Tue Nov 2 17:05:32 2004
    Database block size: 8192
    Blocks per segment of large relation: 131072
    Maximum length of identifiers: 64
    Maximum number of function arguments: 32
    Date/time type storage: 64-bit integers
    Maximum length of locale name: 128
    LC_COLLATE: C
    LC_CTYPE: C

I have no idea how this happened, but those look too similar except for the "last modified" date. The space used is quite what I'd expect:

    # du -h --max-depth=1 /replica
    403G    /replica/pgdata
    # du -h --max-depth=1 /production
    201G    /production/pgdata

The "/production/pgdata" cluster has not been in use since Nov 2. But we've been loading data aggressively into "/replica/pgdata".

Any theories on how we screwed up?

Joe
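Tom's observation and the "too similar" comparison above can be checked mechanically. The minimal Python sketch below is not part of the original thread. The helper names `parse_controldata`, `diff_fields`, and `checkpoint_gap` are hypothetical. It parses `pg_controldata`'s "Key: value" output, lists only the fields that differ between the two clusters, and computes the interval between the latest checkpoint and the last `pg_control` write. It assumes the C-locale timestamp format shown in the dumps above. For brevity, it uses a few fields copied from those dumps rather than the full output.

```python
from datetime import datetime, timedelta
from typing import Dict, Optional, Tuple

# Excerpts copied from the two pg_controldata dumps quoted in the message.
REPLICA = """\
Database cluster state: shutting down
pg_control last modified: Tue Dec 14 15:39:26 2004
Latest checkpoint location: 0/9B0B8C
Latest checkpoint's NextXID: 536
Latest checkpoint's NextOID: 17142
Time of latest checkpoint: Tue Nov 2 17:05:32 2004
"""

PRODUCTION = """\
Database cluster state: shutting down
pg_control last modified: Tue Nov 2 21:57:49 2004
Latest checkpoint location: 0/9B0B8C
Latest checkpoint's NextXID: 536
Latest checkpoint's NextOID: 17142
Time of latest checkpoint: Tue Nov 2 17:05:32 2004
"""

# Timestamp layout as printed in these dumps (assumes the C locale).
TIME_FMT = "%a %b %d %H:%M:%S %Y"


def parse_controldata(text: str) -> Dict[str, str]:
    """Split each "Key: value" line on its first colon."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def diff_fields(a: Dict[str, str], b: Dict[str, str]) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return only the fields whose values differ between two dumps."""
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(a.keys() | b.keys())
        if a.get(key) != b.get(key)
    }


def checkpoint_gap(fields: Dict[str, str]) -> timedelta:
    """Time between the latest checkpoint and the last pg_control write."""
    modified = datetime.strptime(fields["pg_control last modified"], TIME_FMT)
    checkpoint = datetime.strptime(fields["Time of latest checkpoint"], TIME_FMT)
    return modified - checkpoint


if __name__ == "__main__":
    replica = parse_controldata(REPLICA)
    production = parse_controldata(PRODUCTION)
    for key, (left, right) in diff_fields(replica, production).items():
        print(f"{key}: {left!r} vs {right!r}")
    print("replica gap:", checkpoint_gap(replica))
    print("production gap:", checkpoint_gap(production))
```

With these excerpts, the only differing field is `pg_control last modified`. The replica's gap between its latest checkpoint and its last `pg_control` update comes out to 41 days, 22:33:54. That is the interval Tom flagged.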