Version 7.2.3 unrecoverable crash on missing pg_clog - Mailing list pgsql-bugs
From | Andy Osborne |
---|---|
Subject | Version 7.2.3 unrecoverable crash on missing pg_clog |
Date | |
Msg-id | 3E1D7E5C.8090207@sift.co.uk |
List | pgsql-bugs |
All,

One of our databases crashed yesterday with a bug that looks a lot like the non-superuser vacuum issue that 7.2.3 was intended to fix, although we do our vacuum with a user that has usesuper=t in pg_user, so I guess it's not that simple.

From the logs:

DEBUG: pq_recvbuf: unexpected EOF on client connection
FATAL 2: open of /u0/pgdata/pg_clog/0726 failed: No such file or directory
DEBUG: server process (pid 4232) exited with exit code 2
DEBUG: terminating any other active server processes
NOTICE: Message from PostgreSQL backend:
        The Postmaster has informed me that some other backend
        died abnormally and possibly corrupted shared memory.
        I have rolled back the current transaction and am
        going to terminate your database system connection and exit.
        Please reconnect to the database system and repeat your query.
[backend message repeated]
FATAL 1: The database system is in recovery mode
[repeated]
DEBUG: all server processes terminated; reinitializing shared memory and semaphores
DEBUG: database system was interrupted at 2003-01-08 20:14:06 GMT
DEBUG: checkpoint record is at 69/74D200E4
DEBUG: redo record is at 69/74D0DA14; undo record is at 0/0; shutdown FALSE
DEBUG: next transaction id: 221940405; next oid: 281786728
DEBUG: database system was not properly shut down; automatic recovery in progress
DEBUG: redo starts at 69/74D0DA14
DEBUG: ReadRecord: record with zero length at 69/74D2634C
DEBUG: redo done at 69/74D26328
FATAL 1: The database system is starting up
[repeated]
DEBUG: database system is ready

Then almost immediately it went out again:

FATAL 2: open of /u0/pgdata/pg_clog/0656 failed: No such file or directory
DEBUG: server process (pid 13054) exited with exit code 2
DEBUG: terminating any other active server processes
NOTICE: Message from PostgreSQL backend:
        The Postmaster has informed me that some other backend
        died abnormally and possibly corrupted shared memory.
        I have rolled back the current transaction and am
        going to terminate your database system connection and exit.
        Please reconnect to the database system and repeat your query.
[repeated]
FATAL 1: The database system is in recovery mode
FATAL 1: The database system is in recovery mode
FATAL 1: The database system is in recovery mode
DEBUG: all server processes terminated; reinitializing shared memory and semaphores
DEBUG: database system was interrupted at 2003-01-08 20:16:12 GMT
DEBUG: checkpoint record is at 69/74D2634C
DEBUG: redo record is at 69/74D2634C; undo record is at 0/0; shutdown TRUE
DEBUG: next transaction id: 221940709; next oid: 281786728
DEBUG: database system was not properly shut down; automatic recovery in progress
FATAL 1: The database system is starting up
[repeated]
DEBUG: redo starts at 69/74D2638C
DEBUG: ReadRecord: record with zero length at 69/754828E8
DEBUG: redo done at 69/754828C4
FATAL 1: The database system is starting up
[repeated]
DEBUG: database system is ready

and again:

FATAL 2: open of /u0/pgdata/pg_clog/0452 failed: No such file or directory
DEBUG: server process (pid 13451) exited with exit code 2
DEBUG: terminating any other active server processes
NOTICE: Message from PostgreSQL backend:
        The Postmaster has informed me that some other backend
        died abnormally and possibly corrupted shared memory.
        I have rolled back the current transaction and am
        going to terminate your database system connection and exit.
        Please reconnect to the database system and repeat your query.

and so on until we shut it down.

Platform is a Dell 6650 Quad Xeon 1.4GHz with hyperthreading switched on, 2GB RAM, running RedHat 7.3 with their kernel "2.4.18-10smp #1 SMP Wed Aug 7 11:17:48 EDT 2002 i686 unknown".

We built our postgresql from source with:

./configure --with-perl --with-openssl --enable-syslog

and with NAMEDATALEN = 64 in postgres_ext.h.

select version() reports

PostgreSQL 7.2.3 on i686-pc-linux-gnu, compiled by GCC 2.96

and we are building against perl "perl5 (revision 5.0 version 6 subversion 1)", which is from the RedHat rpm but rebuilt to have a shared libperl.

With postgres up in single-user mode, anything that touched one particular table (called news; very active, about 450MB in size and about 83k rows) caused postgres to fail as above. In the end we dropped this table, vacuumed (full) the database and put the table back from a backup that was about 3 hrs old. The database has been OK since.

We vacuum every night and vacuum --full once a week. The database cluster has six databases (8 incl. template[01]), of which five are very active. Typically 150 or so connections are active.

postgresql.conf options that we've altered from default are:

max_connections = 512
shared_buffers = 8192
wal_buffers = 12
sort_mem = 32768
vacuum_mem = 32768
wal_files = 8

This is the only time we've seen this happen and I can't reproduce it on our test machines. Pretty scary nonetheless!

Has anyone else had similar problems with 7.2.3? Any clues?

Andy

--
Andy Osborne **************** "Vertical B2B Communities"
Senior Internet Engineer
Sift Group 100 Victoria Street, Bristol BS1 6HZ
tel:+44 117 915 9600 fax:+44 117 915 9630 http://www.sift.co.uk
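
For readers trying to interpret the missing pg_clog file names in the log above, the following is a minimal sketch, not taken from the post, of how a segment name might be related to a range of transaction IDs. It assumes the commit log layout of that era: two status bits per transaction, 8192-byte pages, and 0x100000 transactions per segment file, with the file named as four hex digits. The constant and all function names are illustrative assumptions.

```python
import re

# Assumption (not stated in the post): each pg_clog segment file covers
# 0x100000 transaction IDs and is named with four uppercase hex digits.
XACTS_PER_SEGMENT = 0x100000

# Matches the "open of .../pg_clog/NNNN failed" lines quoted in the post.
CLOG_RE = re.compile(r"open of \S*/pg_clog/([0-9A-Fa-f]{4}) failed")


def segment_for_xid(xid):
    """Return the segment file name that would hold the status of xid."""
    return "%04X" % (xid // XACTS_PER_SEGMENT)


def xid_range_for_segment(name):
    """Return (first_xid, last_xid) covered by a segment file name."""
    first = int(name, 16) * XACTS_PER_SEGMENT
    return first, first + XACTS_PER_SEGMENT - 1


def missing_segments(log_text):
    """Extract the segment names from 'open of ... failed' log lines."""
    return CLOG_RE.findall(log_text)


# The three FATAL lines from the post, copied verbatim.
LOG = """\
FATAL 2: open of /u0/pgdata/pg_clog/0726 failed: No such file or directory
FATAL 2: open of /u0/pgdata/pg_clog/0656 failed: No such file or directory
FATAL 2: open of /u0/pgdata/pg_clog/0452 failed: No such file or directory
"""

NEXT_XID = 221940709  # "next transaction id" from the second recovery

if __name__ == "__main__":
    print("segment for next xid:", segment_for_xid(NEXT_XID))
    for name in missing_segments(LOG):
        first, last = xid_range_for_segment(name)
        print(name, first, last, "beyond next xid:", first > NEXT_XID)
```

Under these assumptions, all three missing segments would cover transaction IDs well above the "next transaction id" values in the recovery logs. That would be consistent with a backend looking up a transaction ID that was never assigned, but the post itself does not establish the cause.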