URGENT: Database keeps crashing - suspect damaged RAM - Mailing list pgsql-general
From | Markus Wollny |
---|---|
Subject | URGENT: Database keeps crashing - suspect damaged RAM |
Date | |
Msg-id | 2266D0630E43BB4290742247C8910575014CE340@dozer.computec.de |
Responses |
Re: URGENT: Database keeps crashing - suspect damaged RAM
Re: URGENT: Database keeps crashing - suspect damaged RAM Re: URGENT: Database keeps crashing - suspect damaged RAM |
List | pgsql-general |
Hello!

I just installed PostgreSQL 7.2.1 on SuSE 7.3 (4x PIII Xeon 550 MHz, 2 GB RAM, 5x18 GB SCSI RAID). The OS was freshly installed; after that I compiled and installed PostgreSQL from source (./configure --prefix=/opt/pgsql/ --with-perl --enable-odbc --enable-locale --enable-syslog). I copied the settings in postgresql.conf etc. from an identical machine running the identical platform.

Then I imported a database into the new installation. The import seems to have been successful; I didn't get any errors during the import. A subsequent vacuum analyze finished without anything out of the ordinary. Just a few minutes after this vacuum analyze, the database crashed for the first time. It has kept crashing ever since, every one or two minutes.

What puzzles me is that this very same machine was running Oracle 8i on Win2k more or less flawlessly until just a few hours before, "more or less" meaning that we never really noticed anything much out of the ordinary. There might have been some minor issues after a RAM upgrade from 1 GB to 2 GB just a week ago, but looking back it's hard to say whether those were due to bad RAM or just to some bad code which we've sorted out (or disposed of) by now. As the machine is already running Linux and PostgreSQL, it's hardly possible to prove my suspicion by going back to Oracle and having a closer look.

What I'd like to know is whether I need to look any further than RAM: shall I just chuck the new modules out of the machine? Or is there some other issue that could cause this behaviour? I am quite sure that I didn't do anything wrong during installation, configuration and import, and the same application code is running without errors on a different machine at this very moment. I don't like the "record with zero length" and "Cannot allocate memory" bits in the logfile at all, let alone the "was terminated by signal 9" thingy.

So: Is it bad RAM? How can I make sure? What else could it be?
Here's a small excerpt from the logfile:

2002-08-06 17:31:38 [17063] DEBUG: Pages 0: Changed 0, Empty 0; Tup 0: Vac 0, Keep 0, UnUsed 0. Total CPU 0.00s/0.00u sec elapsed 0.00 sec.
2002-08-06 17:36:23 [17296] DEBUG: _mdfd_blind_getseg: couldn't open /var/lib/pgsql/data/base/base/16596/16671: Cannot allocate memory
2002-08-06 17:36:24 [17296] FATAL 2: cannot write block 13387 of 16596/16671 blind: Cannot allocate memory
2002-08-06 17:36:24 [16530] DEBUG: server process (pid 17296) exited with exit code 2
2002-08-06 17:36:24 [16530] DEBUG: terminating any other active server processes
2002-08-06 17:36:24 [17081] NOTICE: Message from PostgreSQL backend: The Postmaster has informed me that some other backend died abnormally and possibly corrupted shared memory. I have rolled back the current transaction and am going to terminate your database system connection and exit.
[...]
2002-08-06 17:36:24 [16530] DEBUG: all server processes terminated; reinitializing shared memory and semaphores
2002-08-06 17:36:24 [17298] DEBUG: database system was interrupted at 2002-08-06 17:31:21 CEST
2002-08-06 17:36:24 [17298] DEBUG: checkpoint record is at 0/325D7C78
2002-08-06 17:36:24 [17298] DEBUG: redo record is at 0/325D7C78; undo record is at 0/0; shutdown FALSE
2002-08-06 17:36:24 [17298] DEBUG: next transaction id: 2270; next oid: 901292
2002-08-06 17:36:24 [17298] DEBUG: database system was not properly shut down; automatic recovery in progress
2002-08-06 17:36:24 [17298] DEBUG: redo starts at 0/325D7CB8
2002-08-06 17:36:25 [17298] DEBUG: ReadRecord: record with zero length at 0/326E16C4
2002-08-06 17:36:25 [17298] DEBUG: redo done at 0/326E16A0
2002-08-06 17:36:30 [17298] DEBUG: database system is ready
2002-08-06 17:40:53 [16530] DEBUG: connection startup failed (fork failure): Cannot allocate memory
2002-08-06 17:52:50 [16530] DEBUG: connection startup failed (fork failure): Cannot allocate memory
2002-08-06 17:52:54 [16530] DEBUG: server process (pid 18237) was terminated by signal 9
2002-08-06 17:52:54 [16530] DEBUG: terminating any other active server processes
2002-08-06 17:52:54 [18234] NOTICE: Message from PostgreSQL backend: The Postmaster has informed me that some other backend died abnormally and possibly corrupted shared memory. I have rolled back the current transaction and am going to terminate your database system connection and exit.
[...]
2002-08-06 17:52:57 [18253] FATAL 1: The database system is in recovery mode
2002-08-06 17:52:57 [18255] FATAL 1: The database system is in recovery mode
2002-08-06 17:52:57 [18254] FATAL 1: The database system is in recovery mode
2002-08-06 17:52:57 [18235] NOTICE: Message from PostgreSQL backend: The Postmaster has informed me that some other backend died abnormally and possibly corrupted shared memory. I have rolled back the current transaction and am going to terminate your database system connection and exit. Please reconnect to the database system and repeat your query.
2002-08-06 17:52:57 [18256] FATAL 1: The database system is in recovery mode
2002-08-06 17:52:57 [18257] FATAL 1: The database system is in recovery mode
2002-08-06 17:52:57 [18258] FATAL 1: The database system is in recovery mode
2002-08-06 17:52:57 [16530] DEBUG: all server processes terminated; reinitializing shared memory and semaphores
2002-08-06 17:52:57 [18260] FATAL 1: The database system is starting up
2002-08-06 17:52:57 [18259] DEBUG: database system was interrupted at 2002-08-06 17:51:38 CEST
2002-08-06 17:52:57 [18259] DEBUG: checkpoint record is at 0/32991848
2002-08-06 17:52:57 [18259] DEBUG: redo record is at 0/3297F4D8; undo record is at 0/0; shutdown FALSE
2002-08-06 17:52:57 [18259] DEBUG: next transaction id: 3704; next oid: 909484
2002-08-06 17:52:57 [18259] DEBUG: database system was not properly shut down; automatic recovery in progress
2002-08-06 17:52:57 [18259] DEBUG: redo starts at 0/3297F4D8
2002-08-06 17:52:57 [18261] FATAL 1: The database system is starting up
2002-08-06 17:52:58 [18259] DEBUG: ReadRecord: record with zero length at 0/32BF0278
2002-08-06 17:52:58 [18259] DEBUG: redo done at 0/32BF0254
2002-08-06 17:52:59 [18262] FATAL 1: The database system is starting up
2002-08-06 17:53:00 [18259] DEBUG: database system is ready
2002-08-06 17:54:24 [16530] DEBUG: connection startup failed (fork failure): Cannot allocate memory
2002-08-06 17:54:31 [16530] DEBUG: server process (pid 18283) was terminated by signal 9
2002-08-06 17:54:31 [16530] DEBUG: terminating any other active server processes
2002-08-06 17:54:31 [18275] NOTICE: Message from PostgreSQL backend: The Postmaster has informed me that some other backend died abnormally and possibly corrupted shared memory. I have rolled back the current transaction and am going to terminate your database system connection and exit. Please reconnect to the database system and repeat your query.
[...]
2002-08-06 17:54:32 [16530] DEBUG: all server processes terminated; reinitializing shared memory and semaphores
2002-08-06 17:54:32 [18296] DEBUG: database system was interrupted at 2002-08-06 17:53:00 CEST
2002-08-06 17:54:32 [18296] DEBUG: checkpoint record is at 0/32BF0278
2002-08-06 17:54:32 [18296] DEBUG: redo record is at 0/32BF0278; undo record is at 0/0; shutdown TRUE
2002-08-06 17:54:32 [18296] DEBUG: next transaction id: 4456; next oid: 909484
2002-08-06 17:54:32 [18296] DEBUG: database system was not properly shut down; automatic recovery in progress
2002-08-06 17:54:32 [18296] DEBUG: redo starts at 0/32BF02B8
2002-08-06 17:54:32 [18296] DEBUG: ReadRecord: record with zero length at 0/32F0B3C0
2002-08-06 17:54:32 [18296] DEBUG: redo done at 0/32F0B39C
2002-08-06 17:54:34 [18297] FATAL 1: The database system is starting up
2002-08-06 17:54:34 [18298] FATAL 1: The database system is starting up
2002-08-06 17:54:34 [18299] FATAL 1: The database system is starting up
2002-08-06 17:54:34 [18300] FATAL 1: The database system is starting up
2002-08-06 17:54:34 [18296] DEBUG: database system is ready
2002-08-06 17:57:35 [16530] DEBUG: connection startup failed (fork failure): Cannot allocate memory
2002-08-06 17:57:54 [16530] DEBUG: server process (pid 18366) was terminated by signal 9
2002-08-06 17:57:54 [16530] DEBUG: terminating any other active server processes
2002-08-06 17:57:54 [18368] NOTICE: Message from PostgreSQL backend: The Postmaster has informed me that some other backend died abnormally and possibly corrupted shared memory. I have rolled back the current transaction and am going to terminate your database system connection and exit. Please reconnect to the database system and repeat your query.
2002-08-06 17:57:56 [18409] DEBUG: ReadRecord: record with zero length at 0/3338749C
2002-08-06 17:57:58 [18425] FATAL 1: The database system is starting up
2002-08-06 17:57:58 [18409] DEBUG: database system is ready
2002-08-06 17:58:53 [18432] NOTICE: RelationBuildDesc: can't open idx_bm_user_id: Cannot allocate memory
2002-08-06 17:59:00 [18443] FATAL 1: cannot open pg_attribute: Cannot allocate memory
2002-08-06 17:59:01 [16530] DEBUG: connection startup failed (fork failure): Cannot allocate memory
2002-08-06 17:59:01 [16530] DEBUG: server process (pid 18436) was terminated by signal 9
2002-08-06 17:59:01 [16530] DEBUG: terminating any other active server processes
2002-08-06 17:59:03 [18510] DEBUG: ReadRecord: record with zero length at 0/336E9970
2002-08-06 18:00:15 [16530] DEBUG: connection startup failed (fork failure): Cannot allocate memory
2002-08-06 18:00:17 [18589] DEBUG: ReadRecord: record with zero length at 0/33A7C194

Thank you for your kind assistance!

Regards,
Markus Wollny
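A log like the excerpt above is easier to judge once the recurring messages are counted. The following is only a minimal sketch in Python and not part of the original post: the names `MARKERS`, `split_entries`, `summarize` and `SAMPLE` are hypothetical, the marker strings are copied from messages quoted in the excerpt, and the entry layout (timestamp, then the process id in square brackets) is assumed from those same lines. It tallies symptoms only; it cannot tell bad RAM apart from any other cause.

```python
import re
from collections import Counter

# Hypothetical labels mapped to message fragments quoted in the excerpt.
MARKERS = {
    "out_of_memory": "Cannot allocate memory",
    "killed_signal_9": "terminated by signal 9",
    "zero_length_record": "record with zero length",
    "recovery_started": "automatic recovery in progress",
}

# Assumed entry prefix: "YYYY-MM-DD HH:MM:SS [pid] ".
ENTRY_START = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \[\d+\] ")


def split_entries(text):
    """Split raw log text into entries, even if line breaks were lost or added."""
    starts = [m.start() for m in ENTRY_START.finditer(text)]
    ends = starts[1:] + [len(text)]
    # Collapse whitespace so an entry wrapped across lines still matches.
    return [" ".join(text[s:e].split()) for s, e in zip(starts, ends)]


def summarize(text):
    """Count how many entries contain each marker fragment."""
    counts = Counter()
    for entry in split_entries(text):
        for label, fragment in MARKERS.items():
            if fragment in entry:
                counts[label] += 1
    return counts


# Four entries taken from the excerpt; the last one is wrapped on purpose.
SAMPLE = """\
2002-08-06 17:36:24 [17296] FATAL 2: cannot write block 13387 of 16596/16671 blind: Cannot allocate memory
2002-08-06 17:36:25 [17298] DEBUG: ReadRecord: record with zero length at 0/326E16C4
2002-08-06 17:40:53 [16530] DEBUG: connection startup failed (fork failure): Cannot allocate memory
2002-08-06 17:52:54 [16530] DEBUG: server process (pid 18237) was terminated by
signal 9
"""

if __name__ == "__main__":
    for label, count in sorted(summarize(SAMPLE).items()):
        print(label, count)
```

To use it on a real logfile, read the file into a string and pass it to `summarize`. Splitting on the timestamp prefix rather than on newlines is a deliberate choice here, since archived or mailed logs are often rewrapped.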