Thread: Severe Badness On My Server: psql: FATAL: the database system is starting up
Severe Badness On My Server: psql: FATAL: the database system is starting up
From: Mitchell Laks
Dear Gurus,

My server and I have had a very bad weekend, starting Friday afternoon. I am running Debian Sarge, Postgresql 7.4.6, and Linux kernel 2.6.8, with a Postgresql-backed application on a remote server. The system has a system drive, on which the Postgresql database runs, and a raid 1 drive on which the application stores its data.

Well, the raid1 failed (or is failing - or is trying its hardest to fail, not clear yet...). This should not have affected the Postgresql database as it is safely on a separate drive.

However, when I logged onto the system, I found that I could not shut down Postgresql. I logged in as postgres and ran pg_ctl stop. It printed ....... and then could not stop (presumably because hung client applications were still connected to the database). So I killed all the application clients (kill -9 on each of them) and tried pg_ctl stop again, and it still would not stop. Looking at ps aux, the client applications were in D status:

wustl 18232 0.0 0.2 4872 1920 ? D Mar11 0:00 /usr/local/ctn/bi

I then tried to reboot the system remotely by logging in as root and running shutdown -r now, and even shutdown -h now. Interestingly enough, the system refused to shut down! I have never, ever seen this. I was floored!

Well, what to do? I decided to sleep on it. When I logged in on Saturday night, the system was still hanging in this bizarre state, and I now saw queued shutdown requests in ps aux. Nothing was happening fast.

I thought. I read a little. I tried pg_ctl stop -m fast. It did nothing. I prayed. I tried pg_dump LTA_IDB >lta_idb.dump to dump the database in question. It didn't do anything. I was desperate, so I decided to try desperate measures and pulled out the big gun: pg_ctl stop -m i (immediate mode). OK, so it stopped. Then I said, let me try to dump the database, so I ran pg_ctl start.
It started:

postgres@A1:~$ pg_ctl status
pg_ctl: postmaster is running (PID: 21195)
Command line was:
/usr/lib/postgresql/bin/postmaster

Then I tried to dump the database and got a FATAL message saying the database system was starting up. I waited a while and tried again: same message. I then tried psql LTA_IDB as the database's user and got FATAL: the database system is starting up. Then I tried psql LTA_IDB again and got the same thing. I waited.

Then I ran pg_ctl stop (I don't know why I did it. Perversity, I think.) It printed ................ and then something about being unable to stop. Then I did:

postgres@A1:~$ pg_dump LTA_IDB>lta_idb.dump
2005-03-13 10:56:33 [21481] LOG: connection received: host=[local] port=
2005-03-13 10:56:33 [21481] FATAL: the database system is shutting down
pg_dump: [archiver (db)] connection to database "LTA_IDB" failed: FATAL: the dn

Now I checked pg_ctl status:

postgres@A1:~$ pg_ctl status
pg_ctl: postmaster is running (PID: 21195)
Command line was:
/usr/lib/postgresql/bin/postmaster

OK, I feel like I am in the twilight zone. Next, as root, I did cd /var/log and ls postg*:

A1:/var/log# ls post*
postgres.log        postgres.log.2.gz  postgres.log.5.gz  postgres.log.8.gz
postgres.log.1      postgres.log.3.gz  postgres.log.6.gz  postgres.log.9.gz
postgres.log.10.gz  postgres.log.4.gz  postgres.log.7.gz
A1:/var/log# less postgres.log
postgres.log: No such file or directory

WHAT???????? Then df -h:

/dev/sda2   9.2G  2.8G  6.0G  32% /
tmpfs       443M     0  443M   0% /dev/shm
/dev/sda1    89M   11M   74M  13% /boot
/dev/sda3   7.4G  273M  6.7G   4% /home
/dev/sda8    11G   33M  9.9G   1% /mirror
/dev/sda7   449M  8.1M  417M   2% /tmp
/dev/sda6   7.4G  4.7G  2.4G  67% /var
/dev/md0    230G  139G   80G  64% /home/big0

I am in the twilight zone. My sanity is suspect. Any ideas on what to do next? Pull the plug????

Mitchell
Re: Severe Badness On My Server: psql: FATAL: the database system is starting up
From: Tom Lane
Mitchell Laks <mlaks@verizon.net> writes:
> Well, the raid1 failed (or is failing - or is trying its hardest to fail, not
> clear yet...). This should not have affected the Postgresql database as it is
> safely on a separate drive.

Try turning off the power, physically disconnecting the raid1, and rebooting. It sounds to me like the raid drive is so wedged that the kernel is getting confused (or at least hanging operations that theoretically shouldn't hang).

			regards, tom lane
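A note for readers who land on this thread: the D in the STAT column of ps aux means uninterruptible sleep, which on Linux usually means the process is blocked inside the kernel waiting for I/O to complete. Such a process does not act on signals, not even SIGKILL, until that I/O returns, which is consistent with kill -9 having no effect and shutdown never finishing, and with Tom's point that the wedged raid drive is hanging kernel operations. Below is a minimal Python sketch, assuming Linux and the standard /proc/<pid>/stat layout, that lists processes currently in state D. The function names parse_stat_line and uninterruptible_processes are hypothetical helpers of my own, not part of any library or of the original thread.

```python
import os


def parse_stat_line(line: str) -> tuple[int, str, str]:
    """Return (pid, comm, state) parsed from one /proc/<pid>/stat line."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the LAST ')'.
    open_idx = line.index("(")
    close_idx = line.rindex(")")
    pid = int(line[:open_idx].strip())
    comm = line[open_idx + 1:close_idx]
    # The field right after ") " is the one-letter state code (R, S, D, Z, ...).
    state = line[close_idx + 2:].split()[0]
    return pid, comm, state


def uninterruptible_processes(proc_root: str = "/proc") -> list[tuple[int, str]]:
    """List (pid, comm) for every process whose state is D."""
    stuck = []
    for entry in os.listdir(proc_root):
        # Only numeric directory names under /proc are processes.
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat_line(f.read())
        except OSError:
            # The process exited between listdir() and open(), or its
            # stat file is unreadable; skip it.
            continue
        if state == "D":
            stuck.append((pid, comm))
    return stuck
```

Calling print(uninterruptible_processes()) on a machine in the state described above would presumably list entries like the hung client (PID 18232) from the ps aux output. It shows the same information as the STAT column of ps aux; reading /proc directly just makes it easy to filter or log from a script.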