Re: postmaster dead but backends still running? - Mailing list pgsql-admin
From | Charles Hornberger |
---|---|
Subject | Re: postmaster dead but backends still running? |
Date | |
Msg-id | Pine.LNX.4.53.0306191011140.3921@economex.caltech.edu |
In response to | Re: postmaster dead but backends still running? (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: postmaster dead but backends still running? |
List | pgsql-admin |
On Tue, 17 Jun 2003, Tom Lane wrote:

> Charles Hornberger <charlie@hss.caltech.edu> writes:
> > Other things I perhaps ought to mention: Trying to stop the postmaster
> > using pg_ctl fails (unsurprisingly, since pg_ctl relies on
> > /var/pgsql/data/postmaster.pid, which contains a nonexistent PID); I
> > haven't tried to start a new postmaster yet, because the old backends
> > are hanging around.
>
> In theory a new postmaster would detect the old backends and refuse to
> start anyway. I don't trust that interlock unreservedly though. (But
> please test it while you have the opportunity...)

Unfortunately, our system administrator solved this before I got a chance to test more. I don't know how he went about restarting the server, although whatever he did doesn't appear to have hurt anything; would it be interesting to know exactly what steps he took?

> > Nor have I attempted to restart the web server, which might allow the
> > hanging-round backends to die by closing the old connections it's
> > holding to them. I'm tempted to go ahead and do this, though I'm not
> > sure whether I ought to until I've diagnosed what's going on right now.
>
> You will need to close all the existing connections before the new
> postmaster can be started. I'd recommend doing so sooner instead of
> later, because with no postmaster you aren't getting any checkpoints
> done, and your WAL space is going to start ballooning.
>
> As far as diagnosing the problem goes: if you have a postmaster log
> file, look to see if the postmaster wrote an ERROR or FATAL message
> before it exited. (Finding it among all the backend-level messages
> might be painful though.) Also look in the directory the postmaster
> was started in to see if there's a core file. Save away any evidence
> you can find before trying to start a new postmaster.

Interestingly, there are no messages in the log file, and I can't find a core file -- in short, there's no evidence whatsoever, at least not that I can find. (Though I am probably a pretty rotten detective.)

However, I think I know the cause (though I haven't tested to see if this indeed causes the postmaster to die): A few hours before I noticed that the postmaster was dead, one of the sysadmins made a typo that caused an NFS mount to become unavailable -- the very NFS mount that held the postgres executable (all our Solaris boxes share the same executables). So the theory is that the postmaster tried to fork() a process using a non-existent executable, and died as a result.

Does this make any sense?

-Charlie

> Because the postmaster doesn't actually do much, crashes are pretty
> unusual. I'm interested in whatever you can find.
>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo@postgresql.org so that your
> message can get through to the mailing list cleanly
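
The stale-PID situation quoted above (postmaster.pid naming a process that no longer exists) can be checked by hand. The following is a minimal sketch, not anything pg_ctl itself does: it assumes the first line of the PID file holds the postmaster's PID, and the helper names `read_pid`, `process_exists`, and `pid_file_is_stale` are made up for illustration.

```python
import os


def read_pid(pid_file):
    """Return the integer found on the first line of pid_file.

    Assumption: the first line of the file is the postmaster's PID.
    """
    with open(pid_file) as f:
        return int(f.readline().strip())


def process_exists(pid):
    """Probe for a process using signal 0, which delivers no signal."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no process with this PID
    except PermissionError:
        return True   # process exists but belongs to another user
    return True


def pid_file_is_stale(pid_file):
    """True when the PID recorded in pid_file names no live process."""
    return not process_exists(read_pid(pid_file))
```

With the path mentioned in the thread, the call would be `pid_file_is_stale("/var/pgsql/data/postmaster.pid")`. A stale file only says the postmaster is gone; it says nothing about whether orphaned backends are still running, so those would need to be looked for separately (for example with `ps`).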