Re: kill -KILL: What happens? - Mailing list pgsql-hackers
From | David Fetter |
---|---|
Subject | Re: kill -KILL: What happens? |
Date | |
Msg-id | 20110113171235.GA28078@fetter.org |
In response to | Re: kill -KILL: What happens? (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: kill -KILL: What happens? |
List | pgsql-hackers |
On Thu, Jan 13, 2011 at 10:41:28AM -0500, Tom Lane wrote:
> David Fetter <david@fetter.org> writes:
> > I've noticed over the years that we give people dire warnings never to
> > send a KILL signal to the postmaster, but I'm unsure as to what are
> > potential consequences of this, as in just exactly how this can result
> > in problems. Is there some reference I can look to for explanations
> > of the mechanism(s) whereby the damage occurs?
>
> There's no risk of data corruption, if that's what you're thinking of.
> It's just that you're then looking at having to manually clean up the
> child processes and then restart the postmaster; a process that is not
> only tedious but does offer the possibility of screwing yourself.

Does this mean that there's no cross-platform way to ensure that
killing a process results in its children's timely (i.e. before damage
can occur) death? That such a way isn't practical from a performance
point of view?

> In particular the risk is that someone clueless enough to do this would
> next decide that removing $PGDATA/postmaster.pid, rather than killing
> all the existing children, is the quickest way to get the postmaster
> restarted. Once he's done that, his data will shortly be hosed beyond
> recovery, because now he has two noncommunicating sets of backends
> massaging the same files via separate sets of shared buffers.

Right.

> The reason this sequence of events doesn't seem improbable is that the
> error you get when you try to start a new postmaster, if there are still
> old backends running, is
>
> FATAL: pre-existing shared memory block (key 5490001, ID 15609) is still in use
> HINT: If you're sure there are no old server processes still running, remove the shared memory block or just delete the file "postmaster.pid".
>
> Maybe we should rewrite that HINT --- while it's *possible* that
> removing the shmem block or deleting postmaster.pid is the right thing
> to do, it's not exactly *likely*. I think we need to put a bit more
> emphasis on the "If ..." part. Like "If you are prepared to swear on
> your mother's grave that there are no old server processes still
> running, consider removing postmaster.pid. But first check for existing
> processes again."

Maybe the hint could give an OS-tailored way to check this...

> (BTW, I notice that this interlock against starting a new postmaster
> appears to be broken in HEAD, which is likely not unrelated to the
> fact that the contents of postmaster.pid seem to be totally bollixed
> :-()

D'oh!

Well, I hope knowing it's a problem gives some kind of glimmer as to
how to solve it :)

Is this worth writing tests for?

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
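
As an illustration of what an "OS-tailored way to check" for leftover server processes might look like, here is a minimal Linux-only sketch in Python. It is not something PostgreSQL ships. It assumes a Linux-style /proc filesystem and that server processes keep the data directory as their current working directory; the function name processes_using_dir and the proc_root parameter are made up for this example. Processes whose working directory cannot be read (for instance, ones owned by another user) are silently skipped, so it would need to run as the database owner or root to be meaningful.

```python
import os


def processes_using_dir(data_dir, proc_root="/proc"):
    """Return sorted PIDs whose current working directory is data_dir.

    Hypothetical helper: assumes Linux-style /proc/<pid>/cwd symlinks and
    that server processes run with the data directory as their cwd.
    """
    target = os.path.realpath(data_dir)
    pids = []
    for entry in os.listdir(proc_root):
        # Only numeric entries under /proc correspond to processes.
        if not entry.isdigit():
            continue
        try:
            cwd = os.readlink(os.path.join(proc_root, entry, "cwd"))
        except OSError:
            # Process exited meanwhile, or we lack permission to inspect it.
            continue
        if cwd == target:
            pids.append(int(entry))
    return sorted(pids)
```

Calling processes_using_dir with the path of the data directory and getting back a non-empty list would mean something is still running there, in which case postmaster.pid should be left alone. An empty list is only weak evidence, given the permission caveat above.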