
Re: (Never?) Kill Postmaster? - Mailing list pgsql-general

From	Christian Schröder
Subject	Re: (Never?) Kill Postmaster?
Date	October 31, 2007 20:06:33
Msg-id	47290119.50608@deriva.de
In response to	Re: (Never?) Kill Postmaster? (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: (Never?) Kill Postmaster?
List	pgsql-general


Tom Lane wrote:
> "Michael Harris" <michael.harris@ericsson.com> writes:
>
>> The tip is "'kill -9' the postmaster", which has two important
>> differences from the scenario I just described:
>> 1) kill -9 means the OS kills the process without allowing it to clean
>> up after itself
>> 2) The postmaster is the master postgresql backend process. If you want
>> to kill a single query you would not want to kill that.
>>
>
> Right: the tip is to not kill -9 the parent process; it's not saying
> anything about child server processes.
>
> If you've got a child process that's unresponsive to SIGINT then you
> can send it a SIGKILL instead; the downside is that this will force a
> restart of the other children too, that is you're interrupting all
> database sessions not only the one.  But Postgres will recover
> automatically and I don't think I've ever heard of anyone getting data
> corruption as a result of such a thing.
>
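A minimal Python sketch of the signal behaviour described above, under a loudly stated assumption: the child here simply ignores SIGINT (SIG_IGN), standing in for a server process that never gets around to acting on a cancel request. It is not PostgreSQL code and does not model the postmaster's restart of the other children; it only shows that a process can disregard SIGINT, while SIGKILL cannot be caught or ignored. The helper `survives` and the child script are hypothetical, and the sketch is POSIX only (signal.SIGKILL does not exist on Windows).

```python
import signal
import subprocess
import sys

# Hypothetical stand-in for an unresponsive process: it ignores SIGINT,
# announces that the handler is in place, then sleeps.
CHILD_SRC = (
    "import signal, time\n"
    "signal.signal(signal.SIGINT, signal.SIG_IGN)\n"
    "print('ready', flush=True)\n"
    "time.sleep(30)\n"
)


def survives(sig: int, wait: float = 1.0) -> bool:
    """Send `sig` to a fresh child; return True if it is still alive after `wait` s."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_SRC], stdout=subprocess.PIPE, text=True
    )
    try:
        proc.stdout.readline()  # block until SIGINT is being ignored
        proc.send_signal(sig)
        try:
            proc.wait(timeout=wait)
            return False  # the signal ended the process
        except subprocess.TimeoutExpired:
            return True  # the process shrugged the signal off
    finally:
        if proc.poll() is None:
            proc.kill()  # on POSIX this sends SIGKILL
        proc.wait()
        proc.stdout.close()


if __name__ == "__main__":
    print("survives SIGINT:", survives(signal.SIGINT))    # True
    print("survives SIGKILL:", survives(signal.SIGKILL))  # False
```

The readline() handshake avoids a race in which SIGINT would arrive before the child has installed SIG_IGN.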

I ran into exactly this situation today: one statement had been running for
several hours, so I wanted to cancel it. I tried "pg_cancel_backend" and
"kill -2" (which sends SIGINT on our Linux box), but nothing happened.
Remembering this thread, I then sent "kill -9" to this child process. As
you described, all other connections were reset too, and the server log
showed this message:

<2007-10-31 22:48:28 CET - chschroe> WARNING:  terminating connection because of crash of another server process
<2007-10-31 22:48:28 CET - chschroe> DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

But then, when I tried to reconnect to the database, I received the
following message:

<2007-10-31 22:50:01 CET - chschroe> FATAL:  the database system is in recovery mode

OK, you wrote "Postgres will recover automatically", but could this take
several minutes? Is that what "recovery mode" means? When nothing seemed
to happen for several minutes, I performed a (fortunately clean) restart
of the whole server. The log messages for the server restart looked
normal to me:

<2007-10-31 22:53:15 CET - > LOG:  received smart shutdown request
<2007-10-31 22:53:21 CET - > LOG:  all server processes terminated; reinitializing
<2007-10-31 22:53:58 CET - > LOG:  database system was interrupted at 2007-10-31 22:46:46 CET
<2007-10-31 22:53:58 CET - > LOG:  checkpoint record is at 153/FE9FAF20
<2007-10-31 22:53:58 CET - > LOG:  redo record is at 153/FE9FAF20; undo record is at 0/0; shutdown FALSE
<2007-10-31 22:53:58 CET - > LOG:  next transaction ID: 0/128715865; next OID: 58311787
<2007-10-31 22:53:58 CET - > LOG:  next MultiXactId: 4704; next MultiXactOffset: 9414
<2007-10-31 22:53:58 CET - > LOG:  database system was not properly shut down; automatic recovery in progress
<2007-10-31 22:53:58 CET - > LOG:  redo starts at 153/FE9FAF70
<2007-10-31 22:53:58 CET - > LOG:  record with zero length at 153/FEA05E70
<2007-10-31 22:53:58 CET - > LOG:  redo done at 153/FEA05E40
<2007-10-31 22:53:58 CET - > LOG:  database system is ready
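The amount of WAL replayed during that restart can be estimated from the "redo starts at" and "redo done at" lines. A minimal sketch, assuming WAL locations are printed as two hexadecimal halves "HIGH/LOW" and that, when both locations share the same HIGH half (153 here), their byte distance is simply the difference of the LOW halves. Spans crossing a log-file boundary are deliberately not handled. The helper names `parse_lsn` and `redo_span_bytes` are hypothetical.

```python
from typing import Tuple


def parse_lsn(text: str) -> Tuple[int, int]:
    """Split a WAL location written as 'HIGH/LOW' (both hex) into two ints."""
    high, low = text.split("/")
    return int(high, 16), int(low, 16)


def redo_span_bytes(start: str, end: str) -> int:
    """Byte distance between two WAL locations with the same HIGH half.

    Assumption: within one HIGH value the LOW half is a plain byte offset,
    so the distance is just end_low - start_low.
    """
    start_hi, start_lo = parse_lsn(start)
    end_hi, end_lo = parse_lsn(end)
    if start_hi != end_hi:
        raise ValueError("locations lie in different log files; not handled")
    return end_lo - start_lo


if __name__ == "__main__":
    # Values copied from the "redo starts at" / "redo done at" log lines.
    print(redo_span_bytes("153/FE9FAF70", "153/FEA05E40"), "bytes")  # 44752 bytes
```

Under that assumption only about 44 KB of WAL was replayed, which fits the log showing redo starting and finishing within the same second (22:53:58).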

I hope that no data got corrupted. Is there any way to check this?

What should I conclude from this experience? Contrary to the statements
above, is it dangerous to kill (-9) a subprocess?

Regards,
    Christian

--
Deriva GmbH                         Tel.: +49 551 489500-42
Financial IT and Consulting         Fax:  +49 551 489500-91
Hans-Böckler-Straße 2                  http://www.deriva.de
D-37079 Göttingen

Deriva CA Certificate: http://www.deriva.de/deriva-ca.cer
