Re: (Never?) Kill Postmaster? - Mailing list pgsql-general
From | Christian Schröder |
---|---|
Subject | Re: (Never?) Kill Postmaster? |
Date | |
Msg-id | 47290119.50608@deriva.de |
In response to | Re: (Never?) Kill Postmaster? (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: (Never?) Kill Postmaster?
|
List | pgsql-general |
Tom Lane wrote:
> "Michael Harris" <michael.harris@ericsson.com> writes:
>
>> The tip is ''kill -9' the postmaster', which has two important
>> differences to the scenario I just described:
>> 1) kill -9 means the OS kills the process without allowing it to clean
>> up after itself
>> 2) The postmaster is the master postgresql backend process. If you want
>> to kill a single query you would not want to kill that.
>>
>
> Right: the tip is to not kill -9 the parent process; it's not saying
> anything about child server processes.
>
> If you've got a child process that's unresponsive to SIGINT then you
> can send it a SIGKILL instead; the downside is that this will force a
> restart of the other children too, that is you're interrupting all
> database sessions not only the one. But Postgres will recover
> automatically and I don't think I've ever heard of anyone getting data
> corruption as a result of such a thing.
>

I have been in exactly this situation today: One statement was taking
several hours to complete, so it had to be cancelled. I tried a
"pg_cancel_backend" and a "kill -2" (which means "SIGINT" on our Linux
box), but nothing happened. Since I remembered this thread, I tried a
"kill -9" on this child process. As you described, all other connections
were reset, too, and this was the message in the server log:

<2007-10-31 22:48:28 CET - chschroe> WARNING: terminating connection because of crash of another server process
<2007-10-31 22:48:28 CET - chschroe> DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

But then, when I tried to reconnect to the database, I received the
following message:

<2007-10-31 22:50:01 CET - chschroe> FATAL: the database system is in recovery mode

Ok, you wrote "Postgres will recover automatically", but could this take
several minutes? Is that what "recovery mode" means?
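For reference, the escalation described above (SIGINT first, SIGKILL only if the backend stays unresponsive) could be sketched roughly as follows. This is only an illustration, not a recommended tool: the function names, the grace period, and the polling interval are all made up for the example, and it assumes a POSIX system and a user allowed to signal the backend's PID. As Tom notes, SIGKILL on a child makes the postmaster restart all other sessions too.

```python
import os
import signal
import time


def pid_alive(pid):
    """Return True if a process with this PID still exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks existence and permissions
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def cancel_then_kill(pid, grace_seconds=30.0, poll=0.5,
                     send=os.kill, alive=pid_alive):
    """Send SIGINT, wait up to grace_seconds, then fall back to SIGKILL.

    Hypothetical helper: `send` and `alive` are injectable so the logic
    can be tried without touching a real backend process.
    Returns the name of the last signal that was sent.
    """
    send(pid, signal.SIGINT)  # same signal as "kill -2"
    deadline = time.monotonic() + grace_seconds
    while time.monotonic() < deadline:
        if not alive(pid):
            return "SIGINT"
        time.sleep(poll)
    # Last resort: equivalent to "kill -9" on the child process.
    # All other database sessions will be reset as a side effect.
    send(pid, signal.SIGKILL)
    return "SIGKILL"
```

Only the PID of the single stuck backend should ever be passed in, never the PID of the postmaster itself.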
When nothing seemed to happen for several minutes, I performed a
(fortunately clean) restart of the whole server. The log messages for
the server restart looked normal to me:

<2007-10-31 22:53:15 CET - > LOG: received smart shutdown request
<2007-10-31 22:53:21 CET - > LOG: all server processes terminated; reinitializing
<2007-10-31 22:53:58 CET - > LOG: database system was interrupted at 2007-10-31 22:46:46 CET
<2007-10-31 22:53:58 CET - > LOG: checkpoint record is at 153/FE9FAF20
<2007-10-31 22:53:58 CET - > LOG: redo record is at 153/FE9FAF20; undo record is at 0/0; shutdown FALSE
<2007-10-31 22:53:58 CET - > LOG: next transaction ID: 0/128715865; next OID: 58311787
<2007-10-31 22:53:58 CET - > LOG: next MultiXactId: 4704; next MultiXactOffset: 9414
<2007-10-31 22:53:58 CET - > LOG: database system was not properly shut down; automatic recovery in progress
<2007-10-31 22:53:58 CET - > LOG: redo starts at 153/FE9FAF70
<2007-10-31 22:53:58 CET - > LOG: record with zero length at 153/FEA05E70
<2007-10-31 22:53:58 CET - > LOG: redo done at 153/FEA05E40
<2007-10-31 22:53:58 CET - > LOG: database system is ready

I hope that no data got corrupted. Is there any way to check this?

What is the conclusion of this experience? Is it, contrary to the above
statements, dangerous to kill (-9) a subprocess?

Regards,
    Christian

--
Deriva GmbH                         Tel.: +49 551 489500-42
Financial IT and Consulting         Fax:  +49 551 489500-91
Hans-Böckler-Straße 2               http://www.deriva.de
D-37079 Göttingen

Deriva CA Certificate: http://www.deriva.de/deriva-ca.cer
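To put numbers on "several minutes", the timestamps in the log lines quoted above can be compared directly. The following is a small sketch that assumes the log prefix looks exactly like the lines pasted in this message (`<timestamp zone - user> LEVEL: text`); the helper names are invented for the example, and the time zone is simply ignored because all lines share it.

```python
import re
from datetime import datetime

# Matches the prefix used in the quoted log lines, e.g.
# "<2007-10-31 22:53:21 CET - > LOG: ..." (user may be empty).
LINE_RE = re.compile(
    r"^<(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\w+) - ([^>]*)>\s*(\w+):\s*(.*)$"
)

SAMPLE = [
    "<2007-10-31 22:48:28 CET - chschroe> WARNING: terminating connection "
    "because of crash of another server process",
    "<2007-10-31 22:53:21 CET - > LOG: all server processes terminated; "
    "reinitializing",
    "<2007-10-31 22:53:58 CET - > LOG: database system is ready",
]


def parse_line(line):
    """Split one log line into (timestamp, user, level, message), or None."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    stamp, _zone, user, level, message = match.groups()
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"), user, level, message


def seconds_between(lines, start_text, end_text):
    """Seconds from the first message containing start_text to the next
    later message containing end_text; None if either is missing."""
    start = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        stamp, _user, _level, message = parsed
        if start is None:
            if start_text in message:
                start = stamp
        elif end_text in message:
            return (stamp - start).total_seconds()
    return None


print(seconds_between(SAMPLE, "reinitializing", "database system is ready"))
# prints 37.0
```

With the lines from this message, the time from the crash warning (22:48:28) to "database system is ready" (22:53:58) comes out as 330 seconds, while the step from "reinitializing" to "ready" itself took 37 seconds.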