Re: backend hangs at immediate shutdown (Re: Back-branch update releases coming in a couple weeks) - Mailing list pgsql-hackers
From | Noah Misch |
---|---|
Subject | Re: backend hangs at immediate shutdown (Re: Back-branch update releases coming in a couple weeks) |
Date | |
Msg-id | 20130623184754.GA826248@tornado.leadboat.com Whole thread Raw |
In response to | Re: backend hangs at immediate shutdown (Re: Back-branch update releases coming in a couple weeks) ("MauMau" <maumau307@gmail.com>) |
Responses |
Re: backend hangs at immediate shutdown (Re: Back-branch
update releases coming in a couple weeks)
|
List | pgsql-hackers |
On Sun, Jun 23, 2013 at 01:55:19PM +0900, MauMau wrote: > From: "Robert Haas" <robertmhaas@gmail.com> >> On Fri, Jun 21, 2013 at 10:02 PM, MauMau <maumau307@gmail.com> wrote: >>> I'm comfortable with 5 seconds. We are talking about the interval >>> between >>> sending SIGQUIT to the children and then sending SIGKILL to them. In >>> most >>> situations, the backends should terminate immediately. However, as I >>> said a >>> few months ago, ereport() call in quickdie() can deadlock >>> indefinitely. This >>> is a PostgreSQL's bug. >> >> So let's fix that bug. Then we don't need this. [quoted part filtered to undo caps lock] > There are two ways to fix that bug. You are suggesting 1 of the > following two methods, aren't you? I suspect Robert meant to prevent the deadlock by limiting quickdie() to calling only async-signal-safe functions. Implementing that looked grotty when we last discussed it, though, and it would not help against a library or trusted PL function suspending SIGQUIT. > 1. (Robert-san's idea) > Upon receipt of immediate shutdown request, postmaster sends SIGKILL to > its children. > > 2. (Tom-san's idea) > Upon receipt of immediate shutdown request, postmaster first sends > SIGQUIT to its children, wait a while for them to terminate, then sends > SIGKILL to them. > > > At first I proposed 1. Then Tom san suggested 2 so that the server is as > friendly to the clients as now by notifying them of the server shutdown. > I was convinced by that idea and chose 2. > > Basically, I think both ideas are right. They can solve the original > problem. Agreed. > However, re-considering the meaning of "immediate" shutdown, I feel 1 is > a bit better, because trying to do something in quickdie() or somewhere > does not match the idea of "immediate". We may not have to be friendly to Perhaps so, but more important than the definition of the word "immediate" is what an immediate shutdown has long meant in PostgreSQL. All this time it has made some effort to send a message to the client. Note also that the patch under discussion affects both immediate shutdowns and the automatic reset in response to a backend crash. I think the latter is the more-important use case, and the message is nice to have there. > the clients at the immediate shutdown. The code gets much simpler. In > addition, it may be better that we similarly send SIGKILL in backend > crash (FatalError) case, eliminate the use of SIGQUIT and remove > quickdie() and other SIGQUIT handlers. My take is that the client message has some value, and we shouldn't just discard it to simplify the code slightly. Finishing the shutdown quickly has value, of course. The relative importance of those things should guide the choice of a timeout under method #2, but I don't see a rigorous way to draw the line. I feel five seconds is, if anything, too high. What about using deadlock_timeout? It constitutes a rough barometer on the system's tolerance for failure detection latency, and we already overload it by having it guide log_lock_waits. The default of 1s makes sense to me for this purpose, too. We can always add a separate immediate_shutdown_timeout if there's demand, but I don't expect such demand. (If we did add such a GUC, setting it to zero could be the way to request method 1. If some folks strongly desire method 1, that's probably the way to offer it.) Thanks, nm -- Noah Misch EnterpriseDB http://www.enterprisedb.com
pgsql-hackers by date: