Re: Archiver behavior at shutdown - Mailing list pgsql-hackers
From | Simon Riggs |
---|---|
Subject | Re: Archiver behavior at shutdown |
Date | |
Msg-id | 1198886684.9558.64.camel@ebony.site |
In response to | Re: Archiver behavior at shutdown (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Archiver behavior at shutdown
|
List | pgsql-hackers |
On Thu, 2007-12-27 at 18:54 -0500, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > On Thu, 2007-12-27 at 17:29 -0500, Tom Lane wrote:
> >> Alvaro Herrera <alvherre@commandprompt.com> writes:
> >>> then a subsequent postmaster start could initiate a second archiver
> >>> process which would cause issues with whatever the first archiver is
> >>> doing.
> >>
> >> That's a problem that the archiver itself should fix (perhaps it needs
> >> its own lockfile).
>
> > http://archives.postgresql.org/pgsql-hackers/2006-05/msg00920.php
>
> I thought that sounded familiar ;-).

As you say, I'm beginning to know where the bodies are buried...

> What was the outcome of that
> discussion? No patch for this ever got applied AFAICS. The patch
> as posted had a few issues, per the thread, and I don't see a followup
> version. (The alleged replacement patch did something else entirely.)

We applied a one-line change in preference to the lockfile approach for
8.2, requested by you, agreed to by me and applied by Bruce.

This would be the behaviour I would have, if I had a blank canvas:

- keep archiver alive at shutdown, rather than bouncing it

- send SIGUSR2 to do finish-up and close, just like bgwriter

- put a lockfile in for the archiver that prevents a new archiver from
  starting, but everything else comes up OK. In postmaster, if
  PgArchPID == 0 then we check for archiver.pid; if present, read it and
  send a SIGUSR2 to it. If rc = ESRCH then the process is not present,
  so start up a new archiver

- let's keep archiving, if there is work to do, right up until the last
  possible moment, even if the postmaster has gone

- ensure people understand that an archive_command call can be
  interrupted and may need to handle the consequences if the command is
  not atomic

With those changes the use cases would look like this...

System Shutdown
System shuts down, postmaster shuts down, archiver works furiously until
the end trying to archive things away. Archiver gets caught halfway
through a copy, so crashes, leaving archiver.pid. Subsequent startup sees
archiver.pid, postmaster reads the file to get the pid, then sends a
signal to the archiver to see if it is still alive; it isn't, so remove
archiver.pid and allow the next archiver to start. First call to
archive_command handles the partially copied file in the archive.

Server Crash
Something takes down the server, archiver stays up trying to archive
things away. Crash recovery kicks in and finishes very quickly, new
archiver tries to start up but cannot because the first archiver is still
working. At the end of its cycle, the first archiver goes away and allows
the new archiver to start and continue operating.

Server Restart
Server shuts down, but there is work to do so the first archiver stays
around to finish it. Newly started server tries to start an archiver but
cannot because of the pid file. Reads pid file, sends signal. Archiver is
already shutting down, so continues its cycle and then quits. New
archiver starts up under the new postmaster.

...but that's too much change for me to personally stomach at this stage
of 8.3. My main issue is that I don't have the time to be able to do a
retest of start/stop/restart/crash behaviour, and catching all the side
cases is fairly hard, and yet also critical at this stage of play.

For me, the behaviour is close enough now, with the main issue being the
additional wait at the end of pgarch_MainLoop(). It's been there since
8.2, so a simple fix there would be non-invasive and backpatchable also.

-- 
Simon Riggs
2ndQuadrant  http://www.2ndQuadrant.com
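
The archiver.pid check proposed above (read the pid, signal it, treat ESRCH as "no such process") can be sketched roughly as follows. This is only a minimal Python illustration for POSIX systems, not postmaster code and not anything that was applied. The function name `may_start_archiver`, the handling of an unreadable pid file, and the handling of EPERM are all assumptions made for the example. The `sig` parameter defaults to SIGUSR2 as in the proposal; passing 0 only probes for existence without delivering a signal.

```python
import os
import signal


def may_start_archiver(pidfile, sig=signal.SIGUSR2):
    """Return True if a new archiver may start, False if an old one is alive.

    Hypothetical helper: mirrors the proposed check of archiver.pid.
    """
    try:
        with open(pidfile) as f:
            pid = int(f.read().strip())
        if pid <= 0:
            # Never signal pid 0 or negative pids (those address process groups).
            raise ValueError("implausible pid")
    except FileNotFoundError:
        return True  # no lock file, so no earlier archiver to wait for
    except ValueError:
        # Assumption: an unreadable lock file is treated as stale.
        os.unlink(pidfile)
        return True

    try:
        os.kill(pid, sig)  # sig=0 checks existence only
    except ProcessLookupError:
        # errno ESRCH: the old archiver is gone, so the lock file is stale.
        os.unlink(pidfile)
        return True
    except PermissionError:
        # errno EPERM: some process with that pid exists but is not ours.
        # Assumption: be conservative and do not start a second archiver.
        return False
    return False  # old archiver is still running; try again later
```

In the three use cases described above, a False result corresponds to "new archiver tries to start up but cannot", and a later call returns True once the first archiver has finished its cycle and exited. The sketch does not address pid reuse, which a real implementation would have to consider.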