Re: Re: [COMMITTERS] pgsql: Start background writer during archive recovery. - Mailing list pgsql-hackers
From | Heikki Linnakangas |
---|---|
Subject | Re: Re: [COMMITTERS] pgsql: Start background writer during archive recovery. |
Date | |
Msg-id | 499DC0EF.9040603@enterprisedb.com Whole thread Raw |
In response to | Re: Re: [COMMITTERS] pgsql: Start background writer during archive recovery. (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Re: [COMMITTERS] pgsql: Start background writer during archive recovery.
|
List | pgsql-hackers |
Tom Lane wrote: > Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: >> Tom Lane wrote: >>> Maybe the postmaster should wait for startup process exit before >>> deciding to open for business, instead of just a signal. > >> Not a bad idea. Although, there's nothing wrong with the current way >> either. The startup process does a proc_exit(0) right after sending the >> signal ATM, so there's no real work left at that point. > > The thing wrong with it is assuming that nothing interesting will happen > during proc_exit(). We hang enough stuff on on_proc_exit hooks that > that seems like a pretty shaky assumption. I can't get too worried, given that proc_exit() is a very well-beaten code path. Admittedly not so much for an auxiliary process, but that's just a dumbed down version of what happens with a full-blown backend. However I started looking into that idea anyway, and figured that it does simplify the logic in postmaster.c quite a bit, so I think it's worth doing on those grounds alone. Attached is a patch against CVS HEAD and also against a snapshot before the recovery infra patch, for easier reading. I'll give that some more testing and commit if I find no issues. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com *** a/src/backend/access/transam/xlog.c --- b/src/backend/access/transam/xlog.c *************** *** 7695,7701 **** StartupProcessMain(void) BuildFlatFiles(false); /* Let postmaster know that startup is finished */ ! SendPostmasterSignal(PMSIGNAL_RECOVERY_COMPLETED); /* exit normally */ proc_exit(0); --- 7695,7701 ---- BuildFlatFiles(false); /* Let postmaster know that startup is finished */ ! SetPostmasterSignal(PMSIGNAL_RECOVERY_COMPLETED); /* exit normally */ proc_exit(0); *** a/src/backend/postmaster/postmaster.c --- b/src/backend/postmaster/postmaster.c *************** *** 227,240 **** static int Shutdown = NoShutdown; static bool FatalError = false; /* T if recovering from backend crash */ static bool RecoveryError = false; /* T if recovery failed */ - /* State of WAL redo */ - #define NoRecovery 0 - #define RecoveryStarted 1 - #define RecoveryConsistent 2 - #define RecoveryCompleted 3 - - static int RecoveryStatus = NoRecovery; - /* * We use a simple state machine to control startup, shutdown, and * crash recovery (which is rather like shutdown followed by startup). --- 227,232 ---- *************** *** 253,261 **** static int RecoveryStatus = NoRecovery; * point, if we had the infrastructure to do that. * * When the WAL redo is finished, the startup process signals us the third ! * time, and we switch to PM_RUN state. The startup process can also skip the ! * recovery and consistent recovery phases altogether, as it will during ! * normal startup when there's no recovery to be done, for example. * * Normal child backends can only be launched when we are in PM_RUN state. * (We also allow it in PM_WAIT_BACKUP state, but only for superusers.) --- 245,256 ---- * point, if we had the infrastructure to do that. * * When the WAL redo is finished, the startup process signals us the third ! * time, and exits. We don't process the 3d signal immediately but when we ! * see the that the startup process has exited, we check that we have ! * received the signal. If everything is OK, we then switch to PM_RUN state. ! * The startup process can also skip the recovery and consistent recovery ! * phases altogether, as it will during normal startup when there's no ! * recovery to be done, for example. * * Normal child backends can only be launched when we are in PM_RUN state. * (We also allow it in PM_WAIT_BACKUP state, but only for superusers.) *************** *** 338,344 **** static void pmdie(SIGNAL_ARGS); static void reaper(SIGNAL_ARGS); static void sigusr1_handler(SIGNAL_ARGS); static void dummy_handler(SIGNAL_ARGS); - static void CheckRecoverySignals(void); static void CleanupBackend(int pid, int exitstatus); static void HandleChildCrash(int pid, int exitstatus, const char *procname); static void LogChildExit(int lev, const char *procname, --- 333,338 ---- *************** *** 2019,2025 **** pmdie(SIGNAL_ARGS) ereport(LOG, (errmsg("received smart shutdown request"))); ! if (pmState == PM_RUN || pmState == PM_RECOVERY || pmState == PM_RECOVERY_CONSISTENT) { /* autovacuum workers are told to shut down immediately */ SignalAutovacWorkers(SIGTERM); --- 2013,2020 ---- ereport(LOG, (errmsg("received smart shutdown request"))); ! if (pmState == PM_RUN || pmState == PM_RECOVERY || ! pmState == PM_RECOVERY_CONSISTENT) { /* autovacuum workers are told to shut down immediately */ SignalAutovacWorkers(SIGTERM); *************** *** 2159,2181 **** reaper(SIGNAL_ARGS) */ if (pid == StartupPID) { StartupPID = 0; /* ! * Check if we've received a signal from the startup process ! * first. This can change pmState. If the startup process sends ! * a signal and exits immediately after that, we might not have ! * processed the signal yet. We need to know if it completed ! * recovery before it exited. */ ! CheckRecoverySignals(); /* * Unexpected exit of startup process (including FATAL exit) * during PM_STARTUP is treated as catastrophic. There is no ! * other processes running yet. */ ! if (pmState == PM_STARTUP) { LogChildExit(LOG, _("startup process"), pid, exitstatus); --- 2154,2177 ---- */ if (pid == StartupPID) { + bool recoveryCompleted; + StartupPID = 0; /* ! * Check if the startup process completed recovery before exiting */ ! if (CheckPostmasterSignal(PMSIGNAL_RECOVERY_COMPLETED)) ! recoveryCompleted = true; ! else ! recoveryCompleted = false; /* * Unexpected exit of startup process (including FATAL exit) * during PM_STARTUP is treated as catastrophic. There is no ! * other processes running yet, so we can just exit. */ ! if (pmState == PM_STARTUP && !recoveryCompleted) { LogChildExit(LOG, _("startup process"), pid, exitstatus); *************** *** 2196,2212 **** reaper(SIGNAL_ARGS) continue; } /* * Startup process exited normally, but didn't finish recovery. * This can happen if someone else than postmaster kills the * startup process with SIGTERM. Treat it like a crash. */ ! if (pmState == PM_RECOVERY || pmState == PM_RECOVERY_CONSISTENT) { RecoveryError = true; HandleChildCrash(pid, exitstatus, _("startup process")); continue; } } /* --- 2192,2255 ---- continue; } /* + * Startup process exited in response to a shutdown request (or + * it finished normally regardless of the shutdown request). + */ + if (Shutdown > NoShutdown) + { + pmState = PM_WAIT_BACKENDS; + /* PostmasterStateMachine logic does the rest */ + continue; + } + /* * Startup process exited normally, but didn't finish recovery. * This can happen if someone else than postmaster kills the * startup process with SIGTERM. Treat it like a crash. */ ! if (!recoveryCompleted) { RecoveryError = true; HandleChildCrash(pid, exitstatus, _("startup process")); continue; } + + /* + * Startup succeeded, commence normal operations + */ + pmState = PM_RUN; + + /* + * Load the flat authorization file into postmaster's cache. The + * startup process has recomputed this from the database contents, + * so we wait till it finishes before loading it. + */ + load_role(); + + /* + * Crank up the background writer, if we didn't do that already + * when we entered consistent recovery phase. It doesn't matter + * if this fails, we'll just try again later. + */ + if (BgWriterPID == 0) + BgWriterPID = StartBackgroundWriter(); + + /* + * Likewise, start other special children as needed. In a restart + * situation, some of them may be alive already. + */ + if (WalWriterPID == 0) + WalWriterPID = StartWalWriter(); + if (AutoVacuumingActive() && AutoVacPID == 0) + AutoVacPID = StartAutoVacLauncher(); + if (XLogArchivingActive() && PgArchPID == 0) + PgArchPID = pgarch_start(); + if (PgStatPID == 0) + PgStatPID = pgstat_start(); + + /* at this point we are really open for business */ + ereport(LOG, + (errmsg("database system is ready to accept connections"))); } /* *************** *** 2622,2748 **** LogChildExit(int lev, const char *procname, int pid, int exitstatus) static void PostmasterStateMachine(void) { - /* Startup states */ - - if (pmState == PM_STARTUP && RecoveryStatus > NoRecovery) - { - /* WAL redo has started. We're out of reinitialization. */ - FatalError = false; - - /* - * Go to shutdown mode if a shutdown request was pending. - */ - if (Shutdown > NoShutdown) - { - pmState = PM_WAIT_BACKENDS; - /* PostmasterStateMachine logic does the rest */ - } - else - { - /* - * Crank up the background writer. It doesn't matter if this - * fails, we'll just try again later. - */ - Assert(BgWriterPID == 0); - BgWriterPID = StartBackgroundWriter(); - - pmState = PM_RECOVERY; - } - } - if (pmState == PM_RECOVERY && RecoveryStatus >= RecoveryConsistent) - { - /* - * Go to shutdown mode if a shutdown request was pending. - */ - if (Shutdown > NoShutdown) - { - pmState = PM_WAIT_BACKENDS; - /* PostmasterStateMachine logic does the rest */ - } - else - { - /* - * Startup process has entered recovery. We consider that good - * enough to reset FatalError. - */ - pmState = PM_RECOVERY_CONSISTENT; - - /* - * Load the flat authorization file into postmaster's cache. The - * startup process won't have recomputed this from the database yet, - * so we it may change following recovery. - */ - load_role(); - - /* - * Likewise, start other special children as needed. - */ - Assert(PgStatPID == 0); - PgStatPID = pgstat_start(); - - /* XXX at this point we could accept read-only connections */ - ereport(DEBUG1, - (errmsg("database system is in consistent recovery mode"))); - } - } - if ((pmState == PM_RECOVERY || - pmState == PM_RECOVERY_CONSISTENT || - pmState == PM_STARTUP) && - RecoveryStatus == RecoveryCompleted) - { - /* - * Startup succeeded. - * - * Go to shutdown mode if a shutdown request was pending. - */ - if (Shutdown > NoShutdown) - { - pmState = PM_WAIT_BACKENDS; - /* PostmasterStateMachine logic does the rest */ - } - else - { - /* - * Otherwise, commence normal operations. - */ - pmState = PM_RUN; - - /* - * Load the flat authorization file into postmaster's cache. The - * startup process has recomputed this from the database contents, - * so we wait till it finishes before loading it. - */ - load_role(); - - /* - * Crank up the background writer, if we didn't do that already - * when we entered consistent recovery phase. It doesn't matter - * if this fails, we'll just try again later. - */ - if (BgWriterPID == 0) - BgWriterPID = StartBackgroundWriter(); - - /* - * Likewise, start other special children as needed. In a restart - * situation, some of them may be alive already. - */ - if (WalWriterPID == 0) - WalWriterPID = StartWalWriter(); - if (AutoVacuumingActive() && AutoVacPID == 0) - AutoVacPID = StartAutoVacLauncher(); - if (XLogArchivingActive() && PgArchPID == 0) - PgArchPID = pgarch_start(); - if (PgStatPID == 0) - PgStatPID = pgstat_start(); - - /* at this point we are really open for business */ - ereport(LOG, - (errmsg("database system is ready to accept connections"))); - } - } - - /* Shutdown states */ - if (pmState == PM_WAIT_BACKUP) { /* --- 2665,2670 ---- *************** *** 2904,2911 **** PostmasterStateMachine(void) shmem_exit(1); reset_shared(PostPortNumber); - RecoveryStatus = NoRecovery; - StartupPID = StartupDataBase(); Assert(StartupPID != 0); pmState = PM_STARTUP; --- 2826,2831 ---- *************** *** 4010,4056 **** ExitPostmaster(int status) } /* ! * common code used in sigusr1_handler() and reaper() to handle ! * recovery-related signals from startup process */ static void ! CheckRecoverySignals(void) { ! bool changed = false; ! if (CheckPostmasterSignal(PMSIGNAL_RECOVERY_STARTED)) ! { ! Assert(pmState == PM_STARTUP); ! RecoveryStatus = RecoveryStarted; ! changed = true; ! } ! if (CheckPostmasterSignal(PMSIGNAL_RECOVERY_CONSISTENT)) { ! RecoveryStatus = RecoveryConsistent; ! changed = true; } ! if (CheckPostmasterSignal(PMSIGNAL_RECOVERY_COMPLETED)) { ! RecoveryStatus = RecoveryCompleted; ! changed = true; ! } ! ! if (changed) ! PostmasterStateMachine(); ! } ! /* ! * sigusr1_handler - handle signal conditions from child processes ! */ ! static void ! sigusr1_handler(SIGNAL_ARGS) ! { ! int save_errno = errno; ! PG_SETMASK(&BlockSig); ! CheckRecoverySignals(); if (CheckPostmasterSignal(PMSIGNAL_PASSWORD_CHANGE)) { --- 3930,3987 ---- } /* ! * sigusr1_handler - handle signal conditions from child processes */ static void ! sigusr1_handler(SIGNAL_ARGS) { ! int save_errno = errno; ! PG_SETMASK(&BlockSig); ! /* ! * RECOVERY_STARTED and RECOVERY_CONSISTENT signals are ignored in ! * unexpected states. If the startup process quickly starts up, completes ! * recovery, exits, we might process the death of the startup process ! * first. We don't want to go back to recovery in that case. ! */ ! if (CheckPostmasterSignal(PMSIGNAL_RECOVERY_STARTED) && ! pmState == PM_STARTUP) { ! /* WAL redo has started. We're out of reinitialization. */ ! FatalError = false; ! ! /* ! * Crank up the background writer. It doesn't matter if this ! * fails, we'll just try again later. ! */ ! Assert(BgWriterPID == 0); ! BgWriterPID = StartBackgroundWriter(); ! ! pmState = PM_RECOVERY; } ! if (CheckPostmasterSignal(PMSIGNAL_RECOVERY_CONSISTENT) && ! pmState == PM_RECOVERY) { ! /* ! * Load the flat authorization file into postmaster's cache. The ! * startup process won't have recomputed this from the database yet, ! * so we it may change following recovery. ! */ ! load_role(); ! /* ! * Likewise, start other special children as needed. ! */ ! Assert(PgStatPID == 0); ! PgStatPID = pgstat_start(); ! /* XXX at this point we could accept read-only connections */ ! ereport(DEBUG1, ! (errmsg("database system is in consistent recovery mode"))); ! pmState = PM_RECOVERY_CONSISTENT; ! } if (CheckPostmasterSignal(PMSIGNAL_PASSWORD_CHANGE)) { *** a/src/backend/storage/ipc/ipc.c --- b/src/backend/storage/ipc/ipc.c *************** *** 76,81 **** static int on_proc_exit_index, --- 76,83 ---- void proc_exit(int code) { + elog(LOG, "proc_exit: %d", on_proc_exit_index); + /* * Once we set this flag, we are committed to exit. Any ereport() will * NOT send control back to the main loop, but right back here. *************** *** 95,102 **** proc_exit(int code) InterruptHoldoffCount = 1; CritSectionCount = 0; - elog(DEBUG3, "proc_exit(%d)", code); - /* do our shared memory exits first */ shmem_exit(code); --- 97,102 ---- *** a/src/backend/storage/ipc/pmsignal.c --- b/src/backend/storage/ipc/pmsignal.c *************** *** 72,77 **** SendPostmasterSignal(PMSignalReason reason) --- 72,94 ---- } /* + * SetPostmasterSignal - like SendPostmasterSignal, but don't wake up + * postmaster + * + * This is for signals that the postmaster polls with CheckPostmasterSignal() + * but isn't interested in processing immediately. + */ + void + SetPostmasterSignal(PMSignalReason reason) + { + /* If called in a standalone backend, do nothing */ + if (!IsUnderPostmaster) + return; + /* Atomically set the proper flag */ + PMSignalFlags[reason] = true; + } + + /* * CheckPostmasterSignal - check to see if a particular reason has been * signaled, and clear the signal flag. Should be called by postmaster * after receiving SIGUSR1. *** a/src/include/storage/pmsignal.h --- b/src/include/storage/pmsignal.h *************** *** 39,44 **** typedef enum --- 39,45 ---- */ extern void PMSignalInit(void); extern void SendPostmasterSignal(PMSignalReason reason); + extern void SetPostmasterSignal(PMSignalReason reason); extern bool CheckPostmasterSignal(PMSignalReason reason); extern bool PostmasterIsAlive(bool amDirectChild); *** a/src/backend/postmaster/postmaster.c --- b/src/backend/postmaster/postmaster.c *************** *** 225,235 **** static pid_t StartupPID = 0, --- 225,257 ---- static int Shutdown = NoShutdown; static bool FatalError = false; /* T if recovering from backend crash */ + static bool RecoveryError = false; /* T if recovery failed */ /* * We use a simple state machine to control startup, shutdown, and * crash recovery (which is rather like shutdown followed by startup). * + * After doing all the postmaster initialization work, we enter PM_STARTUP + * state and the startup process is launched. The startup process begins by + * reading the control file and other preliminary initialization steps. When + * it's ready to start WAL redo, it signals postmaster, and we switch to + * PM_RECOVERY phase. The background writer is launched, while the startup + * process continues applying WAL. + * + * After reaching a consistent point in WAL redo, startup process signals + * us again, and we switch to PM_RECOVERY_CONSISTENT phase. There's currently + * no difference between PM_RECOVERY and PM_RECOVERY_CONSISTENT, but we + * could start accepting connections to perform read-only queries at this + * point, if we had the infrastructure to do that. + * + * When the WAL redo is finished, the startup process signals us the third + * time, and exits. We don't process the 3d signal immediately but when we + * see the that the startup process has exited, we check that we have + * received the signal. If everything is OK, we then switch to PM_RUN state. + * The startup process can also skip the recovery and consistent recovery + * phases altogether, as it will during normal startup when there's no + * recovery to be done, for example. + * * Normal child backends can only be launched when we are in PM_RUN state. * (We also allow it in PM_WAIT_BACKUP state, but only for superusers.) * In other states we handle connection requests by launching "dead_end" *************** *** 245,259 **** static bool FatalError = false; /* T if recovering from backend crash */ * * Notice that this state variable does not distinguish *why* we entered * states later than PM_RUN --- Shutdown and FatalError must be consulted ! * to find that out. FatalError is never true in PM_RUN state, nor in ! * PM_SHUTDOWN states (because we don't enter those states when trying to ! * recover from a crash). It can be true in PM_STARTUP state, because we ! * don't clear it until we've successfully recovered. */ typedef enum { PM_INIT, /* postmaster starting */ PM_STARTUP, /* waiting for startup subprocess */ PM_RUN, /* normal "database is alive" state */ PM_WAIT_BACKUP, /* waiting for online backup mode to end */ PM_WAIT_BACKENDS, /* waiting for live backends to exit */ --- 267,285 ---- * * Notice that this state variable does not distinguish *why* we entered * states later than PM_RUN --- Shutdown and FatalError must be consulted ! * to find that out. FatalError is never true in PM_RECOVERY_* or PM_RUN ! * states, nor in PM_SHUTDOWN states (because we don't enter those states ! * when trying to recover from a crash). It can be true in PM_STARTUP state, ! * because we don't clear it until we've successfully started WAL redo. ! * Similarly, RecoveryError means that we have crashed during recovery, and ! * should not try to restart. */ typedef enum { PM_INIT, /* postmaster starting */ PM_STARTUP, /* waiting for startup subprocess */ + PM_RECOVERY, /* in recovery mode */ + PM_RECOVERY_CONSISTENT, /* consistent recovery mode */ PM_RUN, /* normal "database is alive" state */ PM_WAIT_BACKUP, /* waiting for online backup mode to end */ PM_WAIT_BACKENDS, /* waiting for live backends to exit */ *************** *** 1302,1308 **** ServerLoop(void) * state that prevents it, start one. It doesn't matter if this * fails, we'll just try again later. */ ! if (BgWriterPID == 0 && pmState == PM_RUN) BgWriterPID = StartBackgroundWriter(); /* --- 1328,1336 ---- * state that prevents it, start one. It doesn't matter if this * fails, we'll just try again later. */ ! if (BgWriterPID == 0 && ! (pmState == PM_RUN || pmState == PM_RECOVERY || ! pmState == PM_RECOVERY_CONSISTENT)) BgWriterPID = StartBackgroundWriter(); /* *************** *** 1752,1758 **** canAcceptConnections(void) return CAC_WAITBACKUP; /* allow superusers only */ if (Shutdown > NoShutdown) return CAC_SHUTDOWN; /* shutdown is pending */ ! if (pmState == PM_STARTUP && !FatalError) return CAC_STARTUP; /* normal startup */ return CAC_RECOVERY; /* else must be crash recovery */ } --- 1780,1789 ---- return CAC_WAITBACKUP; /* allow superusers only */ if (Shutdown > NoShutdown) return CAC_SHUTDOWN; /* shutdown is pending */ ! if (!FatalError && ! (pmState == PM_STARTUP || ! pmState == PM_RECOVERY || ! pmState == PM_RECOVERY_CONSISTENT)) return CAC_STARTUP; /* normal startup */ return CAC_RECOVERY; /* else must be crash recovery */ } *************** *** 1982,1988 **** pmdie(SIGNAL_ARGS) ereport(LOG, (errmsg("received smart shutdown request"))); ! if (pmState == PM_RUN) { /* autovacuum workers are told to shut down immediately */ SignalAutovacWorkers(SIGTERM); --- 2013,2020 ---- ereport(LOG, (errmsg("received smart shutdown request"))); ! if (pmState == PM_RUN || pmState == PM_RECOVERY || ! pmState == PM_RECOVERY_CONSISTENT) { /* autovacuum workers are told to shut down immediately */ SignalAutovacWorkers(SIGTERM); *************** *** 2019,2025 **** pmdie(SIGNAL_ARGS) if (StartupPID != 0) signal_child(StartupPID, SIGTERM); ! if (pmState == PM_RUN || pmState == PM_WAIT_BACKUP) { ereport(LOG, (errmsg("aborting any active transactions"))); --- 2051,2064 ---- if (StartupPID != 0) signal_child(StartupPID, SIGTERM); ! if (pmState == PM_RECOVERY) ! { ! /* only bgwriter is active in this state */ ! pmState = PM_WAIT_BACKENDS; ! } ! if (pmState == PM_RUN || ! pmState == PM_WAIT_BACKUP || ! pmState == PM_RECOVERY_CONSISTENT) { ereport(LOG, (errmsg("aborting any active transactions"))); *************** *** 2115,2125 **** reaper(SIGNAL_ARGS) */ if (pid == StartupPID) { StartupPID = 0; - Assert(pmState == PM_STARTUP); ! /* FATAL exit of startup is treated as catastrophic */ ! if (!EXIT_STATUS_0(exitstatus)) { LogChildExit(LOG, _("startup process"), pid, exitstatus); --- 2154,2177 ---- */ if (pid == StartupPID) { + bool recoveryCompleted; + StartupPID = 0; ! /* ! * Check if the startup process completed recovery before exiting ! */ ! if (CheckPostmasterSignal(PMSIGNAL_RECOVERY_COMPLETED)) ! recoveryCompleted = true; ! else ! recoveryCompleted = false; ! ! /* ! * Unexpected exit of startup process (including FATAL exit) ! * during PM_STARTUP is treated as catastrophic. There is no ! * other processes running yet, so we can just exit. ! */ ! if (pmState == PM_STARTUP && !recoveryCompleted) { LogChildExit(LOG, _("startup process"), pid, exitstatus); *************** *** 2127,2141 **** reaper(SIGNAL_ARGS) (errmsg("aborting startup due to startup process failure"))); ExitPostmaster(1); } - /* ! * Startup succeeded - we are done with system startup or ! * recovery. */ ! FatalError = false; ! /* ! * Go to shutdown mode if a shutdown request was pending. */ if (Shutdown > NoShutdown) { --- 2179,2199 ---- (errmsg("aborting startup due to startup process failure"))); ExitPostmaster(1); } /* ! * Any unexpected exit (including FATAL exit) of the startup ! * process is treated as a crash, except that we don't want ! * to reinitialize. */ ! if (!EXIT_STATUS_0(exitstatus)) ! { ! RecoveryError = true; ! HandleChildCrash(pid, exitstatus, ! _("startup process")); ! continue; ! } /* ! * Startup process exited in response to a shutdown request (or ! * it finished normally regardless of the shutdown request). */ if (Shutdown > NoShutdown) { *************** *** 2143,2151 **** reaper(SIGNAL_ARGS) /* PostmasterStateMachine logic does the rest */ continue; } /* ! * Otherwise, commence normal operations. */ pmState = PM_RUN; --- 2201,2221 ---- /* PostmasterStateMachine logic does the rest */ continue; } + /* + * Startup process exited normally, but didn't finish recovery. + * This can happen if someone else than postmaster kills the + * startup process with SIGTERM. Treat it like a crash. + */ + if (!recoveryCompleted) + { + RecoveryError = true; + HandleChildCrash(pid, exitstatus, + _("startup process")); + continue; + } /* ! * Startup succeeded, commence normal operations */ pmState = PM_RUN; *************** *** 2157,2167 **** reaper(SIGNAL_ARGS) load_role(); /* ! * Crank up the background writer. It doesn't matter if this ! * fails, we'll just try again later. */ ! Assert(BgWriterPID == 0); ! BgWriterPID = StartBackgroundWriter(); /* * Likewise, start other special children as needed. In a restart --- 2227,2238 ---- load_role(); /* ! * Crank up the background writer, if we didn't do that already ! * when we entered consistent recovery phase. It doesn't matter ! * if this fails, we'll just try again later. */ ! if (BgWriterPID == 0) ! BgWriterPID = StartBackgroundWriter(); /* * Likewise, start other special children as needed. In a restart *************** *** 2178,2186 **** reaper(SIGNAL_ARGS) /* at this point we are really open for business */ ereport(LOG, ! (errmsg("database system is ready to accept connections"))); ! ! continue; } /* --- 2249,2255 ---- /* at this point we are really open for business */ ereport(LOG, ! (errmsg("database system is ready to accept connections"))); } /* *************** *** 2443,2448 **** HandleChildCrash(int pid, int exitstatus, const char *procname) --- 2512,2529 ---- } } + /* Take care of the startup process too */ + if (pid == StartupPID) + StartupPID = 0; + else if (StartupPID != 0 && !FatalError) + { + ereport(DEBUG2, + (errmsg_internal("sending %s to process %d", + (SendStop ? "SIGSTOP" : "SIGQUIT"), + (int) StartupPID))); + signal_child(BgWriterPID, (SendStop ? SIGSTOP : SIGQUIT)); + } + /* Take care of the bgwriter too */ if (pid == BgWriterPID) BgWriterPID = 0; *************** *** 2514,2520 **** HandleChildCrash(int pid, int exitstatus, const char *procname) FatalError = true; /* We now transit into a state of waiting for children to die */ ! if (pmState == PM_RUN || pmState == PM_WAIT_BACKUP || pmState == PM_SHUTDOWN) pmState = PM_WAIT_BACKENDS; --- 2595,2603 ---- FatalError = true; /* We now transit into a state of waiting for children to die */ ! if (pmState == PM_RECOVERY || ! pmState == PM_RECOVERY_CONSISTENT || ! pmState == PM_RUN || pmState == PM_WAIT_BACKUP || pmState == PM_SHUTDOWN) pmState = PM_WAIT_BACKENDS; *************** *** 2723,2728 **** PostmasterStateMachine(void) --- 2806,2820 ---- } /* + * If recovery failed, wait for all non-syslogger children to exit, + * and then exit postmaster. We don't try to reinitialize when recovery + * fails, because more than likely it will just fail again and we will + * keep trying forever. + */ + if (RecoveryError && pmState == PM_NO_CHILDREN) + ExitPostmaster(1); + + /* * If we need to recover from a crash, wait for all non-syslogger * children to exit, then reset shmem and StartupDataBase. */ *************** *** 3847,3852 **** sigusr1_handler(SIGNAL_ARGS) --- 3939,3988 ---- PG_SETMASK(&BlockSig); + /* + * RECOVERY_STARTED and RECOVERY_CONSISTENT signals are ignored in + * unexpected states. If the startup process quickly starts up, completes + * recovery, exits, we might process the death of the startup process + * first. We don't want to go back to recovery in that case. + */ + if (CheckPostmasterSignal(PMSIGNAL_RECOVERY_STARTED) && + pmState == PM_STARTUP) + { + /* WAL redo has started. We're out of reinitialization. */ + FatalError = false; + + /* + * Crank up the background writer. It doesn't matter if this + * fails, we'll just try again later. + */ + Assert(BgWriterPID == 0); + BgWriterPID = StartBackgroundWriter(); + + pmState = PM_RECOVERY; + } + if (CheckPostmasterSignal(PMSIGNAL_RECOVERY_CONSISTENT) && + pmState == PM_RECOVERY) + { + /* + * Load the flat authorization file into postmaster's cache. The + * startup process won't have recomputed this from the database yet, + * so we it may change following recovery. + */ + load_role(); + + /* + * Likewise, start other special children as needed. + */ + Assert(PgStatPID == 0); + PgStatPID = pgstat_start(); + + /* XXX at this point we could accept read-only connections */ + ereport(DEBUG1, + (errmsg("database system is in consistent recovery mode"))); + + pmState = PM_RECOVERY_CONSISTENT; + } + if (CheckPostmasterSignal(PMSIGNAL_PASSWORD_CHANGE)) { /*
pgsql-hackers by date: