Re: unable to fail over to warm standby server - Mailing list pgsql-bugs
From | Heikki Linnakangas |
---|---|
Subject | Re: unable to fail over to warm standby server |
Date | |
Msg-id | 4B71BBCB.8070302@enterprisedb.com Whole thread Raw |
In response to | Re: unable to fail over to warm standby server (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: unable to fail over to warm standby server
Re: unable to fail over to warm standby server Re: unable to fail over to warm standby server |
List | pgsql-bugs |
Tom Lane wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Fri, Jan 29, 2010 at 3:32 PM, Heikki Linnakangas >>> That only affects the error message and is harmless otherwise, but I >>> thought I'd mention it. I'll fix it, unless someone wants to argue that >>> its more useful to print the raw return value of system(), because it >>> might contain more information like which signal killed the process, >>> that you could extract from the cryptic error code using e.g WTERMSIG() >>> macro. > >> An "if" statement would seem to be in order, so that you can print out >> either the exit code or the signal number as appropriate. > > Yes. Please see the existing code in the postmaster that prints > subprocess exit codes, and duplicate it (or perhaps refactor so you can > avoid code duplication; though translatability of messages might limit > what you can do there). Here's what I came up with. Translatability indeed makes it pretty hard, I ended up copy-pasting. BTW, I don't think I'm going to bother or risk back-patching this. It was harmless, and for forensic purposes all the information was there in the old message too, just in a cryptic format. And the new messages would need translating too. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com *** a/src/backend/access/transam/xlog.c --- b/src/backend/access/transam/xlog.c *************** *** 546,551 **** static void pg_start_backup_callback(int code, Datum arg); --- 546,552 ---- static bool read_backup_label(XLogRecPtr *checkPointLoc); static void rm_redo_error_callback(void *arg); static int get_sync_bit(int method); + static int errdetail_childexit(const char *procname, int exitstatus); /* *************** *** 2959,2966 **** RestoreArchivedFile(char *path, const char *xlogfname, signaled = WIFSIGNALED(rc) || WEXITSTATUS(rc) > 125; ereport(signaled ? FATAL : DEBUG2, ! (errmsg("could not restore file \"%s\" from archive: return code %d", ! xlogfname, rc))); not_available: /* --- 2960,2967 ---- signaled = WIFSIGNALED(rc) || WEXITSTATUS(rc) > 125; ereport(signaled ? FATAL : DEBUG2, ! (errmsg("could not restore file \"%s\" from archive", xlogfname), ! errdetail_childexit("restore_command", rc))); not_available: /* *************** *** 3077,3088 **** ExecuteRecoveryEndCommand(void) signaled = WIFSIGNALED(rc) || WEXITSTATUS(rc) > 125; ereport(signaled ? FATAL : WARNING, ! (errmsg("recovery_end_command \"%s\": return code %d", ! xlogRecoveryEndCmd, rc))); } } /* * Preallocate log files beyond the specified log endpoint. * * XXX this is currently extremely conservative, since it forces only one --- 3078,3149 ---- signaled = WIFSIGNALED(rc) || WEXITSTATUS(rc) > 125; ereport(signaled ? FATAL : WARNING, ! (errmsg("recovery end command \"%s\" failed", xlogRecoveryEndCmd), ! errdetail_childexit("recovery_end_command", rc))); } } /* + * Add detail information (and possibly a hint) describing why a child + * process exited to the error currently being constructed. This is for + * use inside ereport, e.g: + * + * ereport(DEBUG1, + * (errmsg(...), + * (errdetail_childexit("mycommand", rc)))); + * + * 'procname' is a name of the exited command, and 'exitstatus' is a return + * code as returned by e.g wait(2). + * + * The logic is copied from LogChildExit() in postmaster/postmaster.c. If + * you change this, make sure you keep LogChildExit in sync. + */ + static int + errdetail_childexit(const char *procname, int exitstatus) + { + if (WIFEXITED(exitstatus)) + /*------ + translator: %s is a noun phrase describing a child process, such as + "restore_command" */ + errdetail("%s exited with exit code %d", + procname, WEXITSTATUS(exitstatus)); + else if (WIFSIGNALED(exitstatus)) + { + #if defined(WIN32) + /*------ + translator: %s is a noun phrase describing a child process, such as + "restore_command" */ + errdetail("%s was terminated by exception 0x%X", + procname, WTERMSIG(exitstatus)); + errhint("See C include file \"ntstatus.h\" for a description of the hexadecimal value."); + #elif defined(HAVE_DECL_SYS_SIGLIST) && HAVE_DECL_SYS_SIGLIST + /*------ + translator: %s is a noun phrase describing a child process, such as + "restore_command" */ + errdetail("%s was terminated by signal %d: %s", + procname, WTERMSIG(exitstatus), + WTERMSIG(exitstatus) < NSIG ? + sys_siglist[WTERMSIG(exitstatus)] : "(unknown)"); + #else + /*------ + translator: %s is a noun phrase describing a child process, such as + "restore_command" */ + errdetail("%s was terminated by signal %d", + procname, WTERMSIG(exitstatus)); + #endif + } + else + /*------ + translator: %s is a noun phrase describing a child process, such as + "restore_command" */ + errdetail("%s exited with unrecognized status %d", + procname, exitstatus); + + /* return dummy value, so that this can be used inside ereport(...) */ + return 1; + } + + /* * Preallocate log files beyond the specified log endpoint. * * XXX this is currently extremely conservative, since it forces only one *** a/src/backend/postmaster/postmaster.c --- b/src/backend/postmaster/postmaster.c *************** *** 2770,2775 **** HandleChildCrash(int pid, int exitstatus, const char *procname) --- 2770,2779 ---- /* * Log the death of a child process. + * + * errdetail_childexit() function in xlog.c uses the same logic for + * constructing the message. If you change this function, remember to + * keep errdetail_childexit() in sync. */ static void LogChildExit(int lev, const char *procname, int pid, int exitstatus)
pgsql-bugs by date: