Thread: more backtraces
In the previous discussions on backtrace support, some people asked for backtraces in more situations. Here is a patch that prints backtraces on SIGABRT, SIGBUS, and SIGSEGV signals. SIGABRT includes assertions and elog(PANIC). Do signals work like this on Windows? Do we need special EXEC_BACKEND support? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi, On 2019-12-04 20:45:25 +0100, Peter Eisentraut wrote: > In the previous discussions on backtrace support, some people asked for > backtraces in more situations. Here is a patch that prints backtraces on > SIGABRT, SIGBUS, and SIGSEGV signals. SIGABRT includes assertions and > elog(PANIC). Hm. Can we really do that somewhat reliably like this? I'd suspect that there'll be some oddities e.g. for stack overflows if done this way. To my knowledge it's not a good idea to intercept SIGBUS/SIGSEGV without using a separate signal stack (cf. sigaltstack) - but using a separate stack could also make it harder to determine a correct backtrace? It'd be bad if the addition of backtraces for SEGV/BUS suddenly made it harder to attach a debugger and getting useful results. Even disregarding the previous concerns, we'll get less useful debugger interactions due to this, e.g. for things like null pointer derefs, right? Doing this for SIGABRT seems like a more clearly good case - by that point we're already removed a few frames from the triggering code anyway. So debugging experience won't suffer much. And I don't think there's a corresponding issue with the stack potentially being corrupted / not large enough. - Andres
On 2019-12-04 20:59, Andres Freund wrote: > On 2019-12-04 20:45:25 +0100, Peter Eisentraut wrote: >> In the previous discussions on backtrace support, some people asked for >> backtraces in more situations. Here is a patch that prints backtraces on >> SIGABRT, SIGBUS, and SIGSEGV signals. SIGABRT includes assertions and >> elog(PANIC). > > Hm. Can we really do that somewhat reliably like this? I've seen reputable programs that do all kinds of things in SIGSEGV handlers, including running user-defined programs, without taking any special precautions. So it seems possible in general. > I'd suspect that > there'll be some oddities e.g. for stack overflows if done this way. To > my knowledge it's not a good idea to intercept SIGBUS/SIGSEGV without > using a separate signal stack (cf. sigaltstack) - but using a separate > stack could also make it harder to determine a correct backtrace? Didn't know about that, but seems useful. I'll look into it. > It'd be bad if the addition of backtraces for SEGV/BUS suddenly made it > harder to attach a debugger and getting useful results. Even > disregarding the previous concerns, we'll get less useful debugger > interactions due to this, e.g. for things like null pointer derefs, > right? The backtrace and level of detail jumping around between frames I get in lldb looks the same as without this. But it might depend. -- Peter Eisentraut
Andres Freund <andres@anarazel.de> writes: > It'd be bad if the addition of backtraces for SEGV/BUS suddenly made it > harder to attach a debugger and getting useful results. Yeah. TBH, I'm not sure I want this, at least not in debug builds. regards, tom lane
On 2019-12-04 22:34, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: >> It'd be bad if the addition of backtraces for SEGV/BUS suddenly made it >> harder to attach a debugger and getting useful results. > > Yeah. TBH, I'm not sure I want this, at least not in debug builds. I understand that the SEGV/BUS thing can be a bit scary. We can skip it. Are people interested in backtraces on abort()? That was asked for in an earlier thread. -- Peter Eisentraut
On Fri, Dec 13, 2019 at 7:26 AM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > On 2019-12-04 22:34, Tom Lane wrote: > > Andres Freund <andres@anarazel.de> writes: > >> It'd be bad if the addition of backtraces for SEGV/BUS suddenly made it > >> harder to attach a debugger and getting useful results. > > > > Yeah. TBH, I'm not sure I want this, at least not in debug builds. > > I understand that the SEGV/BUS thing can be a bit scary. We can skip it. > > Are people interested in backtraces on abort()? That was asked for in > an earlier thread. I mean, I think backtraces are great, and we should have more of them. It's possible that trying to do it in certain cases will cause problems, but we could back off those cases as we find them, or maybe try to work around them using sigaltstack(), or maybe back it off in debug builds. It would make life a lot easier for me if I never had to explain to a customer (1) how to install gdb or (2) that they needed to get $BOSS to approve installation of development tools on production systems. I would hate to see us shy away from improvements that might reduce the need for such conversations on the theory that bad stuff *might* happen. In my experience, the importance of having a stack trace in the log is greatest for a segmentation fault, because otherwise you have no indication whatsoever of where the problem happened. Having the query text has been a boon, but it's still not a lot to go on unless the same query crashes every time. In other situations, like a PANIC, Assertion failure, or (and this is a big one) non-descriptive error message (cache lookup failed for thingy %u) a backtrace is sometimes really helpful as well. You don't *always* need it, but you *often* need it. It is absolutely important that we don't break debuggability in the service of getting more stack traces. 
At the same time, there are a lot more PostgreSQL users out there than there are PostgreSQL developers, and a lot of those people are running non-cassert, non-debug builds. Being able to get debugging information from failures that happen on those installations that enables us to fix things without having to go through a time-consuming process of guesswork and attempted reproduction is really valuable. A stack trace can turn a lengthy nightmare into a quick fix. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Fri, Dec 13, 2019 at 7:26 AM Peter Eisentraut >> Are people interested in backtraces on abort()? That was asked for in >> an earlier thread. FWIW, I don't have too much of an opinion about abort() yet. Aren't we covering most of the possible cases for that already? I don't think that direct abort() calls are considered good style in the backend; it'd mostly get reached via Assert or PANIC. > It would make life a lot easier for me if I never had to explain to a > customer (1) how to install gdb or (2) that they needed to get $BOSS > to approve installation of development tools on production systems. Sure, but this facility is not going to have that end result, because the output just isn't detailed enough. If it were, I'd be more interested in taking risks to get the output. But as it stands, we're going to need more information in a large fraction of cases, so I'm dubious about doing anything that might actually interfere with collecting such information. > Being able to get debugging information from > failures that happen on those installations that enables us to fix > things without having to go through a time-consuming process of > guesswork and attempted reproduction is really valuable. A stack trace > can turn a lengthy nightmare into a quick fix. I think you are supposing that these traces will be as useful as gdb traces. They won't. In particular, where a gdb trace will almost always localize the problem to a line of C code, with these you're quite lucky if you can even localize to a specific function. That issue is mitigated for the existing use-cases by the fact that there's also a reported error message or assertion condition, so you can use that to narrow down the trap site. But that won't help for SIGSEGV. I think that the most useful next steps would involve trying to get better printouts from the cases this code already traps, rather than extending it to more cases. 
Maybe eventually we'll feel that this code is useful and reliable enough to justify trying to insert it into SIGSEGV cases; but we're not there today. regards, tom lane
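[Editor's illustration, not part of the thread.] For context on the level of detail being discussed: with glibc, backtrace_symbols() returns one string per frame, typically the module, a function name plus offset when a symbol is available, and the return address. It does not report source files or line numbers, and function names in the main executable are generally only resolved when it was linked with -rdynamic. A minimal sketch, with print_symbolic_backtrace and MAX_FRAMES as hypothetical names:

```c
#include <execinfo.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_FRAMES 100

/*
 * Hypothetical helper: print the current call stack, one frame per line, in
 * whatever format backtrace_symbols() produces on this platform.  Returns
 * the number of frames printed, or -1 if symbolization failed.
 *
 * backtrace_symbols() allocates with malloc(), so this variant is meant for
 * ordinary error paths rather than for use inside a signal handler.
 */
int
print_symbolic_backtrace(FILE *out)
{
	void	   *frames[MAX_FRAMES];
	char	  **symbols;
	int			nframes;
	int			i;

	nframes = backtrace(frames, MAX_FRAMES);
	symbols = backtrace_symbols(frames, nframes);
	if (symbols == NULL)
		return -1;

	for (i = 0; i < nframes; i++)
		fprintf(out, "%s\n", symbols[i]);

	/* Only the array itself is freed; the strings live inside it. */
	free(symbols);
	return nframes;
}
```

Calling print_symbolic_backtrace(stderr) from a few nested functions shows how much, or how little, each frame line reveals compared with a gdb backtrace.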
On 2019-Dec-15, Tom Lane wrote: > Robert Haas <robertmhaas@gmail.com> writes: > > Being able to get debugging information from > > failures that happen on those installations that enables us to fix > > things without having to go through a time-consuming process of > > guesswork and attempted reproduction is really valuable. A stack trace > > can turn a lengthy nightmare into a quick fix. > > I think you are supposing that these traces will be as useful as gdb > traces. They won't. In particular, where a gdb trace will almost > always localize the problem to a line of C code, with these you're > quite lucky if you can even localize to a specific function. That's already been my experience :-( > I think that the most useful next steps would involve trying to get > better printouts from the cases this code already traps, +1 -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services