elog(FATAL) vs shared memory - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | elog(FATAL) vs shared memory |
Msg-id | 23609.1175709890@sss.pgh.pa.us |
Responses | Re: elog(FATAL) vs shared memory (3) |
List | pgsql-hackers |
In this thread: http://archives.postgresql.org/pgsql-bugs/2007-03/msg00145.php we eventually determined that the reported lockup had three components:

(1) something (still not sure what --- Martin and Mark, I'd really like to know) was issuing random SIGTERMs to various postgres processes including autovacuum.

(2) if a SIGTERM happens to arrive while btbulkdelete is running, the next CHECK_FOR_INTERRUPTS will do elog(FATAL), causing elog.c to do proc_exit(0), leaving the vacuum still recorded as active in the shared memory array maintained by _bt_start_vacuum/_bt_end_vacuum. The PG_TRY block in btbulkdelete doesn't get a chance to clean up.

(3) eventually, either we try to re-vacuum the same index or accumulation of bogus active entries overflows the array. Either way, _bt_start_vacuum throws an error, which btbulkdelete PG_CATCHes, leading to _bt_end_vacuum trying to re-acquire the LWLock already taken by _bt_start_vacuum, meaning that the process hangs up. And then so does anything else that needs to take that LWLock...

Point (3) is already fixed in CVS, but point (2) is a lot nastier. What it essentially says is that trying to clean up shared-memory state in a PG_TRY block is unsafe: you can't be certain you'll get to do it. Now this is not a big deal during normal SIGTERM or SIGQUIT database shutdown, because we're going to abandon the shared memory segment anyway. However, if we ever want to support individual session kill via SIGTERM, it's a problem. Even if we were not interested in someday considering that a supported feature, it seems that dealing with random SIGTERMs is needed for robustness in at least some environments.

AFAICS, there are basically two ways we might try to approach this:

Plan A: establish the rule that you mustn't try to clean up shared memory state in a PG_CATCH block. Anything you need to do like that has to be handled by an on_shmem_exit hook function, so it will be called during a FATAL exit. (Or maybe you can do it in PG_CATCH for normal ERROR cases, but you need a backing on_shmem_exit hook to clean up for FATAL.)

Plan B: change the handling of FATAL errors so that they are thrown like normal errors, and the proc_exit call happens only when we get out to the outermost control level in postgres.c. This would mean that PG_CATCH blocks get a chance to clean up before the FATAL exit happens. The problem with that is that a non-cooperative PG_CATCH block might think it could "recover" from the error, and then the exit does not happen at all. We'd need a coding rule that PG_CATCH blocks *must* re-throw FATAL errors, which seems at least as ugly as Plan A. In particular, all three of the external-interpreter PLs are willing to return errors into the external interpreter, and AFAICS we'd be entirely at the mercy of the user-written Perl or Python or Tcl code whether it re-throws the error or not. So Plan B seems unacceptably fragile.

Does anyone see a way to fix it, or perhaps a Plan C with a totally different idea? Plan A seems pretty ugly but it's the best I can come up with.

regards, tom lane
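The code shape at issue in point (2) is roughly the following -- a simplified sketch of the 8.2-era btbulkdelete, not verbatim source:

```c
/* Inside btbulkdelete: record this vacuum in the shared-memory array,
 * then try to make sure the entry is removed on every exit path. */
cycleid = _bt_start_vacuum(rel);

PG_TRY();
{
	/* btvacuumscan calls CHECK_FOR_INTERRUPTS(); after a SIGTERM that
	 * turns into elog(FATAL), and elog.c then calls proc_exit(). */
	btvacuumscan(info, stats, callback, callback_state, cycleid);
}
PG_CATCH();
{
	/* Reached for elog(ERROR) via longjmp -- but a FATAL exit never
	 * returns here, so the shared-array entry leaks. */
	_bt_end_vacuum(rel);
	PG_RE_THROW();
}
PG_END_TRY();

_bt_end_vacuum(rel);
```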
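A minimal sketch of Plan A, assuming the on_shmem_exit() hook API declared in storage/ipc.h; the callback and wrapper names here are illustrative assumptions, not committed code:

```c
#include "postgres.h"
#include "access/nbtree.h"
#include "storage/ipc.h"
#include "utils/rel.h"

/* Exit hook: proc_exit() runs these callbacks even on elog(FATAL),
 * when no PG_CATCH block ever gets control. */
static void
btvacuum_exit_callback(int code, Datum arg)
{
	_bt_end_vacuum((Relation) DatumGetPointer(arg));
}

/* Hypothetical wrapper for btbulkdelete's entry point: arm the hook
 * before creating the shared-array entry it is meant to clean up. */
static BTCycleId
btvacuum_start(Relation rel)
{
	on_shmem_exit(btvacuum_exit_callback, PointerGetDatum(rel));
	return _bt_start_vacuum(rel);
}
```

On a successful finish the hook would have to be disarmed again so it doesn't fire at normal backend exit; current PostgreSQL provides cancel_shmem_exit() and the PG_ENSURE_ERROR_CLEANUP / PG_END_ENSURE_ERROR_CLEANUP macros in storage/ipc.h for exactly this pattern.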
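The coding rule Plan B would impose -- every PG_CATCH must re-throw FATAL -- would look something like this in each cooperating block; the error-inspection calls are the standard elog.h API, while the guarded operation and the recovery branch are illustrative:

```c
MemoryContext oldcontext = CurrentMemoryContext;

PG_TRY();
{
	do_risky_work();			/* hypothetical */
}
PG_CATCH();
{
	ErrorData  *edata;

	/* CopyErrorData() must not be called with ErrorContext current */
	MemoryContextSwitchTo(oldcontext);
	edata = CopyErrorData();

	if (edata->elevel >= FATAL)
		PG_RE_THROW();			/* the rule: never swallow FATAL */

	/* plain ERROR: safe to recover locally */
	FlushErrorState();
	FreeErrorData(edata);
}
PG_END_TRY();
```

The fragility Tom points out is that this test would have to appear in every catcher, including ones effectively controlled by user-written PL code.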