Restartable signals 'n all that - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Restartable signals 'n all that |
Date | |
Msg-id | 12819.1183306286@sss.pgh.pa.us Whole thread Raw |
Responses |
Re: Restartable signals 'n all that
Re: Restartable signals 'n all that Re: Restartable signals 'n all that Re: Restartable signals 'n all that |
List | pgsql-hackers |
While poking at the vacuum-launcher issue currently under discussion, I got annoyed again at the behavior we've known for a long while that on some platforms pg_usleep() won't be interrupted by signals. (In particular I see this on HPUX, though not on Linux or Darwin. Anyone know if it happens on any BSDen?) I noticed that with the launcher set to sleep at most one second between checks for signals, it seemed to *always* take the full second before shutting down, which seemed awfully unlucky. Some more testing and man-page-reading revealed the full truth of what's going on. The Single Unix Spec's select(2) page says under ERRORS [EINTR] The select() function was interrupted before any of the selected events occurred and before the timeout intervalexpired. If SA_RESTART has been set for the interrupting signal, it is implementation-dependent whether select()restarts or returns with [EINTR]. Since pqsignal() sets SA_RESTART for all trapped signals except SIGALRM, that means we are exposing ourselves to the implementation dependency. What I furthermore realized while tracing is that "restart" means "start counting down the full timeout interval over again". Thus, if we have told select() to time out after 1 second, and SIGINT arrives after 0.9 second, we will wait a full second more before responding. Bad as that is, it gets worse rapidly: each new signal arrival restarts the cycle. So a continuous flow of signals at a spacing of less than 1 second would prevent the delay from *ever* terminating. This may be why some kernels reduce the timeout value before returning, so that a "restart" behavior in userland behaves sanely. But that's not what's happening for me :-(. To me, this calls into question whether we should try to avoid using SA_RESTART at all. The reason for doing it of course is to avoid unexpected syscall EINTR failures as well as short read/short write behaviors during disk I/O. However, if that's the plan then what the heck is pqsignal() doing giving an exception for SIGALRM? As soon as you have even one non-restartable trapped signal, it seems you need to handle EINTR everywhere. I looked into the CVS history and found that we inherited the SIGALRM exception from Berkeley (in fact it's in the v4r2 sources from 1994). Back then the system's usage of SIGALRM was pretty darn narrow --- it was used only to trigger the deadlock checker, which means it applied only while waiting for a lock, and the range of code in which the interrupt could occur was just a few lines. Now that we use SIGALRM for statement_timeout, the interrupt can potentially happen almost anywhere in the backend code. So we've got two problems: SA_RESTART is preventing some EINTRs from happening when we'd like, and yet it seems we are at risk of unwanted EINTRs anyway. The only really clean solution I can see is to stop using SA_RESTART and try to make all our syscalls EINTR-proof. But the probability of bugs-of-omission seems just about 100%, especially in third party backend add-ons that we don't get to review the code for. If we do nothing, anyone using statement_timeout is at risk. The risk is somewhat ameliorated by the fact that occurrence of the interrupt means transaction cancellation anyway, so an unexpected error of some other type isn't really a fatal problem. But it's still a bit nervous-making. I don't currently see a way to get corrupted data from an EINTR (bufmgr is fairly robust about write failures, for instance) but ... If we decide to live with that, and fix any reported problems, then one thing we could do to ameliorate the sleep problem is to turn off SA_RESTART for all activity-cancelling interrupts, in particular SIGINT/SIGTERM/SIGQUIT. This wouldn't make it safe for bgwriter and friends to go back to long sleep intervals, because they are watching for other interrupts too that don't represent reasons to cancel transactions. But it would at least solve the problem of slow response to shutdown requests. Comments? I sure hope someone has a better idea. regards, tom lane
pgsql-hackers by date: