Thread: all backends (pg7.2.3 / redhat 7.2) die due to unexpected signal 14 (SIGALRM)
all backends (pg7.2.3 / redhat 7.2) die due to unexpected signal 14 (SIGALRM)
From
Mark Aufflick
Date:
DEBUG: server process (pid 971) was terminated by signal 14
DEBUG: terminating any other active server processes

seems to be happening every day or so.

The server log doesn't show anything more serious than the occasional unexpected EOF on client connections, a few 'adding missing FROM-clause' notices and a stack of name truncation log entries.

The clients are AOLServer/OpenACS and a Perl daemon that forks off a handful of children (which only access two tables).

Both plpgsql and plperlu are used (plperlu is used for one trigger function to post a single https form that sends an sms message, and record the result body).

I have trawled the mail list archives and the only similar deaths I have found are under cygwin, but I am running a fully up2date redhat 7.2 box (with custom compiled 7.2.3 from sources as of a month ago).

Any ideas would be greatly appreciated!

Cheers,
Mark.

--
Mark Aufflick
e: mark@pumptheory.com
w: www.pumptheory.com
p: +61 438 700 647
Re: all backends (pg7.2.3 / redhat 7.2) die due to unexpected signal 14 (SIGALRM)
From
Tom Lane
Date:
Mark Aufflick <mark@pumptheory.com> writes:
> DEBUG: server process (pid 971) was terminated by signal 14

Hm, that's SIGALRM on my box, I assume so on yours too.

AFAICT, there is no part of the Postgres code that runs with SIGALRM set to default handling: it's either SIG_IGN or the deadlock timer handler.

> Both plpgsql and plperlu are used (plperlu is used for one trigger
> function to post a single https form that sends an sms message, and
> record the result body).

I wonder whether the Perl interpreter is hacking on the SIGALRM setting. That would be pretty unfriendly of it (but I don't think Perl quite believes the notion that it might be only a subroutine library, and not in full control of the process...)

regards, tom lane
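[Editor's illustration] The point above about SIGALRM dispositions can be reproduced outside Postgres. The following is only a minimal sketch in Python, not Postgres or Perl code: the helper name run_child and the timer values are hypothetical, and it assumes a POSIX system. It forks a child under each of the three dispositions mentioned (SIG_IGN, a handler, default handling), arms a timer, and reports how the child ended.

```python
import os
import signal
import time


def run_child(disposition):
    """Fork a child, set its SIGALRM disposition, arm a timer, report its fate.

    `run_child` is a hypothetical helper for this illustration only.
    """
    pid = os.fork()
    if pid == 0:
        # Child process: stands in for a backend whose timer will fire.
        try:
            signal.signal(signal.SIGALRM, disposition)
            signal.setitimer(signal.ITIMER_REAL, 0.05)  # fires SIGALRM once
            time.sleep(0.3)  # still "running" when the timer expires
        finally:
            # Reached only if the signal did not kill the process.
            os._exit(0)
    # Parent process: collect the child's wait status and decode it.
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        return "terminated by signal %d" % os.WTERMSIG(status)
    return "exited with status %d" % os.WEXITSTATUS(status)


def noop_handler(signum, frame):
    """Stands in for a real timer handler; it only has to exist."""


if __name__ == "__main__":
    for label, disposition in [
        ("SIG_IGN", signal.SIG_IGN),
        ("handler", noop_handler),
        ("SIG_DFL", signal.SIG_DFL),
    ]:
        print(label, "->", run_child(disposition))
```

Only the SIG_DFL case should end with "terminated by signal", which matches the log line quoted in the thread: the default action for SIGALRM is to terminate the process.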
Re: all backends (pg7.2.3 / redhat 7.2) die due to unexpected signal 14 (SIGALRM)
From
Mark Aufflick
Date:
ok, so that's not it - i'm definitely not trapping SIGALRM (and btw, this was only in the perl client code, which I don't see how that could cause the problem anyway - as opposed to in the plperlu function, which in any case I am pretty sure was not being called when the server crashed)

the log entries are:

DEBUG: server process (pid 20704) was terminated by signal 14
DEBUG: terminating any other active server processes
NOTICE: Message from PostgreSQL backend:
        The Postmaster has informed me that some other backend
        died abnormally and possibly corrupted shared memory.
        I have rolled back the current transaction and am
        going to terminate your database system connection and exit.
        Please reconnect to the database system and repeat your query.
[repeated]
FATAL 1: The database system is in recovery mode
[repeated]
DEBUG: all server processes terminated; reinitializing shared memory and semaphores
DEBUG: database system was interrupted at 2003-01-29 02:23:14 EST
DEBUG: checkpoint record is at 0/1267C284
DEBUG: redo record is at 0/1267C284; undo record is at 0/0; shutdown FALSE
DEBUG: next transaction id: 823075; next oid: 134017
DEBUG: database system was not properly shut down; automatic recovery in progress
FATAL 1: The database system is starting up
[repeated]
DEBUG: redo starts at 0/1267C2C4
DEBUG: ReadRecord: record with zero length at 0/126B3D80
DEBUG: redo done at 0/126B3D5C
FATAL 1: The database system is starting up
[repeated]
DEBUG: database system is ready

with the last NOTICE being repeated for each backend.

any ideas anyone?

Mark.

On Tuesday, January 28, 2003, at 03:42 PM, Tom Lane wrote:

> Mark Aufflick <mark@pumptheory.com> writes:
>> DEBUG: server process (pid 971) was terminated by signal 14
>
> Hm, that's SIGALRM on my box, I assume so on yours too.
>
> AFAICT, there is no part of the Postgres code that runs with SIGALRM
> set to default handling: it's either SIG_IGN or the deadlock timer
> handler.
>
>> Both plpgsql and plperlu are used (plperlu is used for one trigger
>> function to post a single https form that sends an sms message, and
>> record the result body).
>
> I wonder whether the Perl interpreter is hacking on the SIGALRM
> setting. That would be pretty unfriendly of it (but I don't think
> Perl quite believes the notion that it might be only a subroutine
> library, and not in full control of the process...)
>
> regards, tom lane
Re: all backends (pg7.2.3 / redhat 7.2) die due to unexpected signal 14 (SIGALRM)
From
Tom Lane
Date:
Mark Aufflick <mark@pumptheory.com> writes:
> ok, so that's not it - i'm definitely not trapping SIGALRM (and btw,
> this was only in the perl client code, which I don;t see how that could
> cause the problem anyway - as opposed to in the plperlu function, which
> in any case I am pretty sure was not being called when the server
> crashed)

It wouldn't have to be executing when the crash occurred. If it had executed at some prior time, and reset the handling of signal 14 at that time, then you'd get this failure:

> DEBUG: server process (pid 20704) was terminated by signal 14

whenever the backend process would next have reached a lock timeout.

I have not dug through the Perl sources to look for mucking with SIGALRM, but I bet that's what the problem is.

regards, tom lane
Re: all backends (pg7.2.3 / redhat 7.2) die due to unexpected signal 14 (SIGALRM)
From
Mark Aufflick
Date:
now that you made me stop and think, I am guessing that the Net::HTTP module must use SIGALRM for handling timeouts...

failing finding a way to do without the plperlu trigger altogether, i guess i will have to save and restore the trap - could be messy.

On Wednesday, January 29, 2003, at 02:41 AM, Tom Lane wrote:

> Mark Aufflick <mark@pumptheory.com> writes:
>> ok, so that's not it - i'm definitely not trapping SIGALRM (and btw,
>> this was only in the perl client code, which I don;t see how that could
>> cause the problem anyway - as opposed to in the plperlu function, which
>> in any case I am pretty sure was not being called when the server
>> crashed)
>
> It wouldn't have to be executing when the crash occurred. If it had
> executed at some prior time, and reset the handling of signal 14 at that
> time, then you'd get this failure:
>
>> DEBUG: server process (pid 20704) was terminated by signal 14
>
> whenever the backend process would next have reached a lock timeout.
>
> I have not dug through the Perl sources to look for mucking with
> SIGALRM, but I bet that's what the problem is.
>
> regards, tom lane
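[Editor's illustration] The "save and restore the trap" idea mentioned above follows a general pattern: record the current SIGALRM handler, run the code that may change it, then put the original back. The sketch below shows that pattern in Python only. It is not the plperlu fix, every name in it (preserved_sigalrm, careless_library_call, host_handler) is hypothetical, and it assumes it runs in the main thread of a POSIX process. Whether the equivalent is safe inside an embedded Perl interpreter is not established by this thread.

```python
import contextlib
import signal


@contextlib.contextmanager
def preserved_sigalrm():
    """Restore the SIGALRM handler that was installed before the block ran.

    Hypothetical helper for illustration; it restores the handler only and
    does not touch any timer the wrapped code may have left armed.
    """
    saved_handler = signal.getsignal(signal.SIGALRM)
    try:
        yield
    finally:
        # getsignal() returns None for handlers not installed from Python;
        # such a handler cannot be re-installed with signal.signal().
        if saved_handler is not None:
            signal.signal(signal.SIGALRM, saved_handler)


def careless_library_call():
    """Stands in for library code that resets SIGALRM as a side effect."""
    signal.signal(signal.SIGALRM, signal.SIG_DFL)


def host_handler(signum, frame):
    """Stands in for the handler the host process relies on."""


def demo():
    """Return True if the host handler survives the careless call."""
    signal.signal(signal.SIGALRM, host_handler)
    with preserved_sigalrm():
        careless_library_call()
    survived = signal.getsignal(signal.SIGALRM) is host_handler
    signal.signal(signal.SIGALRM, signal.SIG_DFL)  # leave the process clean
    return survived


if __name__ == "__main__":
    print("host handler restored:", demo())
```

The try/finally inside the context manager is the important part: the handler is restored even if the wrapped call raises.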
Re: all backends (pg7.2.3 / redhat 7.2) die due to unexpected signal 14 (SIGALRM)
From
Vivek Khera
Date:
>>>>> "TL" == Tom Lane <tgl@sss.pgh.pa.us> writes:

>> DEBUG: server process (pid 20704) was terminated by signal 14

TL> whenever the backend process would next have reached a lock timeout.

TL> I have not dug through the Perl sources to look for mucking with
TL> SIGALRM, but I bet that's what the problem is.

From what I recall, perl takes charge of all signals in order to deliver them at safe points to your perl program. This is both good and bad.

mod_perl has to deal with perl taking signals from apache. Perhaps that code could be worth a read.

--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D.                Khera Communications, Inc.
Internet: khera@kciLink.com       Rockville, MD       +1-240-453-8497
AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/
Re: all backends (pg7.2.3 / redhat 7.2) die due to unexpected signal 14 (SIGALRM)
From
Mark Aufflick
Date:
Ahhh, yes, um, (looks to see if anyone noticed) that would be the:

    use sigtrap qw(die untrapped normal-signals stack-trace any error-signals);

line in my code...

i will get rid of the 'untrapped normal-signals' and report back.

ta.

On Tuesday, January 28, 2003, at 03:42 PM, Tom Lane wrote:

> Mark Aufflick <mark@pumptheory.com> writes:
>> DEBUG: server process (pid 971) was terminated by signal 14
>
> Hm, that's SIGALRM on my box, I assume so on yours too.
>
> AFAICT, there is no part of the Postgres code that runs with SIGALRM
> set to default handling: it's either SIG_IGN or the deadlock timer
> handler.
>
>> Both plpgsql and plperlu are used (plperlu is used for one trigger
>> function to post a single https form that sends an sms message, and
>> record the result body).
>
> I wonder whether the Perl interpreter is hacking on the SIGALRM
> setting. That would be pretty unfriendly of it (but I don't think
> Perl quite believes the notion that it might be only a subroutine
> library, and not in full control of the process...)
>
> regards, tom lane