Re: It happened again: Server hung up solid - Mailing list pgsql-hackers
From | The Hermit Hacker |
---|---|
Subject | Re: It happened again: Server hung up solid |
Date | |
Msg-id | Pine.BSF.4.21.0005072157060.87721-100000@thelab.hub.org |
In response to | Re: It happened again: Server hung up solid (The Hermit Hacker <scrappy@hub.org>) |
Responses | Re: It happened again: Server hung up solid |
List | pgsql-hackers |
kill -ABRT does nothing:

pgsql% kill -ABRT 33683
pgsql% !ps
ps ux
USER     PID %CPU %MEM   VSZ  RSS  TT  STAT STARTED      TIME COMMAND
pgsql  34611  0.0  0.0     0    0  ??  Z     8:43PM   0:00.00 (postgres)
pgsql  93757  0.0  0.2  1456 1104  p0  S    Wed03PM   0:01.17 -su (tcsh)
pgsql  33683  0.0  0.6 38356 3024  ??  Is    7:38PM   0:03.54 /pgsql/bin/postmas
pgsql  34677  0.0  0.2  1408 1048  p2  S+    8:50PM   0:00.08 -su (tcsh)
pgsql  34696  0.0  0.0   396  232  p0  R+    8:56PM   0:00.00 ps ux
pgsql% !ps
ps ux
USER     PID %CPU %MEM   VSZ  RSS  TT  STAT STARTED      TIME COMMAND
pgsql  34611  0.0  0.0     0    0  ??  Z     8:43PM   0:00.00 (postgres)
pgsql  93757  0.0  0.2  1456 1104  p0  S    Wed03PM   0:01.17 -su (tcsh)
pgsql  33683  0.0  0.6 38356 3024  ??  Is    7:38PM   0:03.54 /pgsql/bin/postmas
pgsql  34677  0.0  0.2  1408 1048  p2  S+    8:50PM   0:00.08 -su (tcsh)
pgsql  34697  0.0  0.0   396  232  p0  R+    8:56PM   0:00.00 ps ux

On Sun, 7 May 2000, The Hermit Hacker wrote:

>
>
> Okay, just happened again ... no postgres backend is being started:
>
> USER     PID %CPU %MEM   VSZ  RSS  TT  STAT STARTED      TIME COMMAND
> pgsql  34611  0.0  0.0     0    0  ??  Z     8:43PM   0:00.00 (postgres)
> pgsql  93757  0.0  0.2  1456 1104  p0  S    Wed03PM   0:01.16 -su (tcsh)
> pgsql  33683  0.0  0.6 38356 3024  ??  Is    7:38PM   0:03.54 /pgsql/bin/postmaster -B 4096 -N 128 -S -o -F -o /pgsql/errout.5432
> pgsql  34677  0.0  0.2  1408 1048  p2  S     8:50PM   0:00.07 -su (tcsh)
> pgsql  34685  0.0  0.2  1652 1032  p0  S+    8:51PM   0:00.01 psql udmsearch
> pgsql  34687  0.0  0.0   400  232  p2  R+    8:51PM   0:00.00 ps ux
>
> Going to look at the connection tracing option now and see what I can come
> up with ...
>
>
> On Sun, 7 May 2000, Tom Lane wrote:
>
> > The Hermit Hacker <scrappy@hub.org> writes:
> > > Okay, this is with code of ~May 4th ... a 'psql' connection to the
> > > database hangs solid.
> >
> > Do you mean you can't make a connection at all?  Is there any indication
> > that the postmaster is lighting off a backend for you?  Since you show
> > a couple of zombie backends hanging around, it would seem like a good
> > bet that the postmaster itself is wedged and not responding to events,
> > but I'm not sure.
> >
> > > errout is dated:
> >
> > > pgsql% !ls
> > > ls -lt
> > > total 13324
> > > -rw-------  1 pgsql  pgsql  4842715 May  7 10:57 errout.5432
> >
> > > and the last few lines contain:
> >
> > > ERROR:  parser: parse error at or near "vpti"
> > > pq_recvbuf: unexpected EOF on client connection
> > > pq_flush: send() failed: Broken pipe
> > > pq_recvbuf: recv() failed: Connection reset by peer
> > > pq_recvbuf: unexpected EOF on client connection
> > > pq_recvbuf: unexpected EOF on client connection
> > > pq_flush: send() failed: Broken pipe
> > > pq_recvbuf: recv() failed: Connection reset by peer
> >
> > > But, of course, no date/time ...
> >
> > Given that the file mod time is considerably before the hang (right?)
> > the messages in it are probably unrelated.  It does seem odd that you
> > have so many clients disconnecting ungracefully; what client apps are
> > you running?
> >
> > > Since this is a production server, I can't just leave it there hung like
> > > that, but if someone wants to give some instructions on what to do the
> > > next time this happens, please feel free to do so, and I'll add that to my
> > > list ... maybe run a gdb command on it, since truss doesn't appear to
> > > help?
> >
> > Try killing the postmaster itself in such a way as to produce a coredump
> > (kill -ABORT ought to do) and get a backtrace from that.  It might also
> > be worth running the postmaster with connection tracing turned on (I
> > forget the incantation for that, but it should be in TFM).
> >
> > > At this time, I consider this to be a show-stopper on the release ... this
> > > is what happened the last time when the result appeared to be the index
> > > corruption
> >
> > If the postmaster is hanging then it's almost certainly unrelated to
> > index corruption...
> >
> >                         regards, tom lane
> >
>
> Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
> Systems Administrator @ hub.org
> primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org
>

Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org
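Tom's reading of the zombie backends rests on a general Unix rule: a child process that has exited stays in state `Z` until its parent reaps it with `wait()`/`waitpid()`, so a zombie that lingers suggests its parent (here the postmaster, PID 33683) is not reaping its children. As a minimal sketch, the hypothetical helper `find_zombies` below (not part of PostgreSQL or any tool mentioned in the thread) flags such entries in BSD-style `ps ux` output, assuming the column layout shown in the listings in this message.

```python
from typing import List, Tuple


def find_zombies(ps_output: str) -> List[Tuple[int, str]]:
    """Return (pid, command) for every process whose STAT starts with 'Z'.

    Hypothetical helper; assumes a BSD-style `ps ux` header line such as
    USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND.
    """
    lines = ps_output.strip().splitlines()
    header = lines[0].split()
    pid_col = header.index("PID")
    stat_col = header.index("STAT")
    ncols = len(header)
    zombies = []
    for line in lines[1:]:
        # COMMAND is the last column and may contain spaces,
        # so split into at most ncols fields.
        fields = line.split(None, ncols - 1)
        if len(fields) < ncols:
            continue  # skip blank or truncated lines
        # On BSD ps the first STAT letter is the state; 'Z' = exited, not yet reaped.
        if fields[stat_col].startswith("Z"):
            zombies.append((int(fields[pid_col]), fields[-1]))
    return zombies


# Rows taken from the listing quoted above.
SAMPLE = """\
USER     PID %CPU %MEM   VSZ  RSS  TT  STAT STARTED      TIME COMMAND
pgsql  34611  0.0  0.0     0    0  ??  Z     8:43PM   0:00.00 (postgres)
pgsql  93757  0.0  0.2  1456 1104  p0  S    Wed03PM   0:01.16 -su (tcsh)
pgsql  33683  0.0  0.6 38356 3024  ??  Is    7:38PM   0:03.54 /pgsql/bin/postmaster -B 4096 -N 128 -S -o -F -o /pgsql/errout.5432
pgsql  34685  0.0  0.2  1652 1032  p0  S+    8:51PM   0:00.01 psql udmsearch
"""

print(find_zombies(SAMPLE))  # [(34611, '(postgres)')]
```

Splitting with a field limit keeps the full postmaster command line intact in the last column instead of breaking it on its option spaces.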
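Marc reports that `kill -ABRT 33683` had no visible effect: the postmaster is still listed afterwards in state `Is`. By default SIGABRT terminates a process (writing a core file if core dumps are enabled), so a process survives it only if the signal is blocked, caught, or ignored. The Python sketch below, using a hypothetical helper `abrt_outcome` on a POSIX system, contrasts a child with default SIGABRT handling against one that blocks the signal, which leaves it pending instead of delivered. It illustrates one general mechanism only; the thread does not establish why the postmaster ignored the signal.

```python
import signal
import subprocess
import sys
from typing import Optional

# Child program: optionally block SIGABRT, report readiness, then idle.
CHILD_SRC = """
import signal, sys, time
if sys.argv[1] == "block":
    # A blocked signal is left pending instead of being delivered.
    signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGABRT})
print("ready", flush=True)
time.sleep(30)
"""


def abrt_outcome(block: bool, timeout: float = 3.0) -> Optional[int]:
    """Send SIGABRT to a child; return its exit status, or None if it survived.

    Hypothetical helper. A child killed by the default action may leave a
    core file in the working directory if core dumps are enabled.
    """
    mode = "block" if block else "default"
    child = subprocess.Popen([sys.executable, "-c", CHILD_SRC, mode],
                             stdout=subprocess.PIPE, text=True)
    try:
        child.stdout.readline()            # wait until the mask is in place
        child.send_signal(signal.SIGABRT)  # same effect as `kill -ABRT <pid>`
        try:
            # Popen reports death-by-signal N as a negative return code -N.
            return child.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            child.kill()                   # SIGKILL cannot be blocked or caught
            child.wait()
            return None
    finally:
        child.stdout.close()
```

Calling `abrt_outcome(block=False)` should return `-6` (SIGABRT is signal 6 on both Linux and FreeBSD), while `abrt_outcome(block=True)` returns `None` once the timeout expires, because the blocked signal is never delivered.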