Thread: It happened again: Server hung up solid
Okay, this is with code of ~May 4th ... a 'psql' connection to the
database hangs solid.

errout is dated:

pgsql% !ls
ls -lt
total 13324
-rw-------  1 pgsql  pgsql  4842715 May  7 10:57 errout.5432

and the last few lines contain:

ERROR:  parser: parse error at or near "vpti"
pq_recvbuf: unexpected EOF on client connection
pq_flush: send() failed: Broken pipe
pq_recvbuf: recv() failed: Connection reset by peer
pq_recvbuf: unexpected EOF on client connection
pq_recvbuf: unexpected EOF on client connection
pq_flush: send() failed: Broken pipe
pq_recvbuf: recv() failed: Connection reset by peer

But, of course, no date/time ...

ps shows:

USER    PID %CPU %MEM   VSZ  RSS  TT  STAT STARTED      TIME COMMAND
pgsql 33515  0.0  0.0     0    0  ??  Z     4:45PM   0:00.00  (postgres)
pgsql 33516  0.0  0.0     0    0  ??  Z     4:45PM   0:00.00  (postgres)
pgsql 93757  0.0  0.2  1456 1088  p0  S    Wed03PM   0:01.11 -su (tcsh)
pgsql  7100  0.0  0.5 38692 2616  ??  Is   Fri12AM   8:43.44 /pgsql/bin/postmas
pgsql 33667  0.0  0.0   396  224  p0  R+    7:35PM   0:00.00 ps ux

and postmaster is started with:

pgsql% cat pgstart
#!/bin/tcsh
setenv PORT 5432
setenv POSTMASTER /pgsql/bin/postmaster
unlimit
${POSTMASTER} -B 4096 -N 128 -S -o "-F -o /pgsql/errout.${PORT} -S 32768" \
        -i -p ${PORT} -D/pgsql/data

The machine is a Dual PIII with 512Meg of RAM, running FreeBSD 4.0-STABLE
from April 22nd ...

pgsql% truss -p 7100

Shows zilch ...

Since this is a production server, I can't just leave it there hung like
that, but if someone wants to give some instructions on what to do the
next time this happens, please feel free to do so, and I'll add that to my
list ... maybe run a gdb command on it, since truss doesn't appear to
help?

At this time, I consider this to be a show-stopper on the release ... this
is what happened the last time when the result appeared to be the index
corruption ... this time, I've checked a VACUUM after re-starting and it
doesn't appear to be a problem, but they might not have been related, just
a fluke ...

Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org
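
The "no date/time" complaint about errout could be worked around by stamping each line as it is written. What follows is only a minimal sketch: it assumes the server's stderr can be piped through a filter, which is not how the pgstart script above works (it hands the log path to the backend with -o), and stamp_lines is a hypothetical helper name, not something PostgreSQL ships.

```shell
#!/bin/sh
# stamp_lines: hypothetical filter that prefixes every line read from
# stdin with the local date and time, so log entries can later be
# matched against the moment of a hang.
stamp_lines() {
    while IFS= read -r line; do
        printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$line"
    done
}

# Possible use (assumption: the server writes its messages to stderr):
#   some_server 2>&1 | stamp_lines >> /pgsql/errout.5432
```

Calling date once per line is slow for a busy log, but it keeps the filter dependent on nothing beyond a POSIX shell.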
The Hermit Hacker <scrappy@hub.org> writes:
> Okay, this is with code of ~May 4th ... a 'psql' connection to the
> database hangs solid.

Do you mean you can't make a connection at all?  Is there any indication
that the postmaster is lighting off a backend for you?  Since you show
a couple of zombie backends hanging around, it would seem like a good
bet that the postmaster itself is wedged and not responding to events,
but I'm not sure.

> errout is dated:
> pgsql% !ls
> ls -lt
> total 13324
> -rw-------  1 pgsql  pgsql  4842715 May  7 10:57 errout.5432
> and the last few lines contain:
> ERROR:  parser: parse error at or near "vpti"
> pq_recvbuf: unexpected EOF on client connection
> pq_flush: send() failed: Broken pipe
> pq_recvbuf: recv() failed: Connection reset by peer
> pq_recvbuf: unexpected EOF on client connection
> pq_recvbuf: unexpected EOF on client connection
> pq_flush: send() failed: Broken pipe
> pq_recvbuf: recv() failed: Connection reset by peer
> But, of course, no date/time ...

Given that the file mod time is considerably before the hang (right?)
the messages in it are probably unrelated.  It does seem odd that you
have so many clients disconnecting ungracefully; what client apps are
you running?

> Since this is a production server, I can't just leave it there hung like
> that, but if someone wants to give some instructions on what to do the
> next time this happens, please feel free to do so, and I'll add that to my
> list ... maybe run a gdb command on it, since truss doesn't appear to
> help?

Try killing the postmaster itself in such a way as to produce a coredump
(kill -ABORT ought to do) and get a backtrace from that.  It might also
be worth running the postmaster with connection tracing turned on (I
forget the incantation for that, but it should be in TFM).

> At this time, I consider this to be a show-stopper on the release ... this
> is what happened the last time when the result appeared to be the index
> corruption

If the postmaster is hanging then it's almost certainly unrelated to
index corruption...

			regards, tom lane
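
The zombie backends spotted by eye in the ps listing could also be counted by a script, for instance from a cron job that raises an alarm when dead children are not being reaped. This is a rough sketch under assumptions: count_zombies is a hypothetical helper, and it relies only on the ps output carrying a header line with a STAT column, as the `ps ux` listings in this thread do.

```shell
#!/bin/sh
# count_zombies: read ps output on stdin and print how many processes
# are in state Z.  The STAT column is located from the header line so
# the position does not have to be hard-coded.
count_zombies() {
    awk '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "STAT") col = i; next }
        col && $col ~ /^Z/ { n++ }
        END { print n + 0 }
    '
}

# Possible use:
#   ps ux | count_zombies
```

A non-zero count that persists across several runs would be consistent with a parent that has stopped responding to events, though on its own it does not prove where the parent is stuck.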
On Sun, 7 May 2000, Tom Lane wrote:

> The Hermit Hacker <scrappy@hub.org> writes:
> > Okay, this is with code of ~May 4th ... a 'psql' connection to the
> > database hangs solid.
>
> Do you mean you can't make a connection at all?  Is there any indication
> that the postmaster is lighting off a backend for you?  Since you show
> a couple of zombie backends hanging around, it would seem like a good
> bet that the postmaster itself is wedged and not responding to events,
> but I'm not sure.

This appears to be the case, but next time it happens I will make
double-sure of that ... considering that it was ~7pm at night when I
tried, my initial guess is that nothing is going through postmaster at the
time of the hang ...

> Given that the file mod time is considerably before the hang (right?)
> the messages in it are probably unrelated.  It does seem odd that you
> have so many clients disconnecting ungracefully; what client apps are
> you running?

a lot of dbi stuff, the search engine for udmsearch, some php ... the
server is currently serving ~12 databases for various clients ...

> Try killing the postmaster itself in such a way as to produce a coredump
> (kill -ABORT ought to do) and get a backtrace from that.  It might also
> be worth running the postmaster with connection tracing turned on (I
> forget the incantation for that, but it should be in TFM).

Will look at that one ...
Okay, just happened again ... no postgres backend is being started:

USER    PID %CPU %MEM   VSZ  RSS  TT  STAT STARTED      TIME COMMAND
pgsql 34611  0.0  0.0     0    0  ??  Z     8:43PM   0:00.00  (postgres)
pgsql 93757  0.0  0.2  1456 1104  p0  S    Wed03PM   0:01.16 -su (tcsh)
pgsql 33683  0.0  0.6 38356 3024  ??  Is    7:38PM   0:03.54 /pgsql/bin/postmaster -B 4096 -N 128 -S -o -F -o /pgsql/errout.5432
pgsql 34677  0.0  0.2  1408 1048  p2  S     8:50PM   0:00.07 -su (tcsh)
pgsql 34685  0.0  0.2  1652 1032  p0  S+    8:51PM   0:00.01 psql udmsearch
pgsql 34687  0.0  0.0   400  232  p2  R+    8:51PM   0:00.00 ps ux

Going to look at the connection tracing option now and see what I can come
up with ...

Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org
On Sun, 7 May 2000, The Hermit Hacker wrote:

> Okay, just happened again ... no postgres backend is being started:

I don't know how close in time it was, but I just hit reload on that
query that was sent to webmaster.

Vince.
--
==========================================================================
Vince Vielhaber -- KA8CSH    email: vev@michvhf.com    http://www.pop4.net
 128K ISDN from $22.00/mo - 56K Dialup from $16.00/mo at Pop4 Networking
        Online Campground Directory    http://www.camping-usa.com
       Online Giftshop Superstore    http://www.cloudninegifts.com
==========================================================================
kill -ABRT does nothing:

pgsql% kill -ABRT 33683
pgsql% !ps
ps ux
USER    PID %CPU %MEM   VSZ  RSS  TT  STAT STARTED      TIME COMMAND
pgsql 34611  0.0  0.0     0    0  ??  Z     8:43PM   0:00.00  (postgres)
pgsql 93757  0.0  0.2  1456 1104  p0  S    Wed03PM   0:01.17 -su (tcsh)
pgsql 33683  0.0  0.6 38356 3024  ??  Is    7:38PM   0:03.54 /pgsql/bin/postmas
pgsql 34677  0.0  0.2  1408 1048  p2  S+    8:50PM   0:00.08 -su (tcsh)
pgsql 34696  0.0  0.0   396  232  p0  R+    8:56PM   0:00.00 ps ux
pgsql% !ps
ps ux
USER    PID %CPU %MEM   VSZ  RSS  TT  STAT STARTED      TIME COMMAND
pgsql 34611  0.0  0.0     0    0  ??  Z     8:43PM   0:00.00  (postgres)
pgsql 93757  0.0  0.2  1456 1104  p0  S    Wed03PM   0:01.17 -su (tcsh)
pgsql 33683  0.0  0.6 38356 3024  ??  Is    7:38PM   0:03.54 /pgsql/bin/postmas
pgsql 34677  0.0  0.2  1408 1048  p2  S+    8:50PM   0:00.08 -su (tcsh)
pgsql 34697  0.0  0.0   396  232  p0  R+    8:56PM   0:00.00 ps ux

Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org
With -d set to 1 (connection tracing), all I see when I connect, in the
log files, is:

FindExec: found "/pgsql/bin/postgres" using argv[0]
FindExec: found "/pgsql/bin/postgres" using argv[0]

doesn't tell me to what I'm connecting through ...

Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org
The Hermit Hacker <scrappy@hub.org> writes:
> kill -ABRT does nothing:

Oh?  Must be hung up in a kernel call then.  That will probably mean
that you can't attach to the stuck process with gdb either (though
it'd be worth trying, since a backtrace would be mighty useful if you
could get it).

My next thought is to truss the postmaster process before it hangs up,
with hopes of finding out what kernel call is hanging.

Also, you might try netstat to see if you can see any freshly-opened
incoming connections when it happens.  Also, "lsof -p" or local
equivalent on the stuck postmaster.

			regards, tom lane
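
On a production box where the restart cannot wait, the netstat and lsof checks suggested above could be bundled into one command that is run just before restarting. The following is a sketch only: snapshot_postmaster and run_if_present are hypothetical helper names, the output layout is invented for illustration, and lsof may not be installed (the sketch notes that in the output rather than failing).

```shell
#!/bin/sh
# run_if_present: run a command if it exists on this host; report a
# failure in the output instead of aborting the snapshot.
run_if_present() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1 || echo "($1 exited with status $?)"
    else
        echo "($1 not available on this host)"
    fi
}

# snapshot_postmaster PID OUTFILE
# Record process state, socket table and open files for PID in OUTFILE.
# The body runs in a subshell so its variables do not leak to the caller.
snapshot_postmaster() (
    pid=$1
    outfile=$2
    {
        echo "== snapshot of pid $pid taken $(date) =="
        echo "== ps -p $pid =="
        run_if_present ps -p "$pid"
        echo "== netstat -an =="
        run_if_present netstat -an
        echo "== lsof -p $pid =="
        run_if_present lsof -p "$pid"
    } > "$outfile"
)

# Possible use, with the pid of the stuck postmaster:
#   snapshot_postmaster 33683 /tmp/postmaster-hang.txt
```

The snapshot only records state after the fact; it does not replace running truss on the postmaster before it hangs, which is what would show the kernel call it is stuck in.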
>Try killing the postmaster itself in such a way as to produce a coredump
>(kill -ABORT ought to do) and get a backtrace from that.

The "gcore" command (on most modern unices) will generate a core dump of a
running process without killing the process.  It seems that would be more
useful in this circumstance.

	-Michael Robinson
*sigh*

> gcore 87721
gcore: /proc/87721/file: No such file or directory

Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org
Are we still releasing 7.0 tomorrow?

--
  Bruce Momjian                        |  http://www.op.net/~candle
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
On Sun, 7 May 2000, Bruce Momjian wrote:

> Are we still releasing 7.0 tomorrow?

I don't know ... this problem has me nervous, but I can't seem to
re-create it on the fly :(  It happened twice so far today, and I'm
working on improving logging to see if I can narrow it down ...

I would like to *at least* postpone until Wednesday to see if I can
recreate this between now and then ... will spend a good part of tomorrow
seeing if I can get a more decent amount of data logged, to narrow her
down ...

We still have to write up a release announcement (can someone summarize
the key features of v7.0?), so that gives us a little bit of time ...

Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org
> On Sun, 7 May 2000, Bruce Momjian wrote:
>
> > Are we still releasing 7.0 tomorrow?
>
> I don't know ... this problem has me nervous, but I can't seem to
> re-create it on the fly :(  It happened twice so far today, and I'm
> working on improving logging to see if I can narrow it down ...
>
> I would like to *at least* postpone until Wednesday to see if I can
> recreate this between now and then ... will spend a good part of tomorrow
> seeing if I can get a more decent amount of data logged, to narrow her
> down ...

Isn't it something we can fix with a 7.0.1?  Seems many people are
already using 7.0 in production systems.  I just hate to see the date
slip again.

> We still have to write up a release announcement (can someone summarize
> the key features of v7.0?), so that gives us a little bit of time ...

Well, you can take it off the top of the HISTORY file.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
On Mon, 8 May 2000, Bruce Momjian wrote:

> > I would like to *at least* postpone until Wednesday to see if I can
> > recreate this between now and then ... will spend a good part of tomorrow
> > seeing if I can get a more decent amount of data logged, to narrow her
> > down ...
>
> Isn't it something we can fix with a 7.0.1?  Seems many people are
> already using 7.0 in production systems.  I just hate to see the date
> slip again.

As I said, if we feel comfortable with this, no probs ... it's not an issue
I'm going to push, since it is something that I'm finding relatively
difficult to recreate "at will" :(

> > We still have to write up a release announcement (can someone summarize
> > the key features of v7.0?), so that gives us a little bit of time ...
>
> Well, you can take it off the top of the HISTORY file.

Great, will work this up tomorrow during the day :)  Thanks ...

Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org
> > Isn't it something we can fix with a 7.0.1?  Seems many people are
> > already using 7.0 in production systems.  I just hate to see the date
> > slip again.
>
> As I said, if we feel comfortable with this, no probs ... it's not an issue
> I'm going to push, since it is something that I'm finding relatively
> difficult to recreate "at will" :(

My feeling is that we can address this in 7.0.1.  Our recent pg_group fix
could not have been done in a 7.0.1, but this doesn't seem like that kind
of problem.  Such problems are usually easily reproducible because they
represent problems with the system catalogs.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
Bruce Momjian <pgman@candle.pha.pa.us> writes:
>>>> Are we still releasing 7.0 tomorrow?
>>
>> I don't know ... this problem has me nervous, but I can't seem to
>> re-create it on the fly :(  It happened twice so far today, and I'm
>> working on improving logging to see if I can narrow it down ...
>>
>> I would like to *at least* postpone until Wednesday to see if I can
>> recreate this between now and then ... will spend a good part of tomorrow
>> seeing if I can get a more decent amount of data logged, to narrow her
>> down ...

> Isn't it something we can fix with a 7.0.1?  Seems many people are
> already using 7.0 in production systems.  I just hate to see the date
> slip again.

That's my feeling too.  Whatever this is, it seems to be in the
postmaster not the backend.  We've hardly changed the postmaster since
6.5.3, so I suspect the problem has existed for a good while and is of
low probability.  (I have no explanation why Marc's suddenly getting
bit, but if it weren't low-probability we'd surely have more reports
than just his, no?)

Almost certainly, we will need a 7.0.1 in a few weeks, once 7.0 gets out
there and starts getting pounded on by people outside the circle of
usual suspects (sorry, been watching _Casablanca_ again).  If we delay
7.0 until we can figure out what this bug is all about, we might be
sitting on it for days or weeks.  Let's push 7.0 out the door and let
some other work go on in parallel while we try to figure out this one.

Marc, if you see it happen again could you give me a call before you
restart?  I'd like to telnet in and poke at it a little myself.
(Wait a sec, is this happening on hub, or somewhere else?)

			regards, tom lane
On Mon, 8 May 2000, The Hermit Hacker wrote:

> *sigh*
>
> > gcore 87721
> gcore: /proc/87721/file: No such file or directory

According to TFM:

     The process identifier, pid, must be given on the command line.  If no
     executable image is specified, gcore will use ``/proc/<pid>/file''.

So you might try:  gcore /path_to_postmaster/postmaster 87721
or something close to that.

Vince.
--
==========================================================================
Vince Vielhaber -- KA8CSH    email: vev@michvhf.com    http://www.pop4.net
 128K ISDN from $22.00/mo - 56K Dialup from $16.00/mo at Pop4 Networking
        Online Campground Directory    http://www.camping-usa.com
       Online Giftshop Superstore    http://www.cloudninegifts.com
==========================================================================
On Mon, 8 May 2000, Tom Lane wrote:

> Marc, if you see it happen again could you give me a call before you
> restart?  I'd like to telnet in and poke at it a little myself.
> (Wait a sec, is this happening on hub, or somewhere else?)

Somewhere else ... we built a Dual-PIII server to act purely as the
database server, so I can give you access to it ...
Thus spake The Hermit Hacker
> > Marc, if you see it happen again could you give me a call before you
> > restart?  I'd like to telnet in and poke at it a little myself.
> > (Wait a sec, is this happening on hub, or somewhere else?)
>
> We built a Dual-PIII server to handle just database server, so I can give
> you access to it ...

Are you talking about the new database server for Trends?  If so I
should mention that I had to restart it this morning.  Sorry, I didn't
poke around in it before doing so.  Clients couldn't log in and I
couldn't wait.

I should mention that I did have to kill -9 it.  A simple kill didn't
work.  I then cleared out the lock file and restarted it and
connections seem to be working again.

--
D'Arcy J.M. Cain <darcy@{druid|vex}.net>   |  Democracy is three wolves
http://www.druid.net/darcy/                |  and a sheep voting on
+1 416 425 1212     (DoD#0082)    (eNTP)   |  what's for dinner.
On Mon, 8 May 2000, D'Arcy J.M. Cain wrote:

> Are you talking about the new database server for Trends?  If so I
> should mention that I had to restart it this morning.  Sorry, I didn't
> poke around in it before doing so.  Clients couldn't log in and I
> couldn't wait.
>
> I should mention that I did have to kill -9 it.  A simple kill didn't
> work.  I then cleared out the lock file and restarted it and
> connections seem to be working again.

That's the server ... and that's the key problem ... the apps running on
it are such that delaying a restart, when one is needed, is very
difficult :(

D'Arcy, when it happens again, and if you catch it before me, can you run:

gcore -s bin/postmaster <pid>

on it as the pgsql user before restarting it?  I just tested it here and
it dump'd core nicely ... I'm hoping it does the same if/when the
postmaster itself hangs *cross fingers*

Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org
Thus spake The Hermit Hacker > D'Arcy, when it happens again, and if you catch it before me, can you run: > > gcore -s bin/postmaster <pid> > > on it as the pgsql user before restarting it? I just tested it here and > it dump'd core nicely ... I'm hoping it does the same if/when the > postmaster itself hangs *cross fingers* Will do. -- D'Arcy J.M. Cain <darcy@{druid|vex}.net> | Democracy is three wolves http://www.druid.net/darcy/ | and a sheep voting on +1 416 425 1212 (DoD#0082) (eNTP) | what's for dinner.
The Hermit Hacker <scrappy@hub.org> writes:
> We still have to write up a release announcement (can someone summarize
> the key features of v7.0?), so that gives us a little bit of time ...

Man, there's a lot of stuff in the HISTORY file, isn't there?
The list at the top isn't too bad:

Foreign Keys
    Foreign keys are now implemented, with the exception of PARTIAL MATCH
    foreign keys. Many users have been asking for this feature, and we are
    pleased to offer it.

Optimizer Overhaul
    Continuing on work started a year ago, the optimizer has been
    overhauled, allowing improved query execution and better performance
    with less memory usage.

Updated psql
    psql, our interactive terminal monitor, has been updated with a variety
    of new features. See the psql manual page for details.

Upcoming Features
    In 7.1, we plan to have outer joins, storage for very long rows, and a
    write-ahead logging system.

Some other things that might be worth mentioning:

Date/time datatypes cleaned up
    We have brought the date/time datatypes into compliance with the SQL
    standard, replacing the old partially-implemented SQL date/time types
    with full-featured implementations. The default display format for
    date/time data has also changed to be ISO style. This may create a few
    compatibility issues for old applications.
    [Thomas may want to rewrite this item...]

Query length limits removed
    There is no longer any fixed limit on the length of a query string.
    (The block-size limit on the length of a stored row still exists, but
    we hope to fix that in 7.1.)

Removal of 8-argument limit on index keys and functions
    The maximum number of keys in an index or arguments to a function is
    now configurable, with a default limit of 16, rather than the old
    hard-coded limit of 8.

Sorts and hashes now work for >2GB of data
    Temporary files can now be split in the same way that oversize
    relations are, so that data volume is only limited by available disk
    space and not by OS limits on the size of an individual file.
It wouldn't be hard to make this list a *lot* longer, but... You should also make a point of the literally hundreds of smaller features, bug fixes, and performance improvements that are in this release. regards, tom lane
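A rough sketch of the temporary-file splitting idea from the "Sorts and hashes" item in the list above: a logical byte offset in one large spill is mapped to a (segment number, offset within segment) pair, so no single file on disk has to exceed the OS file-size limit. The 1 GiB segment size and the name `locate` are assumptions made for illustration; they are not taken from this thread or from the PostgreSQL source.

```python
# Assumption for illustration: each segment file holds at most 1 GiB.
SEGMENT_SIZE = 1 << 30


def locate(logical_offset, segment_size=SEGMENT_SIZE):
    """Map a logical byte offset to (segment_number, offset_in_segment)."""
    if logical_offset < 0:
        raise ValueError("logical_offset must be non-negative")
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    # divmod gives the segment index and the position inside that segment.
    return divmod(logical_offset, segment_size)
```

With this mapping, a 5 GB sort would simply spread over several segment files, and the only remaining ceiling is free disk space.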
> Upcoming Features > In 7.1, we plan to have outer joins, storage for very long > rows, and a write-ahead logging system. Oh BTW, *are* we still planning outer joins for 7.1? I thought the plan was to push out the querytree redesign to 7.2, and try to have a fairly short release cycle for 7.1 instead, with TOAST and WAL as the centerpiece attractions. regards, tom lane
Query length limits removed There is no longer any fixed limit on the length of a query string. (The block-size limit on the length of a stored row still exists, but we hope to fix that in 7.1.) Is the row length limit 8k? If not, what is the row length limit? Thanks! - Mitch
"Mitch Vincent" <mitch@huntsvilleal.com> writes: > Is the row length limit 8k? If not, what is the row length limit? Well, it's BLCKSZ less some overhead --- BLCKSZ is 8K in a stock installation ... regards, tom lane
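To make the arithmetic concrete, here is a minimal sketch of the "BLCKSZ less some overhead" rule. The exact overhead is not stated in this thread, so `overhead` is a caller-supplied parameter, and the value 100 in the usage note below is purely hypothetical. The function names are also made up for illustration.

```python
def max_row_bytes(blcksz, overhead):
    """Largest row that fits in one block: block size minus overhead."""
    if overhead < 0 or overhead >= blcksz:
        raise ValueError("overhead must be in the range [0, blcksz)")
    return blcksz - overhead


def row_fits(column_sizes, overhead, blcksz=8192):
    """True if the summed column sizes (bytes) fit in a single block."""
    return sum(column_sizes) <= max_row_bytes(blcksz, overhead)
```

For example, with a hypothetical overhead of 100 bytes, `row_fits([4, 9000], overhead=100)` is false for the stock 8K block, while the same row fits if `blcksz=32768` is passed instead.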
On Mon, 8 May 2000, Mitch Vincent wrote: > Query length limits removed > There is no longer any fixed limit on the length of a query > string. (The block-size limit on the length of a stored row > still exists, but we hope to fix that in 7.1.) > > > Is the row length limit 8k? If not, what is the row length limit? Right now, the tuple length is still at 8k ... Jan's TOAST implementation is designed to finally rid us of that as well ...
On Mon, 8 May 2000, Tom Lane wrote: > "Mitch Vincent" <mitch@huntsvilleal.com> writes: > > Is the row length limit 8k? If not, what is the row length limit? > > Well, it's BLCKSZ less some overhead --- BLCKSZ is 8K in a stock > installation ... A text datatype isn't limited to that too, is it? Vince. -- ========================================================================== Vince Vielhaber -- KA8CSH email: vev@michvhf.com http://www.pop4.net 128K ISDN from $22.00/mo - 56K Dialup from $16.00/mo at Pop4 Networking Online Campground Directory http://www.camping-usa.com Online Giftshop Superstore http://www.cloudninegifts.com ==========================================================================
Thus spake Vince Vielhaber > On Mon, 8 May 2000, Tom Lane wrote: > > > "Mitch Vincent" <mitch@huntsvilleal.com> writes: > > > Is the row length limit 8k? If not, what is the row length limit? > > > > Well, it's BLCKSZ less some overhead --- BLCKSZ is 8K in a stock > > installation ... > > A text datatype isn't limited to that too, is it? It would kind of have to be, wouldn't it, if the row it had to fit in had that limit? -- D'Arcy J.M. Cain <darcy@{druid|vex}.net> | Democracy is three wolves http://www.druid.net/darcy/ | and a sheep voting on +1 416 425 1212 (DoD#0082) (eNTP) | what's for dinner.
On Mon, 8 May 2000, D'Arcy J.M. Cain wrote: > Thus spake Vince Vielhaber > > On Mon, 8 May 2000, Tom Lane wrote: > > > > > "Mitch Vincent" <mitch@huntsvilleal.com> writes: > > > > Is the row length limit 8k? If not, what is the row length limit? > > > > > > Well, it's BLCKSZ less some overhead --- BLCKSZ is 8K in a stock > > > installation ... > > > > A text datatype isn't limited to that too, is it? > > It would kind of have to be, wouldn't it, if the row it had to fit in > had that limit? BLOBs aren't. Or did I miss something somewhere? I've always understood the text datatype to be simply a text version of a BLOB. Not necessarily in Postgres, but elsewhere. Vince.
Mitch Vincent wrote: > > Query length limits removed > There is no longer any fixed limit on the length of a query > string. (The block-size limit on the length of a stored row > still exists, but we hope to fix that in 7.1.) > > > Is the row length limit 8k? If not, what is the row length limit? 8k by default, max is 32K if you recompile- > > Thanks! > > - Mitch
This brings me to another question. Hopefully there isn't an 8k (max 32k) limit on TEXT fields -- I'll assume there isn't a limit on TEXT fields for the purpose of this email.. What do you guys think of storing whole text files (normally stored in a flat file) in the database for searching purposes? Would a search on an indexed TEXT field be slow as mud? I'll try it on my home machine for kicks, just wanted to get some theoretical opinions... Thanks! - Mitch "The only real failure is quitting." > > Is the row length limit 8k? If not, what is the row length limit? > > 8k by default, max is 32K if you recompile-
Vince Vielhaber wrote: > > On Mon, 8 May 2000, D'Arcy J.M. Cain wrote: > > > It would kind of have to be, wouldn't it, if the row it had to fit in > > had that limit? > > BLOBs aren't. Or did I miss something somewhere? I've always understood > the text datatype to be simply a text version of a BLOB. Not yet in Postgres. > Not necessarily in Postgres, but elsewhere. Maybe elsewhere. In Postgres it will be a new kind of (B)LOB, different from current LOs. Current LOs are, again, separate from TEXT, even though ODBC and JDBC use them for other BLOB support. --------------- Hannu
Mitch Vincent wrote: > > This brings me to another question. Hopefully there isn't a 8k (max 32k) > limit on TEXT fields -- No, they currently just have to fit in a record ;) They will (optionally) be stored separately in the future (7.1). > > What do you guys think of storing whole text files (normally stored in a > flat file) in the database for searching purposes? Would a search on an > indexed TEXT field be slow as mud? Depends on the search ;) A pattern like "a%" may not be too slow (unless indexes on text fields are disallowed initially, as has been mentioned a few times). PG does not yet have a native full-text index. There is a suboptimal implementation using triggers and extra tables in contrib. ---------- Hannu
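A small illustration of why a pattern like "a%" can be cheap with an ordinary ordered index: all values sharing a prefix sit next to each other in sorted order, so the search can jump to the first candidate and stop at the first non-match. The sketch below uses a sorted Python list as a stand-in for the index. It only shows the principle and is not how PostgreSQL implements it; `prefix_range_scan` is a hypothetical name.

```python
import bisect


def prefix_range_scan(sorted_keys, prefix):
    """Return every key in sorted_keys that starts with prefix.

    sorted_keys must already be sorted; only the contiguous run of
    matching keys (plus one non-match) is examined.
    """
    # First position whose key is >= prefix; any match starts here.
    lo = bisect.bisect_left(sorted_keys, prefix)
    hi = lo
    # Walk forward until the prefix no longer matches.
    while hi < len(sorted_keys) and sorted_keys[hi].startswith(prefix):
        hi += 1
    return sorted_keys[lo:hi]
```

A pattern with a leading wildcard (say "%a") gets no such help from sort order and would have to look at every value, which is where a full-text index becomes interesting.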
Thus spake Vince Vielhaber > > > > Well, it's BLCKSZ less some overhead --- BLCKSZ is 8K in a stock > > > > installation ... > > > > > > A text datatype isn't limited to that too, is it? > > > > It would kind of have to be, wouldn't it, if the row it had to fit in > > had that limit? > > BLOBs aren't. Or did I miss something somewhere? I've always understood > the text datatype to be simply a text version of a BLOB. Not necessarily > in Postgres, but elsewhere. You mean text FILES, not datatype. There is a base type called text which has to fit in the row so it is naturally limited to the row size. -- D'Arcy J.M. Cain <darcy@{druid|vex}.net> | Democracy is three wolves http://www.druid.net/darcy/ | and a sheep voting on +1 416 425 1212 (DoD#0082) (eNTP) | what's for dinner.
> > Upcoming Features > > In 7.1, we plan to have outer joins, storage for very long > > rows, and a write-ahead logging system. > > Oh BTW, *are* we still planning outer joins for 7.1? I thought the plan > was to push out the querytree redesign to 7.2, and try to have a fairly > short release cycle for 7.1 instead, with TOAST and WAL as the > centerpiece attractions. Oops, you are right. At the time I wrote this, we were going to do a normal-length 7.1 cycle. I have updated the HISTORY and release.sgml to say 7.1 _or_ 7.2. -- Bruce Momjian | http://www.op.net/~candle pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
> Man, there's a lot of stuff in the HISTORY file, isn't there? > The list at the top isn't too bad: Ack! I didn't realize that there was a plain text HISTORY file, since it *should* come from the SGML sources. I had changed the wording, and eliminated the prediction for features in the next release (that should appear on the web site imho, not in the release docs). Check the release notes (INSTALL and release.htm) for the latest wording. Let me see if I can get the HISTORY file replaced with something fresh; however, it is not a show-stopper so if you've already done the build don't worry about it. - Thomas -- Thomas Lockhart lockhart@alumni.caltech.edu South Pasadena, California
> > Man, there's a lot of stuff in the HISTORY file, isn't there? > > The list at the top isn't too bad: > > Ack! I didn't realize that there was a plain text HISTORY file, since > it *should* come from the SGML sources. I had changed the wording, and > eliminated the prediction for features in the next release (that > should appear on the web site imho, not in the release docs). I have been changing both each time. HISTORY is not generated directly from the SGML because it needs to be one big file with proper breaks between sections. I left the prediction in because this is not a big-feature release, and I wanted people to know what we were planning. This is the first release where we have definite plans for new features in the next release. -- Bruce Momjian | http://www.op.net/~candle pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
> I left the prediction in because this is not a big-feature release, and > I wanted people to know what we were planning. This is the first > release where we have definite plans for new features in the next > release. Seems most appropriate to put this info on the web site, where it is less formal and more easily changed/updated/removed. We already could be mentioning the TOAST work, etc etc as ongoing projects and outer joins are in that category too. Vince, is there a place where we could put this kind of stuff? Somewhere in the developer's lounge area? Perhaps a summary page of ongoing projects and then links to specific pages for each project where necessary? - Thomas -- Thomas Lockhart lockhart@alumni.caltech.edu South Pasadena, California
On Tue, 9 May 2000, Thomas Lockhart wrote: > > I left the prediction in because this is not a big-feature release, and > > I wanted people to know what we were planning. This is the first > > release where we have definite plans for new features in the next > > release. > > Seems most appropriate to put this info on the web site, where it is > less formal and more easily changed/updated/removed. We already could > be mentioning the TOAST work, etc etc as ongoing projects and outer > joins are in that category too. > > Vince, is there a place where we could put this kind of stuff? > Somewhere in the developer's lounge area? Perhaps a summary page of > ongoing projects and then links to specific pages for each project > where necessary? Already is: http://www.Postgresql.org/projects/index.html Jan's been maintaining it. The projects on that page aren't necessarily planned for the next release tho (what's there now very well might be, but that's not the intent of the page), so we may want to have a more specific list pointing there. BTW, I'm currently waiting on some graphics (already have some) to put the Developer's Corner and User's Lounge online. With some out of town travel coming up I may not be able to get it online till the beginning of June tho. Vince.