RE: [HACKERS] Major bug, possible, with Solaris 7? - Mailing list pgsql-hackers
From | Daryl W. Dunbar |
---|---|
Subject | RE: [HACKERS] Major bug, possible, with Solaris 7? |
Date | |
Msg-id | 003901be5ced$baaff1d0$1445e59b@ddunbar.eni.net |
In response to | RE: [HACKERS] Major bug, possible, with Solaris 7? (The Hermit Hacker <scrappy@hub.org>) |
Responses | RE: [HACKERS] Major bug, possible, with Solaris 7? |
List | pgsql-hackers |
OK. I'm running 6.4.3beta (after patching the code to compile -
patches attached). Now we wait to see if it breaks again...

DwD

> -----Original Message-----
> From: The Hermit Hacker [mailto:scrappy@hub.org]
> Sent: Friday, February 19, 1999 11:48 PM
> To: Daryl W. Dunbar
> Cc: pgsql-hackers@postgreSQL.org
> Subject: RE: [HACKERS] Major bug, possible, with Solaris 7?
>
> On Fri, 19 Feb 1999, Daryl W. Dunbar wrote:
>
> > At this point, I'm willing to try anything. I'm in production
> > (live site), but we have not announced the site. What that
> > means is that I have the weekend to debug/fix/decide what to
> > do. I'll take whatever version you suggest and load it.
>
> Apologies for the delay...there is a copy of
> postgresql-6.4.3beta.tar.gz available in the test directory...try
> that and please report back here...
>
> > DwD
> >
> > > -----Original Message-----
> > > From: The Hermit Hacker [mailto:scrappy@hub.org]
> > > Sent: Friday, February 19, 1999 10:39 PM
> > > To: Daryl W. Dunbar
> > > Cc: pgsql-hackers@postgreSQL.org
> > > Subject: RE: [HACKERS] Major bug, possible, with Solaris 7?
> > >
> > > On Fri, 19 Feb 1999, Daryl W. Dunbar wrote:
> > >
> > > > Oh, sorry. 6.4.2 with a backend patch to prevent the parent
> > > > death in the event of MaxBackendID being reached.
> > > >
> > > > I know it is in semop() because I did a truss on the child
> > > > processes. From a small sample, it looks like they may all
> > > > be trying to operate on the same semaphore. I'm recompiling
> > > > with the -g flag to gain more insight...
> > >
> > > I'm just curious, but is this being used in production yet?
> > > If not, would you be willing to try out the current snapshot,
> > > which is soon to become 6.5-BETA? If this apparent bug still
> > > exists there, I think it's sufficient a bug to prevent v6.5
> > > coming out until this is fixed
> > >
> > > then again, something this reproducible will most likely hold
> > > up v6.4.3 from being released also, so if we are planning a
> > > v6.4.3 (I thought we were), we'll have to get this fixed in
> > > the 6.4 line also.
> > >
> > > Actually, with that in mind, I'm putting together a very quick
> > > tar ball of what v6.4.3 is looking like so far. this is *not*
> > > a release, but I'd like to see if this problem exists in the
> > > most current STABLE tree or not...I know there have been quite
> > > a few fixes put into it...
> > >
> > > Check in about a half hour or so, under the 'test' directory
> > > of ftp.postgresql.org .. should be there then...
> > >
> > > > > -----Original Message-----
> > > > > From: owner-pgsql-hackers@postgreSQL.org
> > > > > [mailto:owner-pgsql-hackers@postgreSQL.org]On Behalf Of
> > > > > The Hermit Hacker
> > > > > Sent: Friday, February 19, 1999 12:46 PM
> > > > > To: pgsql-hackers@postgreSQL.org
> > > > > Cc: Daryl W. Dunbar
> > > > > Subject: [HACKERS] Major bug, possible, with Solaris 7?
> > > > >
> > > > > Can someone please take a minute to look at this?
> > > > >
> > > > > I've gzip'd and moved his errorlog to
> > > > > ftp.postgresql.org:/pub/debugging...one thing that appears
> > > > > to be lacking...what version of PostgreSQL are you using?
> > > > >
> > > > > Marc G. Fournier
> > > > > Systems Administrator @ hub.org
> > > > > primary: scrappy@hub.org
> > > > > secondary: scrappy@{freebsd|postgresql}.org
> > > > >
> > > > > ---------- Forwarded message ----------
> > > > > Date: Thu, 18 Feb 1999 18:23:25 -0500
> > > > > From: Daryl W. Dunbar <daryl@www.com>
> > > > > To: The Hermit Hacker <scrappy@hub.org>
> > > > > Subject: RE: Interested?
> > > > >
> > > > > Thanks Marc, We exchanged an e-mail or two last week,
> > > > > along with Tatsuo Ishii and Tom Lane. You suggested I
> > > > > truss the process.
> > > > >
> > > > > Anyway, periodically, the backends spiral out of control
> > > > > with hung up children until I hit MaxBackendID (which I
> > > > > compiled in to be 128). Initially, I was running out of
> > > > > semaphores on Solaris 7 and changed /etc/system to add
> > > > > these lines:
> > > > > set shmsys:shminfo_shmmax=16777216
> > > > > set shmsys:shminfo_shmmin=1
> > > > > set shmsys:shminfo_shmmni=128
> > > > > set shmsys:shminfo_shmseg=51
> > > > > *
> > > > > set semsys:seminfo_semmap=128
> > > > > set semsys:seminfo_semmni=128
> > > > > set semsys:seminfo_semmns=8192
> > > > > set semsys:seminfo_semmnu=8192
> > > > > set semsys:seminfo_semmsl=64
> > > > > set semsys:seminfo_semopm=32
> > > > > set semsys:seminfo_semume=32
> > > > >
> > > > > I increased shared memory so I could start more
> > > > > backends...
> > > > >
> > > > > OK, so now, everything is running fine and boom, the
> > > > > backends start to hang on semop, eventually reaching
> > > > > MaxBackendID and refusing connections.
> > > > >
> > > > > Attached is a log file from a hang up today. Debug is set
> > > > > to 3. All times are PST. I have carved out a bunch of
> > > > > normal operation from the beginning (about 21,000 lines)
> > > > > and redundant 'too many backends' (about 1,000 lines,
> > > > > while I was eating lunch :) signified by {SNIP SNIP}. I
> > > > > pick the log back up with the birth of pid 2828 and left
> > > > > several 'normal' cycles in until...
> > > > >
> > > > > You can see that process 2840 is the first child to hang.
> > > > > It was started at 11:39:23 and did not die until sent a 15
> > > > > by the parent at 14:12:16. All of the hung processes fall
> > > > > between 2840 and 3454.
> > > > >
> > > > > Sorry the file is so big. Here are some 'keys' you can
> > > > > use:
> > > > > Startup is the first line (obviously).
> > > > > You can find child startup by looking for [2840] (pid in
> > > > > brackets)
> > > > > You can find child exits by looking for '2480 exited'
> > > > > You can find where I send the kill signal by looking for
> > > > > 'pmdie 15'
> > > > >
> > > > > I think that's a good start. :)
> > > > >
> > > > > Don't hesitate to contact me if I can shed any more light.
> > > > > I'm wide open to ideas at the moment. I'm in EST, but tend
> > > > > to work until 10-11 at night, so e-mail anytime.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > DwD
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: The Hermit Hacker [mailto:scrappy@hub.org]
> > > > > > Sent: Thursday, February 18, 1999 5:36 PM
> > > > > > To: Daryl W. Dunbar
> > > > > > Subject: Re: Interested?
> > > > > >
> > > > > > Hi Daryl...
> > > > > >
> > > > > > I'm not the strongest at internal code, so may not be of
> > > > > > any help at all. I just went through my -hackers email,
> > > > > > and can't seem to find anything from you in there. Can
> > > > > > you tell me what your problem is, as well as version of
> > > > > > PostgreSQL you are using, and we'll see what we can do?
> > > > > >
> > > > > > Marc
> > > > > >
> > > > > > On Thu, 18 Feb 1999, Daryl W. Dunbar wrote:
> > > > > >
> > > > > > > Marc,
> > > > > > >
> > > > > > > I know that you put considerable volunteer time into
> > > > > > > PostgreSQL. If I am not too bold in asking, and you
> > > > > > > are comfortable with it, I am prepared to compensate
> > > > > > > you for your time if you can assist me in tracking
> > > > > > > down this rather nasty bug I have been e-mailing
> > > > > > > Hackers about. Please let me know if you are
> > > > > > > interested and if so, at what rate.
> > > > > > >
> > > > > > > We are in the process of launching a pretty exciting
> > > > > > > site and a database is an integral part of it. I
> > > > > > > really want to use PostgreSQL, but cannot take it into
> > > > > > > production on Solaris with this problem going on. I'm
> > > > > > > in the process of installing a test site on Linux to
> > > > > > > see if the problem exists there, but I expect it is
> > > > > > > limited to Solaris.
> > > > > > >
> > > > > > > I anxiously await your response.
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > DwD
> > > > > > >
> > > > > > > --
> > > > > > > Daryl W. Dunbar
> > > > > > > VP of Engineering/Chief Technology Officer
> > > > > > > http://www.com, Where the Web Begins!
> > > > > > > mailto:daryl@www.com
> > > > > >
> > > > > > Marc G. Fournier
> > > > > > Systems Administrator @ hub.org
> > > > > > primary: scrappy@hub.org
> > > > > > secondary: scrappy@{freebsd|postgresql}.org
> > >
> > > Marc G. Fournier
> > > Systems Administrator @ hub.org
> > > primary: scrappy@hub.org
> > > secondary: scrappy@{freebsd|postgresql}.org
>
> Marc G. Fournier
> Systems Administrator @ hub.org
> primary: scrappy@hub.org
> secondary: scrappy@{freebsd|postgresql}.org
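
The log search keys quoted above (a pid in brackets at child startup, '<pid> exited' at child exit) could be applied with a small scan like the one below. This is only a rough sketch and not something posted in the thread: the two regular expressions and the name find_hung_pids are assumptions for illustration, and the actual postmaster log layout at debug level 3 may well differ.

```python
import re

# Assumed patterns, modelled on the search keys described in the message;
# the real log format is not shown in the thread.
START_RE = re.compile(r"\[(\d+)\]")      # child startup: pid in brackets
EXIT_RE = re.compile(r"\b(\d+) exited")  # child exit: "<pid> exited"


def find_hung_pids(lines):
    """Return pids seen in brackets that never log an exit, in first-seen order."""
    started = {}   # dict used as an ordered set of started pids
    exited = set()
    for line in lines:
        for match in EXIT_RE.finditer(line):
            exited.add(int(match.group(1)))
        for match in START_RE.finditer(line):
            started.setdefault(int(match.group(1)), None)
    return [pid for pid in started if pid not in exited]
```

Feeding it the lines of the log file should list the backends that started but were never reported as exited, which is one way to narrow down where the first hang occurs.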