Thread: Postmaster won't -HUP
Hello out there, I'm having a problem with a production server. Actually, there are two problems. The semi-trivial problem is that Postgres won't die using the service mechanism. As root, I "service postgres stop" and then "service postgres start" after a reasonable wait. The restart will earn me a "StreamServerPort: cannot bind to port" which indicates tht the process never died. A ps ax confirms the persistance of postmaster. When I kill -9 the processes (postmaster, the /bin/sh -c postgres, and logger) they process will claim to start with "service postgres start" but it reports no PID and doesn't show up in ps ax. It is clearly not running at this point. The real problem, which caused all this debugging, is that twice so far, for no apparent reason, I have pegged the processors on the server. The machine has two 500mHz processors with 256 MB ram. I have a hard time believing that one small query can bring that machine to its knees, but it has, twice. The queries were run through a hard coded php front end (for testing purposes). Any insight on these two problems would be appreciated greatly. thanks, Jerry Lynde Jerry Lynde IT - Invocation/Evocation, Banishing, et al. "In E-commerce, the E stands for Evil." Due Diligence Inc. http://www.diligence.com Phone: (406) 728-0001 x232 Fax: (406) 728-0006
In version 7.0 postgres waits for all clients to close their connections before exiting. Before it just quit. Jerry Lynde wrote: > > Hello out there, > > I'm having a problem with a production server. Actually, there are two > problems. The semi-trivial problem is that Postgres won't die using the > service mechanism. As root, I "service postgres stop" and then "service > postgres start" after a reasonable wait. The restart will earn me a > "StreamServerPort: cannot bind to port" which indicates tht the process > never died. A ps ax confirms the persistance of postmaster. When I kill -9 > the processes (postmaster, the /bin/sh -c postgres, and logger) they > process will claim to start with "service postgres start" but it reports no > PID and doesn't show up in ps ax. It is clearly not running at this point. > > The real problem, which caused all this debugging, is that twice so far, > for no apparent reason, I have pegged the processors on the server. The > machine has two 500mHz processors with 256 MB ram. I have a hard time > believing that one small query can bring that machine to its knees, but it > has, twice. The queries were run through a hard coded php front end (for > testing purposes). Any insight on these two problems would be appreciated > greatly. > > thanks, > > Jerry Lynde > > Jerry Lynde > IT - Invocation/Evocation, Banishing, et al. > "In E-commerce, the E stands for Evil." > Due Diligence Inc. > http://www.diligence.com > Phone: (406) 728-0001 x232 > Fax: (406) 728-0006
Jerry Lynde wrote: > > I'm having a problem with a production server. Actually, there are two > problems. The semi-trivial problem is that Postgres won't die using the > service mechanism. As root, I "service postgres stop" and then "service > postgres start" after a reasonable wait. The restart will earn me a > "StreamServerPort: cannot bind to port" which indicates tht the process > never died. A ps ax confirms the persistance of postmaster. When I kill -9 > the processes (postmaster, the /bin/sh -c postgres, and logger) they > process will claim to start with "service postgres start" but it reports no > PID and doesn't show up in ps ax. It is clearly not running at this point. > The real problem, which caused all this debugging, is that twice so far, > for no apparent reason, I have pegged the processors on the server. The > machine has two 500mHz processors with 256 MB ram. I have a hard time > believing that one small query can bring that machine to its knees, but it > has, twice. The queries were run through a hard coded php front end (for > testing purposes). Any insight on these two problems would be appreciated > greatly. Hello Jerry, What version are you running? What does your server log show? Any other logs? Can you show us the small query, # of rows, etc? Regards, Ed Loehr
Jerry Lynde wrote: > > Hello out there, > > I'm having a problem with a production server. Actually, there are two > problems. The semi-trivial problem is that Postgres won't die using the > service mechanism. As root, I "service postgres stop" and then "service > postgres start" after a reasonable wait. The restart will earn me a > "StreamServerPort: cannot bind to port" which indicates tht the process > never died. A ps ax confirms the persistance of postmaster. When I kill -9 > the processes (postmaster, the /bin/sh -c postgres, and logger) they > process will claim to start with "service postgres start" but it reports no > PID and doesn't show up in ps ax. It is clearly not running at this point. Hmm, on debian it always dies, though it does takes a while sometimes. Also, try to avoid kill -9 because then it can't clean up shared memory, locks, temporary files, etc. Try kill -INT or -TERM. > The real problem, which caused all this debugging, is that twice so far, > for no apparent reason, I have pegged the processors on the server. The > machine has two 500mHz processors with 256 MB ram. I have a hard time > believing that one small query can bring that machine to its knees, but it > has, twice. The queries were run through a hard coded php front end (for > testing purposes). Any insight on these two problems would be appreciated > greatly. Heh. Yes, I also wonder why "select * from bigtable" where bigtable is a 400,000 row table crashes the machine also :). (This is pg_dump btw). Maybe you should post the query. Actually, there is one other situation I've killed the machine nearly. I have two big tables, "bigtable" and "largetable". Then do this query: select * from bigtable where largetable.a = 1; (ofcourse the actual query was much longer but this one demonstrates the problem). It actually does a join between those two tables even though only one is mentioned in the from part. Running explain over this told that this would take a *very* long time to complete. This is probably not a problem in newer versions though. What version are you using? -- Martijn van Oosterhout <kleptog@cupid.suninternet.com> http://cupid.suninternet.com/~kleptog/
Wow! Thanks for all the replies! The version I'm running is 7.0.0 It could certainly be that the connections are remaining open and the server isn't dying due to that. I'll play with it a little when I get the chance, probably this weekend (when clients aren't hitting the server.) Thanks for the tip on -INT and -TERM. As for the query I'm running, it was simply select * from bigtable (about 2-300k lines) where firstname=<obscure fname> and lastname=<obscure lname> and DOB=<the exact DOB for the above named individual> Essentially, the query was hard-coded to pull a specific record from the data and disply it on the page. While the dataset is medium to large (IMHO) it shouldn't peg two 500's. The problem happened the first time reliably when I ran the query. I was using phpdb for the php interface (I have stopped using that for the time being, because it maxes the system every time.) Jerry Lynde IT - Invocation/Evocation, Banishing, et al. "In E-commerce, the E stands for Evil." Due Diligence Inc. http://www.diligence.com Phone: (406) 728-0001 x232 Fax: (406) 728-0006
Jerry Lynde wrote: > > As for the query I'm running, it was simply select * from bigtable (about > 2-300k lines) where > firstname=<obscure fname> and > lastname=<obscure lname> and > DOB=<the exact DOB for the above named individual> What indices do you have on those fields? Regards, Ed Loehr
Ed Loehr <eloehr@austin.rr.com> writes: > Jerry Lynde wrote: >> >> As for the query I'm running, it was simply select * from bigtable (about >> 2-300k lines) where >> firstname=<obscure fname> and >> lastname=<obscure lname> and >> DOB=<the exact DOB for the above named individual> > What indices do you have on those fields? And is it using them --- ie, what does EXPLAIN say about the query? regards, tom lane
At 12:11 PM 6/1/00 -0500, Ed Loehr wrote: >Jerry Lynde wrote: > > > > As for the query I'm running, it was simply select * from > bigtable (about > > 2-300k lines) where > > firstname=<obscure > fname> and > > lastname=<obscure > lname> and > > DOB=<the exact DOB for > the above named individual> > >What indices do you have on those fields? > >Regards, >Ed Loehr They are all indexed, the DOB index is actually DOBYear DOBDay and DOBMonth and all 5 fields are indexed
At 01:21 PM 6/1/00 -0400, Tom Lane wrote: >Ed Loehr <eloehr@austin.rr.com> writes: > > Jerry Lynde wrote: > >> > >> As for the query I'm running, it was simply select * from bigtable (about > >> 2-300k lines) where > >> firstname=<obscure fname> and > >> lastname=<obscure lname> and > >> DOB=<the exact DOB for the above named individual> > > > What indices do you have on those fields? > >And is it using them --- ie, what does EXPLAIN say about the query? > > regards, tom lane It started indexing on DOBMonth until I did a VACUUM ANALYZE, after which it uses the lastname index
Jerry Lynde wrote: > > At 12:11 PM 6/1/00 -0500, Ed Loehr wrote: > >Jerry Lynde wrote: > > > > > > As for the query I'm running, it was simply select * from > > bigtable (about > > > 2-300k lines) where > > > firstname=<obscure > > fname> and > > > lastname=<obscure > > lname> and > > > DOB=<the exact DOB for > > the above named individual> > > > >What indices do you have on those fields? > > > >Regards, > >Ed Loehr > > They are all indexed, the DOB index is actually DOBYear DOBDay and > DOBMonth and all 5 fields are indexed Do you have 5 indexes or do you have an index that spans more than one field?
At 05:19 PM 6/1/00 -0400, Joseph Shraibman wrote: >Jerry Lynde wrote: > > > > At 12:11 PM 6/1/00 -0500, Ed Loehr wrote: > > >Jerry Lynde wrote: > > > > > > > > As for the query I'm running, it was simply select * from > > > bigtable (about > > > > 2-300k lines) where > > > > firstname=<obscure > > > fname> and > > > > lastname=<obscure > > > lname> and > > > > DOB=<the exact DOB for > > > the above named individual> > > > > > >What indices do you have on those fields? > > > > > >Regards, > > >Ed Loehr > > > > They are all indexed, the DOB index is actually DOBYear DOBDay and > > DOBMonth and all 5 fields are indexed > >Do you have 5 indexes or do you have an index that spans more than one >field? Sorry for being less than explicit. There are 5 separate indices, one per field. Jerry Lynde IT - Invocation/Evocation, Banishing, et al. "In E-commerce, the E stands for Evil." Due Diligence Inc. http://www.diligence.com Phone: (406) 728-0001 x232 Fax: (406) 728-0006
Jerry Lynde <jlynde@diligence.com> writes: >>>> They are all indexed, the DOB index is actually DOBYear DOBDay and >>>> DOBMonth and all 5 fields are indexed >> >> Do you have 5 indexes or do you have an index that spans more than one >> field? > Sorry for being less than explicit. There are 5 separate indices, one per > field. So your query is really something more like ... WHERE firstname = 'joe' AND lastname = 'blow' AND DOByear = 1999 AND DOBmonth = 1 AND DOBday = 1 ? The problem here is that only one index can be used in any individual scan. If I were the optimizer I'd probably figure that lastname is going to be the most selective of the five available choices, too. I'd suggest storing the DOB as *one* field of type 'date'. You can pull out the subparts for display with date_part() when you need to, but for searches you'll be a lot better off with WHERE DOB = '1999-01-01' regards, tom lane
At 05:58 PM 6/1/00 -0400, you wrote: Jerry Lynde <jlynde@diligence.com> writes: >>>> They are all indexed, the DOB index is actually DOBYear DOBDay and >>>> DOBMonth and all 5 fields are indexed >> >> Do you have 5 indexes or do you have an index that spans more than one >> field? > Sorry for being less than explicit. There are 5 separate indices, one per > field. So your query is really something more like ... WHERE firstname = 'joe' AND lastname = 'blow' AND DOByear = 1999 AND DOBmonth = 1 AND DOBday = 1 ? yes The problem here is that only one index can be used in any individual scan. If I were the optimizer I'd probably figure that lastname is going to be the most selective of the five available choices, too. and it did, and that's ok I'd suggest storing the DOB as *one* field of type 'date'. You can pull out the subparts for display with date_part() when you need to, but for searches you'll be a lot better off with WHERE DOB = '1999-01-01' regards, tom lane Thanks for the tip. I might indeed take that approach in the future, however that's not really the problem I'm trying to tackle right now. Indexing by Last Name is fine with me, currently. What's not working for me is the part where the dual pentium 500 machine with 256MB RAM goes into deep thought indefinitely for one simple hard-coded query. I used to think that the problem was due to the phpdb module that I was invoking, since the behavior exhibited itself consistently doing the aforementioned query with the phpdb module. Using nothing but straight php I have been able to make the query run smoothly. The reason I no longer believe the problem was tied to phpdb is that the behavior with the processors (all processor time devoted to user processes) happened when I was not making use of phpdb anymore. In fact I wasn't even making queries at the time, so it may not be tied to postgres at all, but I suspect it might, since the problem happens at random currently, but was consistent and predictable with the phpdb-driven postgres query.
Jerry Lynde <jlynde@diligence.com> writes: > Thanks for the tip. I might indeed take that approach in the future, > however that's not really the problem I'm trying to tackle right now. > Indexing by Last Name is fine with me, currently. What's not working for me > is the part where the dual pentium 500 machine with 256MB RAM goes into > deep thought indefinitely for one simple hard-coded query. Ah, sorry ... I've been seeing so many optimizer questions lately that I tend to zero right in on anything that looks like a misoptimization issue. I'm not aware of any reason that a query such as you describe would tend to hang up the machine. It would be useful to know what you see in "top" or some other monitoring program when the problem happens. Is there just one backend process sucking all the CPU time? More than one? Is the process(es) memory usage stable, or climbing? An even more useful bit of info is a stack trace from a backend that's suffering the problem: if you do a "kill -ABORT" on it you should get a coredump and be able to backtrace with gdb. (Note this will cause a database system restart, ie all the other backends will commit harakiri too, so I wouldn't advise doing it during normal usage of the system.) regards, tom lane