Thread: Killing "stuck" queries and preventing queries from getting "stuck"
I have a situation where there are dozens of daemons hitting the same postgres database. They all run different types of queries, but each daemon runs the same set of queries over and over again. Sometimes some queries get "stuck" in that they run for hours and hours. They never stop running. Killing the daemon does not stop the query from running. Once there are three or four of these "stuck" queries the database slows down drastically. Is there a way to tell postgres to stop any query that runs longer than a specified amount of time? Say an hour? Failing that, what is a good strategy for detecting stuck queries and killing them? Thanks.
Tim Uckun <timuckun@gmail.com> writes:
> Is there a way to tell postgres to stop any query that runs longer
> than a specified amount of time? Say an hour?

Setting statement_timeout would do that. You ought to figure out what's causing the performance problem, though, instead of just zapping things ...

			regards, tom lane
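For reference, a minimal sketch of how statement_timeout could be applied, either per session or per role; the one-hour value is just the figure from the original question, and the role name is a placeholder for whatever the daemons connect as:

    -- Cancel any statement in this session that runs longer than one hour
    SET statement_timeout = '1h';

    -- Or apply it only to the role the daemons use
    -- (daemon_user is a hypothetical name; substitute your own)
    ALTER ROLE daemon_user SET statement_timeout = '1h';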
On Tue, Sep 28, 2010 at 3:39 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Tim Uckun <timuckun@gmail.com> writes:
>> Is there a way to tell postgres to stop any query that runs longer
>> than a specified amount of time? Say an hour?
>
> Setting statement_timeout would do that. You ought to figure out
> what's causing the performance problem, though, instead of just
> zapping things ...

Well, the query is pretty heavy but it gets run a lot. There is a distinct in there which seems to be the cause of most of the headaches, but it's going to take a while to redo the application to not use distinct.

The query gets run a lot and 99.99% of the time it runs successfully and the daemon goes on its merry way. Occasionally it seems to "get stuck", and killing the daemon does not unstick it. I have not been willing to kill -9 the process, and at this stage I can afford to restart postgres. Eventually I won't be able to do that, though, so I want to fix the app so it uses a more reasonable query, and to detect and stop stuck queries in case other problem queries sneak into the process during development.
On 28/09/10 11:25, Tim Uckun wrote:
> On Tue, Sep 28, 2010 at 3:39 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Tim Uckun <timuckun@gmail.com> writes:
>>> Is there a way to tell postgres to stop any query that runs longer
>>> than a specified amount of time? Say an hour?
>>
>> Setting statement_timeout would do that. You ought to figure out
>> what's causing the performance problem, though, instead of just
>> zapping things ...
>
> Well, the query is pretty heavy but it gets run a lot. There is a
> distinct in there which seems to be the cause of most of the headaches,
> but it's going to take a while to redo the application to not use
> distinct.
>
> The query gets run a lot and 99.99% of the time it runs successfully
> and the daemon goes on its merry way. Occasionally it seems to "get
> stuck", and killing the daemon does not unstick it.

Useful things to try when you have a "stuck" backend:

- attach strace to it and see if it's doing anything that involves system calls

- attach gdb to it and get a backtrace to see what it's up to. If it's using CPU, do this multiple times to see if it's in some kind of infinite loop, as you'll get a snapshot of different stacks if so. See:
  http://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD

- (on Linux; you didn't mention your OS): cat /proc/$pid/stack, where $pid is the process id of the stuck backend, to see what the backend process is up to in the kernel.

... then post the output of all those tests here, along with the contents of "select * from pg_stat_activity", "select * from pg_locks" and anything from the postgresql log files that looks possibly relevant.

--
Craig Ringer

Tech-related writing: http://soapyfrogs.blogspot.com/
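To illustrate the pg_stat_activity / pg_locks part of that advice, something along these lines may be useful; this sketch assumes the pre-9.2 catalog columns (procpid, current_query) that match the era of this thread:

    -- Long-running statements, oldest first
    SELECT procpid, usename, waiting, now() - query_start AS runtime, current_query
    FROM pg_stat_activity
    WHERE current_query <> '<IDLE>'
    ORDER BY query_start;

    -- Locks that have not yet been granted, joined back to the waiting query
    SELECT l.pid, l.locktype, l.mode, l.granted, a.current_query
    FROM pg_locks l
    JOIN pg_stat_activity a ON a.procpid = l.pid
    WHERE NOT l.granted;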
On 28 Sep 2010, at 1:41, Tim Uckun wrote:
> Sometimes some queries get "stuck" in that they run for hours and
> hours. They never stop running. Killing the daemon does not stop the
> query from running.

You really should try to find out why they get "stuck". Killing stuck clients isn't going to solve your problem (aside from the fact that you probably shouldn't be using kill -9 on them, that's like using a jackhammer on a jammed door).

Some things to look into: Are those queries waiting on a lock held by another daemon maybe? Are some of them keeping transactions open for a long time without committing them (or rolling them back)?

I recall you were having another problem (with deleting records). This all smells like you either are waiting for locks on records or that the statistics used for query planning aren't reflecting the actual situation.

Have a look in pg_locks and check the query plans of some of your more problematic queries (see: explain analyse) to see what's going on there. Posting the results of those here would allow more eyes to look into your issues, in case it isn't obvious to you. There's some good documentation on these subjects too.

Alban Hertroys

--
If you can't see the forest for the trees, cut the trees and you'll see there is no forest.
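As a rough illustration of that last suggestion, checking the plan of the problematic query might look like the following; the table and columns here are purely hypothetical stand-ins for the daemon's actual DISTINCT query:

    -- EXPLAIN ANALYZE actually executes the statement, so for anything
    -- that writes, wrap it in a transaction you can roll back.
    EXPLAIN ANALYZE
    SELECT DISTINCT customer_id, status
    FROM events
    WHERE created_at > now() - interval '1 day';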
> You really should try to find out why they get "stuck". Killing stuck clients isn't going to solve your problem (aside from the fact that you probably shouldn't be using kill -9 on them, that's like using a jackhammer on a jammed door).

Well, I didn't use kill -9; I used the pg_cancel_backend command.

> Some things to look into: Are those queries waiting on a lock held by another daemon maybe? Are some of them keeping transactions open for a long time without committing them (or rolling them back)?

I'll take a look at that. It certainly would be simpler than attaching a gdb session to the pid and getting a stack trace.

> I recall you were having another problem (with deleting records). This all smells like you either are waiting for locks on records or that the statistics used for query planning aren't reflecting the actual situation.

I am having some performance issues with the database. I am also trying to clean a lot of records out of the system. Once all the records I want to delete are gone, perhaps the problem will go away. I am also looking at how the application can be refactored not to use this particular DISTINCT query.

> Have a look in pg_locks and check the query plans of some of your more problematic queries (see: explain analyse) to see what's going on there. Posting the results of those here would allow more eyes to look into your issues, in case it isn't obvious to you. There's some good documentation on these subjects too.

I did look at the EXPLAIN ANALYSE output, and basically postgres is saying the distinct is killing me. I remove that and the query is fine. I didn't look at the locks because the queries are read only, so I didn't think they would be affected by locks, but I will look at them and post them here.
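For the "detect and kill" fallback discussed earlier in the thread, a sketch of how pg_cancel_backend might be driven from pg_stat_activity (again assuming the pre-9.2 procpid/current_query column names):

    -- Cancel any query that has been running for more than an hour.
    -- pg_cancel_backend() sends a cancel request; pg_terminate_backend()
    -- (8.4 and later) kills the whole backend if cancelling isn't enough.
    SELECT procpid, pg_cancel_backend(procpid)
    FROM pg_stat_activity
    WHERE current_query <> '<IDLE>'
      AND now() - query_start > interval '1 hour';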