Thread: High SYS CPU - need advise
Hello everyone,

I'm seeking help in diagnosing / figuring out an issue that we have with our DB server.

Under some (relatively light) load of 300...400 TPS, every 10-30 seconds the server drops into high system CPU usage (90%+ SYSTEM across all CPUs; it's pure SYS CPU, i.e. it's not iowait, not irq, not user). PostgreSQL is taking 10-15% at the same time. Those periods last from a few seconds to minutes, or until PostgreSQL is restarted. Needless to say, the system is barely responsive, with load average hitting over 100. We have mostly SELECT statements (joins across a few tables) that use indexes and return a small number of records. Should the number of incoming requests per second drop a bit, the server does not fall into those HIGH-SYS-CPU periods. It all seems like postgres runs out of some resource or is fighting for some locks, and that is causing the kernel to go into la-la land trying to manage it.

So far we've checked:
- disk and NIC delays / errors / utilization
- WAL files (created rarely)
- tables are vacuumed OK; periods of high SYS are not tied to the vacuum process
- kernel resource utilization (sufficient FS handles, shared MEM/SEM, VM)
- increased log level, but nothing suspicious/different (to me) is reported there during periods of high SYS CPU
- ran pgbench (could not reproduce the issue, even though it was producing over 40,000 TPS for a prolonged period of time)

Basically, our symptoms are exactly as reported here over a year ago (though for postgres 8.3; we run 9.1):
http://archives.postgresql.org/pgsql-general/2011-10/msg00998.php

I will be grateful for any ideas that help resolve or diagnose this problem.

Environment background:
-----

--
View this message in context: http://postgresql.1045698.n5.nabble.com/High-SYS-CPU-need-advise-tp5734597.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.
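The pure-SYS diagnosis in the post above (not iowait, not irq, not user) can be checked with a small sampler. This is a minimal sketch, assuming a Linux host that exposes the aggregate `cpu` line in `/proc/stat`; the helper names (`parse_cpu_line`, `cpu_shares`, `sample`, `watch`) are hypothetical and not part of PostgreSQL or any existing tool. It diffs two snapshots of the kernel's cumulative counters and reports each category as a percentage of elapsed CPU time, similar in spirit to what `vmstat` reports.

```python
"""Sketch (hypothetical helpers): per-category CPU share from /proc/stat deltas.

Assumes Linux; column order of the aggregate "cpu" line is
user nice system idle iowait irq softirq steal [guest guest_nice].
guest/guest_nice are already included in user/nice, so they are ignored.
"""
from __future__ import annotations

import time

FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(text: str) -> dict[str, int]:
    """Return the jiffy counters from the aggregate 'cpu' line of /proc/stat text."""
    for line in text.splitlines():
        parts = line.split()
        # Skip per-core lines ("cpu0", "cpu1", ...); only the total is used.
        if parts and parts[0] == "cpu":
            values = [int(v) for v in parts[1:1 + len(FIELDS)]]
            # Older kernels report fewer columns; treat missing ones as 0.
            values += [0] * (len(FIELDS) - len(values))
            return dict(zip(FIELDS, values))
    raise ValueError("no aggregate 'cpu' line found")


def cpu_shares(before: dict[str, int], after: dict[str, int]) -> dict[str, float]:
    """Percentage of elapsed CPU time spent in each category between two samples."""
    deltas = {k: after[k] - before[k] for k in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        return {k: 0.0 for k in FIELDS}
    return {k: 100.0 * d / total for k, d in deltas.items()}


def sample(path: str = "/proc/stat") -> dict[str, int]:
    """Read one snapshot of the cumulative counters."""
    with open(path) as f:
        return parse_cpu_line(f.read())


def watch(interval: float = 1.0, count: int = 60, path: str = "/proc/stat") -> None:
    """Print SYS/user/iowait/irq shares once per interval, `count` times."""
    prev = sample(path)
    for _ in range(count):
        time.sleep(interval)
        cur = sample(path)
        s = cpu_shares(prev, cur)
        print(f"sys={s['system']:5.1f}% user={s['user']:5.1f}% "
              f"iowait={s['iowait']:5.1f}% irq+softirq={s['irq'] + s['softirq']:5.1f}%")
        prev = cur
```

Calling `watch()` on the database host during a spike should make the pattern described above visible: `sys` climbing toward 90% while `user`, `iowait` and `irq+softirq` stay low. The per-core `cpuN` lines, which this sketch skips, would show whether all CPUs are affected at once.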
On Sun, Dec 2, 2012 at 9:08 AM, rahul143 <rk204885@gmail.com> wrote:
> Hello everyone,
>
> I'm seeking help in diagnosing / figuring out the issue that we have with
> our DB server:
> [...]

Didn't we just discuss this exact problem on the identically named thread?
http://postgresql.1045698.n5.nabble.com/High-SYS-CPU-need-advise-td5732045.html

If you're the same poster, it's good to reference that thread and any conclusions reached there, to save everyone's time.

As it happens, I have been working an angle that may help solve this problem. Are you willing/able to run a patched postgres, and what's your tolerance for risk?

merlin
Merlin Moncure wrote:
> Didn't we just discuss this exact problem on the identically named
> thread? http://postgresql.1045698.n5.nabble.com/High-SYS-CPU-need-advise-td5732045.html

Ignore this guy. It's a bot reinjecting old messages, or something like that, probably because of some bug in mailing-list scrubbing software. My impression is that it's eventually going to publish every email on a blog somewhere, or something like that.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services