Thread: Sudden crazy high CPU usage
I'm running PostgreSQL 9.3 on a production server. An hour ago, out of the blue, I ran into an issue I have never encountered before: my server started to use CPU like crazy. The server is a standard Ubuntu 12.04 LTS installation running only Postgres and Redis.

The incident can be seen in the numbers below:

https://s3-eu-west-1.amazonaws.com/autouncle-public/other/cpu.png

I immediately took a look at pg_stat_activity, but nothing in there seemed suspicious. I also had a look at the Postgres log, but nothing was in there either. I have pg_stat_statements running, so I reset that, and nothing really suspicious showed up there except for the fact that all queries were taking 100x longer than usual.

I have tried the following with no luck:

• Restart clients connecting to the db
• Restart postgres
• Restart the whole server

I have run memory tests on the server as well, and nothing seems to be wrong.

No changes to any software running on the servers have been made within the last 24 hours.

The question is: I have a streaming replication server running, which I have now failed over to, and it runs fine. However, I still have no clue why my master suddenly became so CPU-hungry, or how I can debug / trace it further.
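The pg_stat_activity and pg_stat_statements checks described above would look roughly like this ("mydb" is a stand-in for the real database name; column names are as of 9.3):

# Current activity: anything long-running, waiting on a lock, or idle in transaction?
psql -d mydb -c "SELECT pid, state, waiting, now() - query_start AS runtime, query
                 FROM pg_stat_activity
                 ORDER BY query_start;"

# Reset the statement stats, let the workload run for a while, then list the top consumers.
psql -d mydb -c "SELECT pg_stat_statements_reset();"
psql -d mydb -c "SELECT calls,
                        round(total_time::numeric, 2)           AS total_ms,
                        round((total_time / calls)::numeric, 2) AS avg_ms,
                        left(query, 60)                         AS query
                 FROM pg_stat_statements
                 ORDER BY total_time DESC
                 LIMIT 10;"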
On Mon, Mar 31, 2014 at 5:25 AM, Niels Kristian Schjødt <nielskristian@autouncle.com> wrote:
> I'm running PostgreSQL 9.3 on a production server. An hour ago, out of the blue, I ran into an issue I have never encountered before: my server started to use CPU like crazy. The server is a standard Ubuntu 12.04 LTS installation running only Postgres and Redis. [...]

Using linux 6? One possible culprit is "Transparent Huge Page Compaction". It tends to hit servers with a lot of memory, especially if they've configured a lot of shared buffers. Google it for a lot of info.

There may be other issues masquerading as this one, but it's the first thing to rule out. Symptoms are very high CPU utilization and poor performance that strikes without warning and then also resolves without warning (typically seconds or minutes after the event).

For starters, take a look at the value of:

/sys/kernel/mm/redhat_transparent_hugepage/enabled

And do some due-diligence research.

merlin
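On Ubuntu the equivalent sysfs path is usually /sys/kernel/mm/transparent_hugepage/ rather than the redhat_ variant, so checking the setting, and temporarily disabling THP to rule it out, would look roughly like this:

# Show the current policy; the bracketed value is the active one, e.g. "[always] madvise never".
cat /sys/kernel/mm/transparent_hugepage/enabled
cat /sys/kernel/mm/transparent_hugepage/defrag

# Turn THP and its defrag/compaction off until the next reboot and watch whether the CPU spikes stop.
echo never | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
echo never | sudo tee /sys/kernel/mm/transparent_hugepage/defrag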
Thanks, this seems to persist after a reboot of the server though, and I have never in my server's three-month lifetime experienced anything like it.

Niels Kristian Schjødt
Co-founder & Developer
E-Mail: nielskristian@autouncle.com
Mobile: 0045 28 73 04 93

On 31/03/2014 at 15.47, Merlin Moncure <mmoncure@gmail.com> wrote:
> Using linux 6? One possible culprit is "Transparent Huge Page Compaction". It tends to hit servers with a lot of memory, especially if they've configured a lot of shared buffers. [...]
On Mon, Mar 31, 2014 at 9:24 AM, Niels Kristian Schjødt <nielskristian@autouncle.com> wrote:
> Thanks, this seems to persist after a reboot of the server though, and I have never in my server's three-month lifetime experienced anything like it.
huh. Any chance of getting 'perf' installed and running a perf top?
merlin
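For reference, getting perf onto Ubuntu 12.04 and taking a quick live look would be roughly as follows (the exact linux-tools package name depends on the release and the running kernel, so it may need adjusting):

# perf is shipped in a linux-tools package that matches the kernel.
sudo apt-get install linux-tools-$(uname -r)

# Whole-system live view of the hottest functions (kernel and user space);
# postgres or kernel memory-compaction symbols near the top would be telling.
sudo perf top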
On Mon, Mar 31, 2014 at 8:24 AM, Niels Kristian Schjødt <nielskristian@autouncle.com> wrote:
> Thanks, this seems to persist after a reboot of the server though, and I have never in my server's three-month lifetime experienced anything like it.

Could it be overheating and therefore throttling the cores?

Also, another thing to look at on large-memory machines with > 1 CPU socket is zone_reclaim_mode being set to 1. Always set it to 0 on a Linux machine running Postgres.
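For completeness, checking and pinning that setting looks something like this:

# 1 means the kernel prefers reclaiming pages on the local NUMA node over
# using free memory on another node, which hurts large shared_buffers setups.
cat /proc/sys/vm/zone_reclaim_mode

# Set it to 0 for the running kernel...
sudo sysctl -w vm.zone_reclaim_mode=0

# ...and persist it across reboots.
echo "vm.zone_reclaim_mode = 0" | sudo tee -a /etc/sysctl.conf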
Yes, I could install “perf”, though I’m not familiar with it. What would I do? :-)

Niels Kristian Schjødt
Co-founder & Developer
E-Mail: nielskristian@autouncle.com
Mobile: 0045 28 73 04 93

Thanks, I don't think overheating is an issue; it's a large Dell server, and I have checked the historic CPU temperature in the server's control panel, and no overheating has shown. zone_reclaim_mode is already set to 0.

On 31/03/2014 at 16.50, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> Could it be overheating and therefore throttling the cores?
>
> Also, another thing to look at on large-memory machines with > 1 CPU socket is zone_reclaim_mode being set to 1. Always set it to 0 on a Linux machine running Postgres.
On Mon, Mar 31, 2014 at 3:25 AM, Niels Kristian Schjødt <nielskristian@autouncle.com> wrote:
> I'm running PostgreSQL 9.3 on a production server. An hour ago, out of the blue, I ran into an issue I have never encountered before: my server started to use CPU like crazy. [...]
>
> The incident can be seen in the numbers below:
>
> https://s3-eu-west-1.amazonaws.com/autouncle-public/other/cpu.png

The increase doesn't look so sudden. My guess is that the server got some new activity. The advice is to set up the statistics collecting script at the link [1] and review the results for a period of an hour or so. It shows charts of statements by CPU/IO/calls with aggregated stats, so you could probably find out more than with pure pg_stat_statements.

[1] https://github.com/grayhemp/pgcookbook/blob/master/statement_statistics_collecting_and_reporting.md

--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA
http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray.ru@gmail.com
In New Relic, go back to half an hour before the problem started (so the spike is not yet in view) and send the same screenshot. My guess is you have increased activity hitting the DB.

Do you have pgbouncer or some kind of connection pooling sitting in front? 198 open server connections could account for an increase in load like you're seeing.

Do you have the PostgreSQL add-on in New Relic to show how many queries are hitting the system, so you can correlate the data?
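A quick way to see what those connections are actually doing is to group pg_stat_activity by state (available since 9.2; "mydb" is a stand-in for the real database name):

# Lots of "active" means genuine query load; lots of "idle in transaction"
# points back at the application or the pooler.
psql -d mydb -c "SELECT state, count(*)
                 FROM pg_stat_activity
                 GROUP BY state
                 ORDER BY count(*) DESC;"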
On Mon, Mar 31, 2014 at 1:36 PM, Sergey Konoplev <gray.ru@gmail.com> wrote:
> The increase doesn't look so sudden. My guess is that the server got some new activity. [...]
Sorry, but nothing unusual here either. I have compared the time just before the incident with the same time on the previous days, and the throughput pattern is exactly the same. No differences.
On 31/03/2014 at 22.01, Will Platnick <wplatnick@gmail.com> wrote:
> In New Relic, go back to half an hour before the problem started (so the spike is not yet in view) and send the same screenshot. My guess is you have increased activity hitting the DB. [...]
On 2014-03-31 19:16:58 +0200, Niels Kristian Schjødt wrote:
> Yes, I could install "perf", though I'm not familiar with it. What would I do? :-)

As root:

perf record -a sleep 5
perf report > my-nice-perf-report.txt

And then send the my-nice-perf-report.txt file.

Locally it's much nicer to look at the output using "perf report" without redirecting into a file; you'll get an interactive UI.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
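If top or pg_stat_activity singles out one particularly busy backend, the same approach can be narrowed to that process (the PID below is a placeholder):

# Profile a single backend for 5 seconds instead of the whole box.
perf record -p 12345 sleep 5
perf report > single-backend-report.txt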
On Wed, Apr 2, 2014 at 7:16 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> As root:
> perf record -a sleep 5
> perf report > my-nice-perf-report.txt
> [...]

The Postgres wiki has a page dedicated to perf as well:

https://wiki.postgresql.org/wiki/Profiling_with_perf

Regards,
--
Michael