Re: Stalls on PGSemaphoreLock - Mailing list pgsql-performance

From Ray Stell
Subject Re: Stalls on PGSemaphoreLock
Msg-id 295608A9-10CA-4F3C-B45D-A98EA8E8F8CF@gmail.com
In response to Stalls on PGSemaphoreLock  (Matthew Spilich <mspilich@tripadvisor.com>)
Responses RE : Stalls on PGSemaphoreLock
List pgsql-performance

On Mar 25, 2014, at 8:46 AM, Matthew Spilich wrote:

The symptom: The database machine (running postgres 9.1.9 on CentOS 6.4) runs at low utilization most of the time, but once every day or two it appears to slow down to the point where queries back up and clients are unable to connect. Once this event occurs, there are lots of concurrent queries and I see slow queries appear in the logs, but I have not been able to find anything abnormal that causes this behavior.
...
Has anyone on the forum seen something similar? Any suggestions on what to look at next? If it helps to describe the server hardware: it has two E5-2670 CPUs and 256 GB of RAM, and the database is hosted on 1.6 TB of RAID 10 local storage (15K 300 GB drives).


I could be way off here, but years ago I experienced something like this (in Oracle land), and after some stressful chasing, a marginally failing RAID controller turned out to be the cause. It was the same kind of event: steady traffic, then some I/O would not complete and normal operations would stack up. What you report reminded me of that. The E5 is a few years old; I wonder if the RAID controller firmware needs a patch? I suppose a marginal power supply might cause a similar "hang." Marginal failures are very painful to track down. Have you checked sar or the OS logs at the time of the event?
