Re: Slow queries / commits, mis-configuration or hardware issues? - Mailing list pgsql-performance
From | Greg Smith |
---|---|
Subject | Re: Slow queries / commits, mis-configuration or hardware issues? |
Date | |
Msg-id | 4EC33BA5.2040603@2ndQuadrant.com |
In response to | Slow queries / commits, mis-configuration or hardware issues? (Cody Caughlan <toolbag@gmail.com>) |
Responses | Re: Slow queries / commits, mis-configuration or hardware issues? |
List | pgsql-performance |
On 11/14/2011 01:16 PM, Cody Caughlan wrote:
> We're starting to see some slow queries, especially COMMITs that are
> happening more frequently. The slow queries are against seemingly
> well-indexed tables.
> Slow commits like:
>
> 2011-11-14 17:47:11 UTC pid:14366 (44/0-0) LOG: duration: 3062.784 ms statement: COMMIT
> 2011-11-14 17:47:11 UTC pid:14604 (48/0-0) LOG: duration: 2593.351 ms statement: COMMIT
>
> These slow COMMITs are against tables that received a large number of
> UPDATEs and are growing fairly rapidly.

Linux will fill its write cache with all of the writes coming out of each checkpoint. With a 16GB instance, I would expect that 5% * 16GB ~= 800MB of writes are batched up when your system is slow. You should be able to confirm that by looking at the "Dirty:" line in /proc/meminfo.

With 800MB queued up and I/O that is lucky to get 50MB/s, the sync calls at the end of each checkpoint are sometimes blocking for multiple seconds:

> 2011-11-14 17:38:48 UTC pid:3965 (-0) LOG: checkpoint complete: wrote 15121 buffers (3.3%); 0 transaction log file(s) added, 0 removed, 8 recycled; write=270.101 s, sync=2.989 s, total=273.112 s; sync files=60, longest=1.484 s, average=0.049 s
> 2011-11-14 17:39:15 UTC pid:3965 (-0) LOG: checkpoint starting: time
> 2011-11-14 17:43:49 UTC pid:3965 (-0) LOG: checkpoint complete: wrote 16462 buffers (3.6%); 0 transaction log file(s) added, 0 removed, 9 recycled; write=269.978 s, sync=4.106 s, total=274.117 s; sync files=82, longest=2.943 s, average=0.050 s

When an individual sync call gets stuck for that long, clients can easily get stuck behind it too. There are a few techniques that might help:

-Switch filesystems if you're running a slow one. ext3 has bad latency behavior here; XFS and ext4 are better.
-Lower the dirty_* tunables like dirty_background_ratio or its bytes version. This will reduce average throughput, but can lower latency.
-Spread checkpoints out more so that fewer writes are happening on average.
-Decrease shared_buffers so less data is getting pushed out at checkpoint time.
-Reduce your reliability expectations and turn off synchronous_commit.

Your server is sometimes showing multi-second latency issues with bonnie++ too, which suggests this problem is not even specific to PostgreSQL. Linux is hard to tune for low latency under all circumstances; fighting latency down under a heavy update workload is hard to do even with good hardware to accelerate write performance. In an EC2 environment, it may not even be possible to do without making trade-offs like disabling synchronous writes. I can easily get transactions hung for 10 to 15 seconds on one of their servers if I try to make that problem bad; you're only seeing the middle range of latency issues so far.

--
Greg Smith   2ndQuadrant US   greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support   www.2ndQuadrant.us
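To make the tunables discussed above concrete, here is a minimal sketch of where those knobs live, assuming a Linux host with sysctl access and a stock postgresql.conf; the specific values are illustrative starting points for a box this size, not tested recommendations:

    # Watch how much dirty data Linux is holding back for writeback
    grep -E '^(Dirty|Writeback):' /proc/meminfo

    # Cap the write cache in bytes rather than as a percentage of 16GB of RAM
    # (illustrative values; tune against your actual storage throughput)
    sysctl -w vm.dirty_background_bytes=67108864    # start background flushing around 64MB
    sysctl -w vm.dirty_bytes=268435456              # block writers once ~256MB is dirty

    # postgresql.conf: spread checkpoint writes out and shrink what each one flushes
    checkpoint_completion_target = 0.9    # spread writes across more of the checkpoint interval
    shared_buffers = 2GB                  # illustrative; a smaller pool means less pushed out per checkpoint
    #synchronous_commit = off             # only if losing the last few commits on a crash is acceptable

Note that vm.dirty_bytes and vm.dirty_background_bytes override their *_ratio counterparts once set, and sysctl -w changes do not survive a reboot unless they are also added to /etc/sysctl.conf.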