Re: High CPU Utilization - Mailing list pgsql-performance

From: Joe Uhl
Subject: Re: High CPU Utilization
Msg-id: F161EF4F-4B72-47AC-A894-C1C1BE7B7E1A@gmail.com
In response to: Re: High CPU Utilization (Greg Smith <gsmith@gregsmith.com>)
List: pgsql-performance
I dropped the pool sizes and brought things back up. Things are stable, the site is fast, and CPU utilization is still high. It is probably just a matter of time before the issue comes back (we get slammed as kids get out of school in the US).

Now when I run vmstat 1 30 it looks very different (below). Waiting is minimal, user is very high. Under nontrivial load, according to xact_commit in pg_stat_database, we are doing 1800+ tps.

Appreciate the input and explanation on vmstat. I am going to throw some of these numbers into Zabbix so I can keep a better eye on them. This server is a couple of years old, so the purchase of a new controller and/or disks is not out of the question.

On a final note, I have made several changes to postgresql.conf. Some of those here:

max_connections = 1000
shared_buffers = 7680MB
work_mem = 30MB
synchronous_commit = off
checkpoint_segments = 50
effective_cache_size = 20000MB

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b  swpd   free   buff    cache   si so   bi    bo    in    cs us sy id wa
 9  8 73036 500164 82200 23497748    3  4  669   541     1     1 23  3 54 19
20  4 73036 497252 82200 23500836    0  0 2500   680 11145 15168 91  4  2  2
21  1 73036 491416 82204 23503832    0  0 1916   920 10303 14032 94  4  1  1
23  5 73036 489580 82212 23505860    0  0 1348  3296 11682 15970 94  5  1  0
31  1 73036 481408 82220 23507752    0  0  984  8988 10123 11289 97  3  0  0
25  4 73036 483248 82232 23509420    0  0 1268  1312 10705 14063 96  4  0  0
23  4 73036 480096 82232 23512380    0  0 2372   472  9805 13996 94  5  1  1
24  4 73036 476732 82236 23515196    0  0 2012   720 10365 14307 96  3  1  0
22  1 73036 474468 82236 23516584    0  0  944  3108  9838 12831 95  4  1  0
14  1 73036 455756 82284 23534548    0  0  908  3284  9096 11333 94  4  1  0
10  2 73036 455224 82292 23536304    0  0 1760   416 12454 17736 89  6  3  2
17  0 73036 460620 82292 23538888    0  0 1292   968 12030 18333 90  7  2  1
13  4 73036 459764 82292 23539724    0  0  332   288  9722 14197 92  5  2  1
17  5 73036 457516 82292 23542176    0  0 1872 17752 10458 15465 91  5  2  1
19  4 73036 450804 82300 23545640    0  0 2980   640 10602 15621 90  6  2  2
24  0 73036 447660 82312 23547644    0  0 1736 10724 12401 15413 93  6  1  0
20  6 73036 444380 82320 23550692    0  0 2064   476  9008 10985 94  4  1  0
22  2 73036 442880 82328 23553640    0  0 2496  3156 10739 15211 93  5  1  1
11  1 73036 441448 82328 23555632    0  0 1452  3552 10812 15337 93  5  2  1
 6  2 73036 439812 82348 23557420    0  0 1052  1128  8603 10514 91  3  3  2
 6  3 73036 433456 82348 23560860    0  0 2484   656  7636 13033 68  4 14 14
 6  3 73036 433084 82348 23562628    0  0 1400   408  6046 11778 70  3 18  9
 5  0 73036 430776 82356 23564264    0  0 1108  1300  7549 13754 73  4 16  7
 5  2 73036 430124 82360 23565580    0  0 1016  2216  7844 14507 72  4 18  6
 4  2 73036 429652 82380 23567480    0  0 1168  2468  7694 15466 58  4 24 14
 6  2 73036 427304 82384 23569668    0  0 1132   752  5993 13606 49  5 36 10
 7  1 73036 423020 82384 23571932    0  0 1244   824  8085 18072 56  3 30 10
 4  0 73036 420816 82392 23573824    0  0 1292   820  5370 10958 46  2 41 10
 9  1 73020 418048 82392 23576900   52  0 1632  2592  5931 11629 60  3 29  8
 4  2 73004 415164 82424 23578620   56  0 1812  4116  7503 14674 71  3 15 12

On Mar 16, 2009, at 4:19 PM, Dave Youatt wrote:

> Last column "wa" is % cpu time spent waiting (for IO to complete). 80s
> and 90s is pretty high, probably too high.
>
> Might also want to measure the IO/s performance of your RAID
> controller. From the descriptions, it will be much more important than
> long sequential reads/writes for characterizing your workload.
>
> There are also some disappointing HW RAID controllers out there.
> Generally, Areca and Promise are good, Adaptec good depending on
> model, and the ones that Dell ship with their servers haven't had good
> reviews/reports.
>
> On 03/16/2009 01:10 PM, Joe Uhl wrote:
>> Here is vmstat 1 30.
>> We are under peak load right now so I can gather
>> information from the real deal :)
>>
>> Had an almost complete lockup a moment ago, number of non-idle
>> postgres connections was 637. Going to drop our JDBC pool sizes a bit
>> and bounce everything.
>>
>> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>>  r  b  swpd    free    buff    cache   si so   bi   bo    in    cs us sy id wa
>> 12 35 95056 11102380 56856 14954948    3  4  669  541     1     2 23  3 54 19
>> 12 39 95056 11092484 56876 14963204    0  0 6740 1204 10066 13277 91  5  0  4
>>  8 42 95056 11081712 56888 14972244    0  0 8620 1168 10659 17020 78  6  0 15
>> 10 30 95052 11069768 56904 14982628    0  0 8944  976  9809 15109 81  6  1 12
>>  4 27 95048 11059576 56916 14991296    0  0 8852  440  7652 13294 63  4  2 32
>>  5 42 95048 11052524 56932 14996496    0  0 4700  384  6383 11249 64  4  4 28
>>  5 33 95048 11047492 56956 15001428    0  0 3852  572  6029 14010 36  4  5 56
>>  7 35 95048 11041184 56960 15005480    0  0 3964  136  5042 10802 40  3  1 56
>>  1 33 95048 11037988 56968 15009240    0  0 3892  168  3384  6479 26  1  3 69
>>  3 28 95048 11029332 56980 15015744    0  0 6724  152  4964 12844 11  2  8 79
>>  0 34 95048 11025880 56988 15020168    0  0 3852  160  3616  8614 11  1  6 82
>>  3 25 95048 10996356 57044 15044796    0  0 7892  456  3126  7115  4  3  8 85
>>  1 26 95048 10991692 57052 15050100    0  0 5188  176  2566  5976  3  2 12 83
>>  0 29 95048 10985408 57060 15054968    0  0 4200   80  2586  6582  4  1 12 83
>>  1 29 95048 10980828 57064 15058992    0  0 4560   64  2966  7557  7  2  6 85
>>  2 28 95048 10977192 57072 15063176    0  0 3860   72  2695  6742 11  1  7 81
>>  2 29 95048 10969120 57088 15067808    0  0 5084   84  3296  8067 14  1  0 84
>>  0 25 95048 10962096 57104 15072984    0  0 4440  500  2721  6263 12  1  6 80
>>  0 23 95044 10955320 57108 15079260    0  0 5712  232  2678  5990  6  1  6 87
>>  2 25 95044 10948644 57120 15084524    0  0 5120  184  3499  8143 20  3  9 69
>>  3 21 95044 10939744 57128 15090644    0  0 5756  264  4724 10272 32  3  5 60
>>  1 19 95040 10933196 57144 15095024   12  0 4440  180  2585  5244 13  2 15 70
>>  0 21 95040 10927596 57148 15098684    0  0 3248  136  2973  7292  8  1  9 81
>>  1 20 95040 10920708 57164 15104244    0  0 5192  360  1865  4547  3  1  9 87
>>  1 24 95040 10914552 57172 15105856    0  0 2308   16  1948  4450  6  1  1 93
>>  0 24 95036 10909148 57176 15110240    0  0 3824  152  1330  2632  3  1  6 90
>>  1 21 95036 10900628 57192 15116332    0  0 5680  180  1898  3986  4  1 11 84
>>  0 19 95036 10888356 57200 15121736    0  0 5952  120  2252  3991 12  1  8 79
>>  2 22 95036 10874336 57204 15128252    0  0 6320  112  2831  6755  5  2  8 85
>>  3 26 95036 10857592 57220 15134020    0  0 5124  216  3067  5296 32  6  3 59
>>
>> Alan, my apologies if you get this twice. Didn't reply back to the
>> list on first try.
>>
>> On Mar 16, 2009, at 3:52 PM, Alan Hodgson wrote:
>>
>>> On Monday 16 March 2009, Joe Uhl <joeuhl@gmail.com> wrote:
>>>> Right now (not under peak load) this server is running at 68% CPU
>>>> utilization and its SATA raid 10 is doing about 2MB/s writes and
>>>> 11MB/s reads. When I run dd I can hit 200+MB/s writes and 230+MB/s
>>>> reads, so we are barely using the available IO. Further when I run
>>>> dd the CPU utilization of that process only approaches 20%-30% of
>>>> one core.
>>>
>>> What does vmstat say when it's slow? The output of "vmstat 1 30"
>>> would be informative.
>>>
>>> note: dd is sequential I/O. Normal database usage is random I/O.
>>>
>>> --
>>> Even a sixth-grader can figure out that you can't borrow money to
>>> pay off your debt
>>>
>>> --
>>> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
>>> To make changes to your subscription:
>>> http://www.postgresql.org/mailpref/pgsql-performance

On Mar 16, 2009, at 4:35 PM, Greg Smith wrote:

> On Mon, 16 Mar 2009, Joe Uhl wrote:
>
>> Here is vmstat 1 30.
>> We are under peak load right now so I can gather information from
>> the real deal
>
> Quite helpful, reformatting a bit and picking an informative section:
>
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>  r  b  swpd    free    buff    cache   si so   bi   bo    in    cs us sy id wa
>  0 34 95048 11025880 56988 15020168    0  0 3852  160  3616  8614 11  1  6 82
>  3 25 95048 10996356 57044 15044796    0  0 7892  456  3126  7115  4  3  8 85
>  1 26 95048 10991692 57052 15050100    0  0 5188  176  2566  5976  3  2 12 83
>
> This says that your server is spending all its time waiting for I/O;
> actual CPU utilization is minimal. You're only achieving around
> 3-8MB/s of random I/O. That's the reality of what your disk I/O
> subsystem is capable of, regardless of what its sequential
> performance with dd looks like. If you were to run a more complicated
> benchmark like bonnie++ instead, I'd bet that your "seeks/second"
> results are very low, even though sequential read/write is fine.
>
> The Perc5 controllers have a pretty bad reputation for performance on
> this list, even in RAID10. Not much you can do about that beyond
> scrapping the controller and getting a better one.
>
> What you might do in order to reduce the total number of writes
> needed is some standard postgresql.conf tuning; see
> http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
>
> What you could do here is increase shared_buffers,
> checkpoint_segments, and checkpoint_completion_target as described
> there. Having more buffers dedicated to the database and having fewer
> checkpoints can result in less random I/O showing up, as popular data
> pages will stay in RAM longer without getting written out so much.
>
> --
> * Greg Smith  gsmith@gregsmith.com  http://www.gregsmith.com  Baltimore, MD
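[Editor's note: a minimal postgresql.conf sketch of the tuning direction Greg describes above. The exact values are illustrative assumptions for hardware of this class, not settings taken from the thread; checkpoint_segments is the pre-9.5 knob current in the PostgreSQL 8.x versions of the time, and the wiki page Greg links should be the starting point.]

```ini
# Illustrative values only -- assumptions, not recommendations from the thread.
shared_buffers = 7680MB                # dedicate more RAM to the buffer cache
checkpoint_segments = 50               # more WAL segments => fewer, larger checkpoints
checkpoint_completion_target = 0.9     # spread checkpoint writes out (8.3 default was 0.5)
effective_cache_size = 20000MB         # planner hint: how much the OS is likely caching
```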
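[Editor's note: Joe mentions wanting to feed these numbers into Zabbix. A short sketch of how the vmstat output quoted in this thread could be summarized mechanically; the function name and column handling are mine, assuming the standard 16-column "vmstat 1 N" layout shown above.]

```python
def summarize_vmstat(lines):
    """Average the us/sy/id/wa CPU columns of vmstat data rows.

    Header lines are skipped; a data row is exactly 16 integer fields,
    with the CPU percentages in the last four columns.
    """
    rows = []
    for line in lines:
        parts = line.split()
        if len(parts) == 16 and all(p.isdigit() for p in parts):
            rows.append([int(p) for p in parts])
    if not rows:
        return {}
    n = len(rows)
    return {
        "us": sum(r[12] for r in rows) / n,  # user CPU
        "sy": sum(r[13] for r in rows) / n,  # system CPU
        "id": sum(r[14] for r in rows) / n,  # idle
        "wa": sum(r[15] for r in rows) / n,  # I/O wait
    }

# Two rows from Greg's excerpt above: wa dominates, confirming an I/O-bound box.
sample = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b  swpd   free     buff  cache    si so   bi  bo   in   cs us sy id wa
 0 34 95048 11025880 56988 15020168    0  0 3852 160 3616 8614 11  1  6 82
 3 25 95048 10996356 57044 15044796    0  0 7892 456 3126 7115  4  3  8 85
""".splitlines()

print(summarize_vmstat(sample))  # -> {'us': 7.5, 'sy': 2.0, 'id': 7.0, 'wa': 83.5}
```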