Re: Proposal of tunable fix for scalability of 8.4 - Mailing list pgsql-performance
From | Jignesh K. Shah |
---|---|
Subject | Re: Proposal of tunable fix for scalability of 8.4 |
Date | |
Msg-id | 49BABBCE.8090501@sun.com Whole thread Raw |
In response to | Re: Proposal of tunable fix for scalability of 8.4 ("Jignesh K. Shah" <J.K.Shah@Sun.COM>) |
Responses |
Re: Proposal of tunable fix for scalability of 8.4
|
List | pgsql-performance |
Redid the test with - waking up all waiters irrespective of shared, exclusive 480: 64: Medium Throughput: 66688.000 Avg Medium Resp: 0.005 540: 72: Medium Throughput: 74355.000 Avg Medium Resp: 0.005 600: 80: Medium Throughput: 82920.000 Avg Medium Resp: 0.005 660: 88: Medium Throughput: 91466.000 Avg Medium Resp: 0.005 720: 96: Medium Throughput: 98749.000 Avg Medium Resp: 0.006 780: 104: Medium Throughput: 107365.000 Avg Medium Resp: 0.006 840: 112: Medium Throughput: 114121.000 Avg Medium Resp: 0.006 900: 120: Medium Throughput: 119556.000 Avg Medium Resp: 0.006 960: 128: Medium Throughput: 128544.000 Avg Medium Resp: 0.006 1020: 136: Medium Throughput: 134725.000 Avg Medium Resp: 0.007 1080: 144: Medium Throughput: 138817.000 Avg Medium Resp: 0.007 1140: 152: Medium Throughput: 141482.000 Avg Medium Resp: 0.008 1200: 160: Medium Throughput: 149430.000 Avg Medium Resp: 0.008 1260: 168: Medium Throughput: 145104.000 Avg Medium Resp: 0.009 1320: 176: Medium Throughput: 143059.000 Avg Medium Resp: 0.011 1380: 184: Medium Throughput: 147687.000 Avg Medium Resp: 0.011 light: customer: No result set for custid 0 1440: 192: Medium Throughput: 148081.000 Avg Medium Resp: 0.013 light: customer: No result set for custid 0 1500: 200: Medium Throughput: 145452.000 Avg Medium Resp: 0.014 1560: 208: Medium Throughput: 146057.000 Avg Medium Resp: 0.015 1620: 216: Medium Throughput: 148456.000 Avg Medium Resp: 0.016 1680: 224: Medium Throughput: 153088.000 Avg Medium Resp: 0.016 1740: 232: Medium Throughput: 151263.000 Avg Medium Resp: 0.017 1800: 240: Medium Throughput: 154146.000 Avg Medium Resp: 0.017 1860: 248: Medium Throughput: 155520.000 Avg Medium Resp: 0.018 1920: 256: Medium Throughput: 154696.000 Avg Medium Resp: 0.019 1980: 264: Medium Throughput: 155391.000 Avg Medium Resp: 0.020 light: customer: No result set for custid 0 2040: 272: Medium Throughput: 156086.000 Avg Medium Resp: 0.021 2100: 280: Medium Throughput: 150085.000 Avg Medium Resp: 0.023 2160: 288: Medium Throughput: 152253.000 Avg Medium Resp: 0.024 2220: 296: Medium Throughput: 155203.000 Avg Medium Resp: 0.025 2280: 304: Medium Throughput: 157962.000 Avg Medium Resp: 0.025 light: customer: No result set for custid 0 2340: 312: Medium Throughput: 157270.000 Avg Medium Resp: 0.026 2400: 320: Medium Throughput: 161298.000 Avg Medium Resp: 0.027 2460: 328: Medium Throughput: 161527.000 Avg Medium Resp: 0.028 2520: 336: Medium Throughput: 163569.000 Avg Medium Resp: 0.028 2580: 344: Medium Throughput: 166190.000 Avg Medium Resp: 0.028 2640: 352: Medium Throughput: 168516.000 Avg Medium Resp: 0.029 2700: 360: Medium Throughput: 171417.000 Avg Medium Resp: 0.029 2760: 368: Medium Throughput: 173350.000 Avg Medium Resp: 0.029 2820: 376: Medium Throughput: 155672.000 Avg Medium Resp: 0.035 2880: 384: Medium Throughput: 172821.000 Avg Medium Resp: 0.031 2940: 392: Medium Throughput: 171819.000 Avg Medium Resp: 0.033 3000: 400: Medium Throughput: 171388.000 Avg Medium Resp: 0.033 3060: 408: Medium Throughput: 172949.000 Avg Medium Resp: 0.034 3120: 416: Medium Throughput: 172638.000 Avg Medium Resp: 0.036 3180: 424: Medium Throughput: 172310.000 Avg Medium Resp: 0.036 (My timed test made it end here..) vmstat seems similar to wakeup some kthr memory page disk faults cpu r b w swap free re mf pi po fr de sr s0 s1 s2 sd in sy cs us sy id 63 0 0 45535728 38689856 0 14 0 0 0 0 0 0 0 0 0 163318 334225 360179 47 17 36 85 0 0 45436736 38690760 0 6 0 0 0 0 0 0 0 0 0 165536 347462 365987 47 17 36 59 0 0 45405184 38681752 0 11 0 0 0 0 0 0 0 0 0 155153 326182 345527 47 16 37 53 0 0 45393816 38673344 0 6 0 0 0 0 0 0 0 0 0 152752 317851 340737 47 16 37 66 0 0 45378312 38651920 0 11 0 0 0 0 0 0 0 0 0 150979 304350 336915 47 16 38 67 0 0 45489520 38639664 0 5 0 0 0 0 0 0 0 0 0 157188 318958 351905 47 16 37 82 0 0 45483600 38633344 0 10 0 0 0 0 0 0 0 0 0 168797 348619 375827 47 17 36 68 0 0 45463008 38614432 0 9 0 0 0 0 0 0 0 0 0 173020 376594 385370 47 18 35 54 0 0 45451376 38603792 0 13 0 0 0 0 0 0 0 0 0 161891 342522 364286 48 17 35 41 0 0 45356544 38605976 0 5 0 0 0 0 0 0 0 0 0 167250 358320 372469 47 17 36 27 0 0 45323472 38596952 0 11 0 0 0 0 0 0 0 0 0 165099 344695 364256 48 17 35 missed taking mpstat also dtrace shows that "Waiting" for procarray is not the most expensive wait. -bash-3.2# ./84_lwlock.d 17071 Lock Id Mode State Count CLogControlLock Shared Waiting 4 CLogControlLock Exclusive Waiting 32 ProcArrayLock Shared Waiting 35 CLogControlLock Shared Acquired 47 WALInsertLock Exclusive Waiting 53 ProcArrayLock Exclusive Waiting 104 XidGenLock Exclusive Acquired 116 ProcArrayLock Exclusive Acquired 117 CLogControlLock Exclusive Acquired 176 WALInsertLock Exclusive Acquired 370 FirstLockMgrLock Shared Acquired 793 FirstBufMappingLock Shared Acquired 799 ProcArrayLock Shared Acquired 882 SInvalReadLock Shared Acquired 1827 Lock Id Mode State Combined Time (ns) WALInsertLock Acquired 52915 CLogControlLock Acquired 78332 XidGenLock Acquired 103026 FirstLockMgrLock Acquired 392836 FirstBufMappingLock Acquired 2919896 CLogControlLock Shared Waiting 5342211 CLogControlLock Exclusive Waiting 9172692 ProcArrayLock Shared Waiting 18186546 ProcArrayLock Acquired 22478607 XidGenLock Exclusive Acquired 26561444 SInvalReadLock Acquired 29012891 CLogControlLock Exclusive Acquired 30490159 WALInsertLock Exclusive Waiting 35055294 FirstLockMgrLock Exclusive Acquired 47077668 FirstBufMappingLock Exclusive Acquired 47460381 WALInsertLock Exclusive Acquired 99288648 ProcArrayLock Exclusive Waiting 104221100 ProcArrayLock Exclusive Acquired 356644807 SInvalReadLock Exclusive Acquired 357530794 So clearly even waking up some more exclusives than just 1 seems to help scalability improve (though actual improvement mileage varies but there is some positive improvement). One more change that I can think of doing is a minor change where we wake all sequential shared waiters but only 1 exclusive waiter.. I am going to change that to ... whatever sequential you get wake them all up.. so in essense it does a similar heart valve type approach of doing little bursts rather than tie them to 1 exclusive only. -Jignesh Jignesh K. Shah wrote: > > > Now with a modified Fix (not the original one that I proposed but > something that works like a heart valve : Opens and shuts to minimum > default way thus controlling how many waiters are waked up ) > > Time:Users:throughput: Reponse > 60: 8: Medium Throughput: 7774.000 Avg Medium Resp: 0.004 > 120: 16: Medium Throughput: 16874.000 Avg Medium Resp: 0.004 > 180: 24: Medium Throughput: 25159.000 Avg Medium Resp: 0.004 > 240: 32: Medium Throughput: 33216.000 Avg Medium Resp: 0.005 > 300: 40: Medium Throughput: 42418.000 Avg Medium Resp: 0.005 > 360: 48: Medium Throughput: 49655.000 Avg Medium Resp: 0.005 > 420: 56: Medium Throughput: 58149.000 Avg Medium Resp: 0.005 > 480: 64: Medium Throughput: 66558.000 Avg Medium Resp: 0.005 > 540: 72: Medium Throughput: 74474.000 Avg Medium Resp: 0.005 > 600: 80: Medium Throughput: 82241.000 Avg Medium Resp: 0.005 > 660: 88: Medium Throughput: 90336.000 Avg Medium Resp: 0.005 > 720: 96: Medium Throughput: 99101.000 Avg Medium Resp: 0.006 > 780: 104: Medium Throughput: 106028.000 Avg Medium Resp: 0.006 > 840: 112: Medium Throughput: 113196.000 Avg Medium Resp: 0.006 > 900: 120: Medium Throughput: 119174.000 Avg Medium Resp: 0.006 > 960: 128: Medium Throughput: 129408.000 Avg Medium Resp: 0.006 > 1020: 136: Medium Throughput: 134433.000 Avg Medium Resp: 0.007 > 1080: 144: Medium Throughput: 143121.000 Avg Medium Resp: 0.007 > 1140: 152: Medium Throughput: 144603.000 Avg Medium Resp: 0.007 > 1200: 160: Medium Throughput: 148604.000 Avg Medium Resp: 0.008 > 1260: 168: Medium Throughput: 150274.000 Avg Medium Resp: 0.009 > 1320: 176: Medium Throughput: 150581.000 Avg Medium Resp: 0.010 > 1380: 184: Medium Throughput: 146912.000 Avg Medium Resp: 0.012 > 1440: 192: Medium Throughput: 143945.000 Avg Medium Resp: 0.013 > 1500: 200: Medium Throughput: 144029.000 Avg Medium Resp: 0.015 > 1560: 208: Medium Throughput: 143468.000 Avg Medium Resp: 0.016 > 1620: 216: Medium Throughput: 144367.000 Avg Medium Resp: 0.017 > 1680: 224: Medium Throughput: 148340.000 Avg Medium Resp: 0.017 > 1740: 232: Medium Throughput: 148842.000 Avg Medium Resp: 0.018 > 1800: 240: Medium Throughput: 149533.000 Avg Medium Resp: 0.019 > 1860: 248: Medium Throughput: 152334.000 Avg Medium Resp: 0.019 > 1920: 256: Medium Throughput: 151521.000 Avg Medium Resp: 0.020 > 1980: 264: Medium Throughput: 148961.000 Avg Medium Resp: 0.022 > 2040: 272: Medium Throughput: 151270.000 Avg Medium Resp: 0.022 > 2100: 280: Medium Throughput: 149783.000 Avg Medium Resp: 0.024 > 2160: 288: Medium Throughput: 151743.000 Avg Medium Resp: 0.024 > 2220: 296: Medium Throughput: 155190.000 Avg Medium Resp: 0.026 > 2280: 304: Medium Throughput: 150955.000 Avg Medium Resp: 0.027 > 2340: 312: Medium Throughput: 147118.000 Avg Medium Resp: 0.029 > 2400: 320: Medium Throughput: 152768.000 Avg Medium Resp: 0.029 > 2460: 328: Medium Throughput: 161044.000 Avg Medium Resp: 0.028 > 2520: 336: Medium Throughput: 157926.000 Avg Medium Resp: 0.029 > 2580: 344: Medium Throughput: 161005.000 Avg Medium Resp: 0.029 > 2640: 352: Medium Throughput: 167274.000 Avg Medium Resp: 0.029 > 2700: 360: Medium Throughput: 168253.000 Avg Medium Resp: 0.031 > > > With final vmstats improving but still far from 100% > kthr memory page disk faults cpu > r b w swap free re mf pi po fr de sr s0 s1 s2 sd in sy cs > us sy id > 38 0 0 46052840 39345096 0 11 0 0 0 0 0 0 0 0 0 134137 290703 > 303518 40 14 45 > 43 0 0 45656456 38882912 23 77 0 0 0 0 0 0 0 0 0 135820 272899 > 300749 40 15 45 > 38 0 0 45650488 38816984 23 80 0 0 0 0 0 0 0 0 0 135009 272767 > 300192 39 15 46 > 47 0 0 46020792 39187688 0 5 0 0 0 0 0 0 0 0 0 140473 285445 > 312826 40 14 46 > 24 0 0 46143984 39326848 9 61 0 0 0 0 0 0 0 0 0 146194 308590 > 328241 40 15 45 > 37 0 0 45465256 38757000 22 74 0 0 0 0 0 0 0 0 0 136835 293971 > 301433 38 14 48 > 35 0 0 46017544 39308072 12 61 0 0 0 0 0 0 0 0 0 142749 312355 > 320592 42 15 43 > 36 0 0 45456000 38744688 11 24 0 0 0 0 0 0 0 0 0 143566 303461 > 317683 41 15 43 > 23 0 0 46007408 39291312 2 22 0 0 0 0 0 0 0 0 0 140246 300061 > 316663 42 15 43 > 20 0 0 46029656 39281704 10 25 0 0 0 0 0 0 0 0 0 147787 291825 > 326387 43 15 42 > 24 0 0 46131016 39288528 2 21 0 0 0 0 0 0 0 0 0 150796 310697 > 335791 43 15 42 > 20 0 0 46109448 39269392 16 67 0 0 0 0 0 0 0 0 0 150075 315517 > 332881 43 16 41 > 30 0 0 45540928 38710376 9 27 0 0 0 0 0 0 0 0 0 155214 316448 > 341472 43 16 40 > 14 0 0 45987496 39270016 0 5 0 0 0 0 0 0 0 0 0 155028 333711 > 344207 44 16 40 > 25 0 0 45981136 39263008 0 10 0 0 0 0 0 0 0 0 0 153968 327343 > 343776 45 16 39 > 54 0 0 46062984 39259936 0 7 0 0 0 0 0 0 0 0 0 153721 315839 > 344732 45 16 39 > 42 0 0 46099704 39252920 0 15 0 0 0 0 0 0 0 0 0 154629 323125 > 348798 45 16 39 > 54 0 0 46068944 39230808 0 8 0 0 0 0 0 0 0 0 0 157166 340265 > 354135 46 17 37 > > But the real winner shows up in lockstat where it seems to indicate > that stress on Waiting from ProcArrayLock is relieved (thought > shifting somewhere else which is how lock works): > > # ./84_lwlock.d 8042 > > Lock Id Mode State Count > WALWriteLock Exclusive Acquired 1 > XidGenLock Exclusive Waiting 3 > CLogControlLock Shared Waiting 11 > ProcArrayLock Shared Waiting 39 > CLogControlLock Exclusive Waiting 52 > WALInsertLock Exclusive Waiting 73 > CLogControlLock Shared Acquired 91 > ProcArrayLock Exclusive Acquired 96 > XidGenLock Exclusive Acquired 96 > ProcArrayLock Exclusive Waiting 121 > CLogControlLock Exclusive Acquired 199 > WALInsertLock Exclusive Acquired 310 > FirstBufMappingLock Shared Acquired 408 > FirstLockMgrLock Shared Acquired 618 > ProcArrayLock Shared Acquired 746 > SInvalReadLock Shared Acquired 1542 > > Lock Id Mode State Combined Time > (ns) > WALInsertLock Acquired > 118673 > CLogControlLock Acquired > 172130 > FirstBufMappingLock Acquired > 177196 > WALWriteLock Exclusive Acquired > 208403 > XidGenLock Exclusive Waiting > 325989 > FirstLockMgrLock Acquired > 2667351 > ProcArrayLock Acquired > 8179335 > XidGenLock Exclusive Acquired > 8896177 > CLogControlLock Shared Waiting > 9680401 > CLogControlLock Exclusive Waiting > 19105179 > CLogControlLock Exclusive Acquired > 27484249 > SInvalReadLock Acquired > 43026960 > FirstBufMappingLock Exclusive Acquired > 45232906 > ProcArrayLock Shared Waiting > 46741660 > WALInsertLock Exclusive Waiting > 50912148 > FirstLockMgrLock Exclusive Acquired > 58789829 > WALInsertLock Exclusive Acquired > 86653791 > ProcArrayLock Exclusive Waiting > 213980787 > ProcArrayLock Exclusive Acquired > 270028367 > SInvalReadLock Exclusive Acquired > 303044735 > > > > > SET minf mjf xcal intr ithr csw icsw migr smtx srw syscl > usr sys wt idl sze > 0 1 0 147238 159453 8806 370676 89236 71258 98435 0 380008 > 47 17 0 35 64 > 0 6 0 132463 143446 7975 331685 80847 64746 86578 0 329315 > 44 16 0 41 64 > 0 16 0 146655 158621 8987 366866 90756 69953 93786 0 349346 > 49 17 0 34 64 > 0 18 0 151326 163492 8992 377634 92860 72406 98968 4 365121 > 49 17 0 33 64 > 0 2 0 142914 154169 8243 352104 81385 69598 91260 0 340887 > 42 16 0 42 64 > 0 16 0 156755 168962 9080 386475 93072 74775 101465 0 379250 > 47 18 0 36 64 > 0 1 0 152807 165134 8880 379521 90671 75073 99692 0 380412 > 48 18 0 35 64 > 0 1 0 134778 146041 8122 339137 79888 66633 89220 0 342600 > 43 16 0 41 64 > 0 16 0 153014 164789 8834 376117 93000 72743 97644 0 371792 > 48 18 0 35 64 > > > Not sure what SInvalReadLock does.. need to read up on that.. > > > -Jignesh > >> >> 1200: 160: Medium Throughput: 130487.000 Avg Medium Resp: 0.011 >> 1260: 168: Medium Throughput: 123368.000 Avg Medium Resp: 0.013 >> 1320: 176: Medium Throughput: 134649.000 Avg Medium Resp: 0.012 >> 1380: 184: Medium Throughput: 136272.000 Avg Medium Resp: 0.013 >> >> >> kthr memory page disk faults >> cpu >> r b w swap free re mf pi po fr de sr s0 s1 s2 sd in sy cs >> us sy id >> 3 0 0 44710008 39603320 0 135 0 0 0 0 0 0 0 0 0 110564 145678 >> 211567 33 10 57 >> 5 0 0 44663368 39595008 0 6 0 0 0 0 0 0 0 0 0 108891 143083 >> 208389 33 10 58 >> 3 0 0 44753496 39593824 0 132 0 0 0 0 0 0 0 0 0 109922 126865 >> 209869 33 9 57 >> 4 0 0 44788368 39588528 0 7 0 0 0 0 0 0 0 0 0 108680 129073 >> 208068 33 10 57 >> 2 0 0 44767920 39570592 0 147 0 0 0 0 0 0 0 0 0 106671 142403 >> 204724 33 10 58 >> 4 0 0 44762656 39563256 0 11 0 0 0 0 0 0 0 0 0 106185 130328 >> 204551 34 10 57 >> 2 0 0 44674584 39560912 0 148 0 0 0 0 0 0 0 0 0 104757 139147 >> 201448 32 10 58 >> 1 0 0 44619824 39551024 0 9 0 0 0 0 0 0 0 0 0 103653 142125 >> 199896 32 10 58 >> 2 0 0 44622480 39552432 0 141 0 0 0 0 0 0 0 0 0 101373 134547 >> 195553 32 9 58 >> 1 0 0 44739936 39552312 0 11 0 0 0 0 0 0 0 0 0 102932 121742 >> 198205 33 9 58 >> >> >> And lock stats are as follows at about 280 users sampling for a >> single backend process: >> # ./84_lwlock.d 29405 >> >> Lock Id Mode State Count >> WALWriteLock Exclusive Acquired 1 >> XidGenLock Exclusive Waiting 1 >> CLogControlLock Shared Waiting 3 >> ProcArrayLock Shared Waiting 7 >> CLogControlLock Exclusive Waiting 9 >> WALInsertLock Exclusive Waiting 45 >> CLogControlLock Shared Acquired 52 >> ProcArrayLock Exclusive Waiting 61 >> XidGenLock Exclusive Acquired 96 >> ProcArrayLock Exclusive Acquired 97 >> CLogControlLock Exclusive Acquired 152 >> WALInsertLock Exclusive Acquired 302 >> ProcArrayLock Shared Acquired 729 >> FirstLockMgrLock Shared Acquired 812 >> FirstBufMappingLock Shared Acquired 857 >> SInvalReadLock Shared Acquired 1551 >> >> Lock Id Mode State Combined Time >> (ns) >> WALInsertLock Acquired >> 89909 >> XidGenLock Exclusive Waiting >> 101488 >> WALWriteLock Exclusive Acquired >> 140563 >> CLogControlLock Shared Waiting >> 354756 >> FirstBufMappingLock Acquired >> 471438 >> FirstLockMgrLock Acquired >> 2907141 >> XidGenLock Exclusive Acquired >> 7450934 >> CLogControlLock Exclusive Waiting >> 11094716 >> ProcArrayLock Acquired >> 15495229 >> WALInsertLock Exclusive Waiting >> 20801169 >> CLogControlLock Exclusive Acquired >> 21339264 >> SInvalReadLock Acquired >> 24309991 >> FirstLockMgrLock Exclusive Acquired >> 39904071 >> FirstBufMappingLock Exclusive Acquired >> 40826435 >> ProcArrayLock Shared Waiting >> 86352947 >> WALInsertLock Exclusive Acquired >> 89336432 >> SInvalReadLock Exclusive Acquired >> 252574515 >> ProcArrayLock Exclusive Acquired >> 315064347 >> ProcArrayLock Exclusive Waiting >> 847806215 >> >> mpstat outputs is too much so I am doing aggegation by procesor set >> which is all 64 cpus >> >> -bash-3.2$ mpstat -a 10 >> >> SET minf mjf xcal intr ithr csw icsw migr smtx srw syscl >> usr sys wt idl sze >> 0 370 0 118649 127575 7595 244456 43931 62166 8700 0 158929 >> 38 11 0 50 64 >> 0 167 0 119668 128704 7644 246389 43287 62357 8816 0 161006 >> 38 11 0 51 64 >> 0 27 0 109461 117433 6997 224514 38562 56446 8171 0 148322 >> 34 10 0 56 64 >> 0 2 0 122368 131549 7871 250237 39620 61478 9082 0 165995 >> 36 11 0 52 64 >> 0 0 0 122025 131380 7973 249429 37292 59863 8922 0 166319 >> 35 11 0 54 64 >> >> (quick overview of columns ) >> SET Processor set >> minf minor faults >> mjf major faults >> xcal inter-processor cross-calls >> intr interrupts >> ithr interrupts as threads (not counting clock >> interrupt) >> csw context switches >> icsw involuntary context switches >> migr thread migrations (to another processor) >> smtx spins on mutexes (lock not acquired on first >> try) >> srw spins on readers/writer locks (lock not >> acquired on first try) >> syscl system calls >> usr percent user time >> sys percent system time >> wt the I/O wait time is no longer calculated as a >> percentage of CPU time, and this statistic >> will always return zero. >> idl percent idle time >> sze number of processors in the requested proces- >> sor set >> >> >> -Jignesh >> >> > -- Jignesh Shah http://blogs.sun.com/jkshah The New Sun Microsystems,Inc http://sun.com/postgresql
pgsql-performance by date: