Proposal of tunable fix for scalability of 8.4 - Mailing list pgsql-performance
From | Jignesh K. Shah
---|---
Subject | Proposal of tunable fix for scalability of 8.4
Msg-id | 49B824DD.7090302@sun.com
Responses | Re: Proposal of tunable fix for scalability of 8.4
| Re: Proposal of tunable fix for scalability of 8.4
List | pgsql-performance
Hello All,

As you know, one of the things I have been doing constantly is using benchmark kits to see how we can scale PostgreSQL on the UltraSPARC T2 based 1-socket (64 threads) and 2-socket (128 threads) servers that Sun sells. You might remember that at PGCon 2008 (http://www.pgcon.org/2008/schedule/events/72.en.html) I mentioned that ProcArrayLock is pretty hot when you have many users. Rerunning similar tests on a 64-thread UltraSPARC T2 Plus based server, I found that even with the 8.4 snapshot I took I was still hitting similar problems (I/O is not a problem: everything is in RAM, no disks):

Time: Users: Type: TPM: Response Time
60: 100: Medium Throughput: 10552.000 Avg Medium Resp: 0.006
120: 200: Medium Throughput: 22897.000 Avg Medium Resp: 0.006
180: 300: Medium Throughput: 33099.000 Avg Medium Resp: 0.009
240: 400: Medium Throughput: 44692.000 Avg Medium Resp: 0.007
300: 500: Medium Throughput: 56455.000 Avg Medium Resp: 0.007
360: 600: Medium Throughput: 67220.000 Avg Medium Resp: 0.008
420: 700: Medium Throughput: 77592.000 Avg Medium Resp: 0.009
480: 800: Medium Throughput: 87277.000 Avg Medium Resp: 0.011
540: 900: Medium Throughput: 98029.000 Avg Medium Resp: 0.012
600: 1000: Medium Throughput: 102547.000 Avg Medium Resp: 0.023
660: 1100: Medium Throughput: 100503.000 Avg Medium Resp: 0.044
720: 1200: Medium Throughput: 99506.000 Avg Medium Resp: 0.065
780: 1300: Medium Throughput: 95474.000 Avg Medium Resp: 0.089
840: 1400: Medium Throughput: 86254.000 Avg Medium Resp: 0.130
900: 1500: Medium Throughput: 91947.000 Avg Medium Resp: 0.139
960: 1600: Medium Throughput: 94838.000 Avg Medium Resp: 0.147
1020: 1700: Medium Throughput: 92446.000 Avg Medium Resp: 0.173
1080: 1800: Medium Throughput: 91032.000 Avg Medium Resp: 0.194
1140: 1900: Medium Throughput: 88236.000 Avg Medium Resp: 0.221
runDynamic: uCount = 2000 delta = 1900
runDynamic: ALL Threads Have Been created
1200: 2000: Medium Throughput: -1352555.000 Avg Medium Resp: 0.071
1260: 2000: Medium Throughput: 88872.000 Avg Medium Resp: 0.238
1320: 2000: Medium Throughput: 88484.000 Avg Medium Resp: 0.248
1380: 2000: Medium Throughput: 90777.000 Avg Medium Resp: 0.231
1440: 2000: Medium Throughput: 90769.000 Avg Medium Resp: 0.229

You will notice that throughput drops around 1000 users. Nothing new; you have already heard me mention that a zillion times.

While working on this today I went through LWLockRelease, as I have probably done quite a few times before, to see what could be done. The quick synopsis: LWLockRelease releases the lock and wakes up the next waiter to take over. If the next waiter is waiting for exclusive access, only that waiter is woken; if the next waiter is waiting for shared access, it walks through all the shared waiters that follow and wakes them all up.

Earlier last year I tried various ways of doing more intelligent wake-ups (finding all the shared waiters together and waking them up, and waking multiple waiters simultaneously, which ended up requiring a new lock mode), but none of them were stellar enough to make an impact.

Today I tried something else: forget the distinction between exclusive and shared and just wake them all up. I changed the code from

/*
 * Remove the to-be-awakened PGPROCs from the queue.  If the front
 * waiter wants exclusive lock, awaken him only.  Otherwise awaken
 * as many waiters as want shared access.
 */
proc = head;
if (!proc->lwExclusive)
{
    while (proc->lwWaitLink != NULL &&
           !proc->lwWaitLink->lwExclusive)
        proc = proc->lwWaitLink;
}
/* proc is now the last PGPROC to be released */
lock->head = proc->lwWaitLink;
proc->lwWaitLink = NULL;
/* prevent additional wakeups until retryer gets to run */
lock->releaseOK = false;

to basically wake them all up:

/*
 * Remove the to-be-awakened PGPROCs from the queue.  If the front
 * waiter wants exclusive lock, awaken him only.  Otherwise awaken
 * as many waiters as want shared access.
 */
proc = head;
//if (!proc->lwExclusive)
if (1)
{
    while (proc->lwWaitLink != NULL &&
           1)  // !proc->lwWaitLink->lwExclusive)
        proc = proc->lwWaitLink;
}
/* proc is now the last PGPROC to be released */
lock->head = proc->lwWaitLink;
proc->lwWaitLink = NULL;
/* prevent additional wakeups until retryer gets to run */
lock->releaseOK = false;

This wakes them all up and lets them fight for the lock themselves (technically causing the thundering herd that the original logic was trying to avoid). I reran the test and saw these results:

Time: Users: Type: TPM: Response Time
60: 100: Medium Throughput: 10457.000 Avg Medium Resp: 0.006
120: 200: Medium Throughput: 22809.000 Avg Medium Resp: 0.006
180: 300: Medium Throughput: 33665.000 Avg Medium Resp: 0.008
240: 400: Medium Throughput: 45042.000 Avg Medium Resp: 0.006
300: 500: Medium Throughput: 56655.000 Avg Medium Resp: 0.007
360: 600: Medium Throughput: 67170.000 Avg Medium Resp: 0.007
420: 700: Medium Throughput: 78343.000 Avg Medium Resp: 0.008
480: 800: Medium Throughput: 87979.000 Avg Medium Resp: 0.008
540: 900: Medium Throughput: 100369.000 Avg Medium Resp: 0.008
600: 1000: Medium Throughput: 110697.000 Avg Medium Resp: 0.009
660: 1100: Medium Throughput: 121255.000 Avg Medium Resp: 0.010
720: 1200: Medium Throughput: 132915.000 Avg Medium Resp: 0.010
780: 1300: Medium Throughput: 141505.000 Avg Medium Resp: 0.012
840: 1400: Medium Throughput: 147084.000 Avg Medium Resp: 0.021
light: customer: No result set for custid 0
900: 1500: Medium Throughput: 157906.000 Avg Medium Resp: 0.018
light: customer: No result set for custid 0
960: 1600: Medium Throughput: 160289.000 Avg Medium Resp: 0.026
1020: 1700: Medium Throughput: 152191.000 Avg Medium Resp: 0.053
1080: 1800: Medium Throughput: 157949.000 Avg Medium Resp: 0.054
1140: 1900: Medium Throughput: 161923.000 Avg Medium Resp: 0.063
runDynamic: uCount = 2000 delta = 1900
runDynamic: ALL Threads Have Been created
1200: 2000: Medium Throughput: -1781969.000 Avg Medium Resp: 0.019
light: customer: No result set for custid 0
1260: 2000: Medium Throughput: 140741.000 Avg Medium Resp: 0.115
light: customer: No result set for custid 0
1320: 2000: Medium Throughput: 165379.000 Avg Medium Resp: 0.070
1380: 2000: Medium Throughput: 166585.000 Avg Medium Resp: 0.070
1440: 2000: Medium Throughput: 169163.000 Avg Medium Resp: 0.063
1500: 2000: Medium Throughput: 157508.000 Avg Medium Resp: 0.086
light: customer: No result set for custid 0
1560: 2000: Medium Throughput: 170112.000 Avg Medium Resp: 0.063

That is a 1.89X improvement in throughput, and it is still not dropping drastically, which means I can now keep stressing PostgreSQL 8.4 to the limits of the box.

My proposal: if we add a quick tunable for 8.4, wake-up-all-waiters=on (or something to that effect) in postgresql.conf, before the beta, then people can try the option on the various other benchmarks they are running, report back whether it helps performance, and we can collect feedback.
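To make the two policies concrete, here is a rough standalone sketch of the wake-up logic, gated by a hypothetical wake_up_all_waiters flag standing in for such a tunable. This is only a mock-up under my own naming, not PostgreSQL source or the actual patch; the Waiter struct mimics just the PGPROC fields used in the code above.

/*
 * Standalone mock-up (not PostgreSQL source) of the two wake-up policies
 * discussed above.  The hypothetical boolean wake_up_all_waiters stands in
 * for the proposed tunable; Waiter mimics only the PGPROC fields used here.
 */
#include <stdbool.h>
#include <stdio.h>

typedef struct Waiter
{
    bool           lwExclusive;  /* waiting for exclusive access? */
    struct Waiter *lwWaitLink;   /* next waiter in the lock's queue */
    int            id;
} Waiter;

static bool wake_up_all_waiters = true;   /* the proposed tunable (hypothetical) */

/*
 * Detach the chain of waiters to be awakened from *queue and return its head,
 * mirroring the queue manipulation in LWLockRelease shown above.
 */
static Waiter *
pick_waiters_to_wake(Waiter **queue)
{
    Waiter *head = *queue;
    Waiter *proc = head;

    if (head == NULL)
        return NULL;

    if (wake_up_all_waiters)
    {
        /* proposed behaviour: release every waiter in the queue */
        while (proc->lwWaitLink != NULL)
            proc = proc->lwWaitLink;
    }
    else if (!proc->lwExclusive)
    {
        /* current behaviour: front waiter is shared, so wake the whole
         * leading run of shared waiters */
        while (proc->lwWaitLink != NULL && !proc->lwWaitLink->lwExclusive)
            proc = proc->lwWaitLink;
    }
    /* else: front waiter is exclusive, wake only that one */

    *queue = proc->lwWaitLink;   /* what remains queued */
    proc->lwWaitLink = NULL;     /* terminate the wake-up chain */
    return head;
}

int
main(void)
{
    /* queue of four waiters: shared, shared, exclusive, shared */
    Waiter w[4] = {
        {false, &w[1], 1}, {false, &w[2], 2}, {true, &w[3], 3}, {false, NULL, 4}
    };
    Waiter *queue = &w[0];

    for (Waiter *p = pick_waiters_to_wake(&queue); p != NULL; p = p->lwWaitLink)
        printf("waking waiter %d (%s)\n", p->id,
               p->lwExclusive ? "exclusive" : "shared");
    return 0;
}

With the flag off, only the leading run of shared waiters (or a single exclusive waiter) is woken, exactly as today; with it on, the whole queue is released at once and the woken backends re-contend for the lock.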
This way it will not be intrusive so late in the game, and it also puts an important scaling fix back in. Of course, as usual, this is open for debate. I know avoiding the thundering herd was the goal here, but waking up one exclusive waiter who may not even be on a CPU is pretty expensive from what I have seen to date.

What do you all think?

Regards,
Jignesh