Proposal of tunable fix for scalability of 8.4 - Mailing list pgsql-performance
From | Jignesh K. Shah
---|---
Subject | Proposal of tunable fix for scalability of 8.4
Msg-id | 49B824DD.7090302@sun.com
Responses | Re: Proposal of tunable fix for scalability of 8.4
| Re: Proposal of tunable fix for scalability of 8.4
List | pgsql-performance
Hello All,

As you know, one of the things I have been doing constantly is using benchmark kits to see how we can scale PostgreSQL on the UltraSPARC T2 based 1-socket (64 threads) and 2-socket (128 threads) servers that Sun sells. You might remember that at PGCon 2008 (http://www.pgcon.org/2008/schedule/events/72.en.html) I mentioned that ProcArrayLock is pretty hot when you have many users. Rerunning similar tests on a 64-thread UltraSPARC T2 Plus based server, I found that even with the 8.4 snapshot I took I was still hitting similar problems (I/O is not a problem: everything is in RAM, no disks):

Time: Users: Type: TPM: Response Time
60: 100: Medium Throughput: 10552.000 Avg Medium Resp: 0.006
120: 200: Medium Throughput: 22897.000 Avg Medium Resp: 0.006
180: 300: Medium Throughput: 33099.000 Avg Medium Resp: 0.009
240: 400: Medium Throughput: 44692.000 Avg Medium Resp: 0.007
300: 500: Medium Throughput: 56455.000 Avg Medium Resp: 0.007
360: 600: Medium Throughput: 67220.000 Avg Medium Resp: 0.008
420: 700: Medium Throughput: 77592.000 Avg Medium Resp: 0.009
480: 800: Medium Throughput: 87277.000 Avg Medium Resp: 0.011
540: 900: Medium Throughput: 98029.000 Avg Medium Resp: 0.012
600: 1000: Medium Throughput: 102547.000 Avg Medium Resp: 0.023
660: 1100: Medium Throughput: 100503.000 Avg Medium Resp: 0.044
720: 1200: Medium Throughput: 99506.000 Avg Medium Resp: 0.065
780: 1300: Medium Throughput: 95474.000 Avg Medium Resp: 0.089
840: 1400: Medium Throughput: 86254.000 Avg Medium Resp: 0.130
900: 1500: Medium Throughput: 91947.000 Avg Medium Resp: 0.139
960: 1600: Medium Throughput: 94838.000 Avg Medium Resp: 0.147
1020: 1700: Medium Throughput: 92446.000 Avg Medium Resp: 0.173
1080: 1800: Medium Throughput: 91032.000 Avg Medium Resp: 0.194
1140: 1900: Medium Throughput: 88236.000 Avg Medium Resp: 0.221
runDynamic: uCount = 2000 delta = 1900
runDynamic: ALL Threads Have Been created
1200: 2000: Medium Throughput: -1352555.000 Avg Medium Resp: 0.071
1260: 2000: Medium Throughput: 88872.000 Avg Medium Resp: 0.238
1320: 2000: Medium Throughput: 88484.000 Avg Medium Resp: 0.248
1380: 2000: Medium Throughput: 90777.000 Avg Medium Resp: 0.231
1440: 2000: Medium Throughput: 90769.000 Avg Medium Resp: 0.229

You will notice that throughput drops around 1000 users. Nothing new; you have already heard me mention that a zillion times.

While working on this today I went through LWLockRelease, as I have probably done quite a few times before, to see what could be done. The quick synopsis: LWLockRelease releases the lock and wakes up the next waiter to take over. If the next waiter is waiting for exclusive access, only that waiter is woken; if the next waiter is waiting for shared access, it walks through all the shared waiters that follow and wakes them all up.

Earlier last year I tried various ways of doing more intelligent wake-ups (finding all the shared waiters together and waking them up, and waking multiple waiters simultaneously, which ended up requiring a new lock mode), but none of them were stellar enough to make an impact.

Today I tried something else: forget the distinction between exclusive and shared and just wake them all up. I changed the code from

/*
 * Remove the to-be-awakened PGPROCs from the queue.  If the front
 * waiter wants exclusive lock, awaken him only.  Otherwise awaken
 * as many waiters as want shared access.
 */
proc = head;
if (!proc->lwExclusive)
{
    while (proc->lwWaitLink != NULL &&
           !proc->lwWaitLink->lwExclusive)
        proc = proc->lwWaitLink;
}
/* proc is now the last PGPROC to be released */
lock->head = proc->lwWaitLink;
proc->lwWaitLink = NULL;
/* prevent additional wakeups until retryer gets to run */
lock->releaseOK = false;

to basically wake them all up:

/*
 * Remove the to-be-awakened PGPROCs from the queue.  If the front
 * waiter wants exclusive lock, awaken him only.  Otherwise awaken
 * as many waiters as want shared access.
 */
proc = head;
//if (!proc->lwExclusive)
if (1)
{
    while (proc->lwWaitLink != NULL &&
           1)  // !proc->lwWaitLink->lwExclusive)
        proc = proc->lwWaitLink;
}
/* proc is now the last PGPROC to be released */
lock->head = proc->lwWaitLink;
proc->lwWaitLink = NULL;
/* prevent additional wakeups until retryer gets to run */
lock->releaseOK = false;

This wakes them all up and lets them fight for the lock themselves (technically causing the thundering herd that the original logic was trying to avoid). I reran the test and saw these results:

Time: Users: Type: TPM: Response Time
60: 100: Medium Throughput: 10457.000 Avg Medium Resp: 0.006
120: 200: Medium Throughput: 22809.000 Avg Medium Resp: 0.006
180: 300: Medium Throughput: 33665.000 Avg Medium Resp: 0.008
240: 400: Medium Throughput: 45042.000 Avg Medium Resp: 0.006
300: 500: Medium Throughput: 56655.000 Avg Medium Resp: 0.007
360: 600: Medium Throughput: 67170.000 Avg Medium Resp: 0.007
420: 700: Medium Throughput: 78343.000 Avg Medium Resp: 0.008
480: 800: Medium Throughput: 87979.000 Avg Medium Resp: 0.008
540: 900: Medium Throughput: 100369.000 Avg Medium Resp: 0.008
600: 1000: Medium Throughput: 110697.000 Avg Medium Resp: 0.009
660: 1100: Medium Throughput: 121255.000 Avg Medium Resp: 0.010
720: 1200: Medium Throughput: 132915.000 Avg Medium Resp: 0.010
780: 1300: Medium Throughput: 141505.000 Avg Medium Resp: 0.012
840: 1400: Medium Throughput: 147084.000 Avg Medium Resp: 0.021
light: customer: No result set for custid 0
900: 1500: Medium Throughput: 157906.000 Avg Medium Resp: 0.018
light: customer: No result set for custid 0
960: 1600: Medium Throughput: 160289.000 Avg Medium Resp: 0.026
1020: 1700: Medium Throughput: 152191.000 Avg Medium Resp: 0.053
1080: 1800: Medium Throughput: 157949.000 Avg Medium Resp: 0.054
1140: 1900: Medium Throughput: 161923.000 Avg Medium Resp: 0.063
runDynamic: uCount = 2000 delta = 1900
runDynamic: ALL Threads Have Been created
1200: 2000: Medium Throughput: -1781969.000 Avg Medium Resp: 0.019
light: customer: No result set for custid 0
1260: 2000: Medium Throughput: 140741.000 Avg Medium Resp: 0.115
light: customer: No result set for custid 0
1320: 2000: Medium Throughput: 165379.000 Avg Medium Resp: 0.070
1380: 2000: Medium Throughput: 166585.000 Avg Medium Resp: 0.070
1440: 2000: Medium Throughput: 169163.000 Avg Medium Resp: 0.063
1500: 2000: Medium Throughput: 157508.000 Avg Medium Resp: 0.086
light: customer: No result set for custid 0
1560: 2000: Medium Throughput: 170112.000 Avg Medium Resp: 0.063

That is a 1.89X improvement in throughput, and it is still not dropping drastically, which means I can now keep stressing PostgreSQL 8.4 to the limits of the box.

My proposal: if we add a quick tunable for 8.4, wake-up-all-waiters=on (or something to that effect) in postgresql.conf, before the beta, then people can try the option on the various other benchmarks they are running, report back whether it helps performance, and we can collect feedback.
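To make the two policies concrete, here is a rough standalone sketch of the wake-up logic, gated by a hypothetical wake_up_all_waiters flag standing in for such a tunable. This is only a mock-up under my own naming, not PostgreSQL source or the actual patch; the Waiter struct mimics just the PGPROC fields used in the code above.

/*
 * Standalone mock-up (not PostgreSQL source) of the two wake-up policies
 * discussed above.  The hypothetical boolean wake_up_all_waiters stands in
 * for the proposed tunable; Waiter mimics only the PGPROC fields used here.
 */
#include <stdbool.h>
#include <stdio.h>

typedef struct Waiter
{
    bool           lwExclusive;  /* waiting for exclusive access? */
    struct Waiter *lwWaitLink;   /* next waiter in the lock's queue */
    int            id;
} Waiter;

static bool wake_up_all_waiters = true;   /* the proposed tunable (hypothetical) */

/*
 * Detach the chain of waiters to be awakened from *queue and return its head,
 * mirroring the queue manipulation in LWLockRelease shown above.
 */
static Waiter *
pick_waiters_to_wake(Waiter **queue)
{
    Waiter *head = *queue;
    Waiter *proc = head;

    if (head == NULL)
        return NULL;

    if (wake_up_all_waiters)
    {
        /* proposed behaviour: release every waiter in the queue */
        while (proc->lwWaitLink != NULL)
            proc = proc->lwWaitLink;
    }
    else if (!proc->lwExclusive)
    {
        /* current behaviour: front waiter is shared, so wake the whole
         * leading run of shared waiters */
        while (proc->lwWaitLink != NULL && !proc->lwWaitLink->lwExclusive)
            proc = proc->lwWaitLink;
    }
    /* else: front waiter is exclusive, wake only that one */

    *queue = proc->lwWaitLink;   /* what remains queued */
    proc->lwWaitLink = NULL;     /* terminate the wake-up chain */
    return head;
}

int
main(void)
{
    /* queue of four waiters: shared, shared, exclusive, shared */
    Waiter w[4] = {
        {false, &w[1], 1}, {false, &w[2], 2}, {true, &w[3], 3}, {false, NULL, 4}
    };
    Waiter *queue = &w[0];

    for (Waiter *p = pick_waiters_to_wake(&queue); p != NULL; p = p->lwWaitLink)
        printf("waking waiter %d (%s)\n", p->id,
               p->lwExclusive ? "exclusive" : "shared");
    return 0;
}

With the flag off, only the leading run of shared waiters (or a single exclusive waiter) is woken, exactly as today; with it on, the whole queue is released at once and the woken backends re-contend for the lock.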
This way it will not be intrusive so late in the game, and it also puts an important scaling fix back in. Of course, as usual, this is open for debate. I know avoiding the thundering herd was the goal here, but waking up one exclusive waiter who may not even be on a CPU is pretty expensive from what I have seen to date.

What do you all think?

Regards,
Jignesh