Re: Speed up Clog Access by increasing CLOG buffers - Mailing list pgsql-hackers
From | Amit Kapila
Subject | Re: Speed up Clog Access by increasing CLOG buffers
Date |
Msg-id | CAA4eK1J12fSGAmFSeq0wdUgqD+4Ue43rZDr=ZEMbySMgxfGJKA@mail.gmail.com
In response to | Re: Speed up Clog Access by increasing CLOG buffers (Amit Kapila <amit.kapila16@gmail.com>)
Responses | Re: Speed up Clog Access by increasing CLOG buffers
List | pgsql-hackers
On Sat, Apr 2, 2016 at 5:25 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Thu, Mar 31, 2016 at 3:48 PM, Andres Freund <andres@anarazel.de> wrote:
>
> Here is the performance data (configuration of machine used to perform this test is mentioned at end of mail):
>
> Non-default parameters
> ------------------------------------
> max_connections = 300
> shared_buffers = 8GB
> min_wal_size = 10GB
> max_wal_size = 15GB
> checkpoint_timeout = 35min
> maintenance_work_mem = 1GB
> checkpoint_completion_target = 0.9
> wal_buffers = 256MB
>
> median of 3, 20-min pgbench tpc-b results for --unlogged-tables
I have run exactly the same test on an Intel x86 m/c and the results are as below:
Client Count/Patch_ver (tps) | 2 | 128 | 256 |
---|---|---|---|
HEAD – Commit 2143f5e1 | 2832 | 35001 | 26756 |
clog_buf_128 | 2909 | 50685 | 40998 |
clog_buf_128 + group_update_clog_v8 | 2981 | 53043 | 50779 |
clog_buf_128 + content_lock | 2843 | 56261 | 54059 |
clog_buf_128 + nocontent_lock | 2630 | 56554 | 54429 |
On this m/c, I don't see any run-to-run variation; however, the trend of the results is somewhat similar to the power m/c. Clearly, the first patch, which increases the number of clog buffers to 128, shows up to ~50% performance improvement at 256 client count. We can also observe that the group clog patch gives ~24% gain on top of the increase-clog-bufs patch at 256 client count. Both the content lock and no content lock patches show similar performance gains, and their performance is 6~7% better than the group clog patch. Also, as on the power m/c, the no content lock patch seems to show some regression at lower client counts (2 clients in this case).
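For context on what "clog_buf_128" means above, here is a minimal standalone sketch of the buffer-sizing rule being benchmarked. It assumes the increase-clog-bufs patch scales the clog buffer count with shared_buffers (expressed in 8kB pages) with a floor of 4 and a cap of 128; the helper name clog_shmem_buffers and the exact formula are illustrative assumptions, not a quote of the submitted patch:

#include <stdio.h>

#define Max(x, y)  ((x) > (y) ? (x) : (y))
#define Min(x, y)  ((x) < (y) ? (x) : (y))

/*
 * Hypothetical sizing rule: roughly one clog buffer per 512 shared
 * buffers, at least 4 and at most 128.
 */
static int
clog_shmem_buffers(int nbuffers)
{
    return Min(128, Max(4, nbuffers / 512));
}

int
main(void)
{
    /* shared_buffers = 8GB, i.e. 1048576 pages of 8kB */
    int nbuffers = (int) ((8LL * 1024 * 1024 * 1024) / 8192);

    /* prints 128 -- the "clog_buf_128" configuration used in the table */
    printf("clog buffers = %d\n", clog_shmem_buffers(nbuffers));
    return 0;
}

With shared_buffers = 8GB this evaluates to Min(128, Max(4, 2048)) = 128, which is how the tested configuration hits the 128-buffer cap.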
Based on the above results, increasing clog bufs to 128 is a clear winner, and I think we might not want to proceed with the no content lock approach patch, as it shows some regression and is no better than the content lock approach patch. Now, I think we need to decide between the group clog update approach patch and the content lock approach patch. The difference between the two is not large (6~7%), and I think that when sub-transactions are involved (sub-transactions on the same page as the main transaction) the group clog update patch should give better performance, because the content lock itself will then start becoming a bottleneck. We could address that case for the content lock approach by using a grouping technique on the content lock or something similar, but I am not sure that is worth the effort. Also, I see some variation in the performance data with the content lock patch on the power m/c, but again that might be attributed to m/c characteristics. So, I think we can proceed with either the group clog patch or the content lock patch; if we want to proceed with the content lock approach, we need to do some more work on it.
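To make the group clog update idea concrete, below is a minimal standalone model of the leader/follower batching technique. This is an illustrative sketch only, not the group_update_clog_v8 patch: names such as set_status_grouped are invented, followers spin instead of sleeping on semaphores, and the real patch queues PGPROC entries and updates clog pages under CLogControlLock.

/*
 * Model of "group update": backends wanting to set transaction status push
 * themselves onto a lock-free list; whoever finds the list empty becomes the
 * leader, takes the contended lock once, and applies every queued update
 * before releasing the followers.
 */
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

#define NBACKENDS          8
#define XACTS_PER_BACKEND  10000

typedef struct Request
{
    int             xid;        /* transaction whose status to set */
    int             status;     /* 1 = committed (simplified)      */
    struct Request *next;       /* next member of the same group   */
    atomic_int      done;       /* set by the leader when applied  */
} Request;

static pthread_mutex_t    clog_control_lock = PTHREAD_MUTEX_INITIALIZER;
static _Atomic(Request *) group_head;                          /* pending group (LIFO) */
static int                clog[NBACKENDS * XACTS_PER_BACKEND]; /* stand-in for clog pages */

static void
set_status_grouped(Request *req)
{
    Request *head;

    atomic_store(&req->done, 0);

    /* Join the pending group with a CAS push. */
    head = atomic_load(&group_head);
    do
    {
        req->next = head;
    } while (!atomic_compare_exchange_weak(&group_head, &head, req));

    if (head != NULL)
    {
        /* Follower: wait until the leader has applied our update. */
        while (!atomic_load(&req->done))
            sched_yield();
        return;
    }

    /*
     * Leader: acquire the contended lock once, detach the whole group
     * (including anyone who joined after us), and apply every update.
     */
    pthread_mutex_lock(&clog_control_lock);
    for (Request *r = atomic_exchange(&group_head, NULL); r != NULL;)
    {
        Request *next = r->next;

        clog[r->xid] = r->status;       /* the actual status write */
        if (r != req)
            atomic_store(&r->done, 1);  /* release the follower */
        r = next;
    }
    pthread_mutex_unlock(&clog_control_lock);
}

static void *
backend(void *arg)
{
    int base = (int) (intptr_t) arg * XACTS_PER_BACKEND;

    for (int i = 0; i < XACTS_PER_BACKEND; i++)
    {
        Request req = {.xid = base + i, .status = 1};

        set_status_grouped(&req);
    }
    return NULL;
}

int
main(void)
{
    pthread_t tids[NBACKENDS];
    long      applied = 0;

    for (long i = 0; i < NBACKENDS; i++)
        pthread_create(&tids[i], NULL, backend, (void *) i);
    for (int i = 0; i < NBACKENDS; i++)
        pthread_join(tids[i], NULL);

    for (int i = 0; i < NBACKENDS * XACTS_PER_BACKEND; i++)
        applied += clog[i];
    printf("%ld of %d statuses set\n", applied, NBACKENDS * XACTS_PER_BACKEND);
    return 0;
}

The point of the pattern is that one acquisition of the contended lock (CLogControlLock in the real patch) serves a whole group of status updates instead of one per backend, which is what reduces lock contention at high client counts.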
Note - For both the content lock and no content lock patches, I have applied the 0001-Improve-64bit-atomics-support patch.
m/c config (lscpu)
---------------------------
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 128
On-line CPU(s) list: 0-127
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 8
NUMA node(s): 8
Vendor ID: GenuineIntel
CPU family: 6
Model: 47
Model name: Intel(R) Xeon(R) CPU E7- 8830 @ 2.13GHz
Stepping: 2
CPU MHz: 1064.000
BogoMIPS: 4266.62
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 24576K
NUMA node0 CPU(s): 0,65-71,96-103
NUMA node1 CPU(s): 72-79,104-111
NUMA node2 CPU(s): 80-87,112-119
NUMA node3 CPU(s): 88-95,120-127
NUMA node4 CPU(s): 1-8,33-40
NUMA node5 CPU(s): 9-16,41-48
NUMA node6 CPU(s): 17-24,49-56
NUMA node7 CPU(s): 25-32,57-64