From: Andres Freund
Subject: Re: Improving connection scalability: GetSnapshotData()
Msg-id: 20200406133959.viql5fqecog6mppj@alap3.anarazel.de
In response to: Re: Improving connection scalability: GetSnapshotData() (Alexander Korotkov <a.korotkov@postgrespro.ru>)
List: pgsql-hackers
Hi,

These benchmarks are on my workstation. The larger VM I used in the last
round wasn't currently available.

HW:
2 x Intel(R) Xeon(R) Gold 5215 (each 10 cores / 20 threads)
192GB RAM
data directory is on a Samsung SSD 970 PRO 1TB

A bunch of terminals, emacs, mutt are open while the benchmark is
running. No browser.

Unless mentioned otherwise, relevant configuration options are:
max_connections=1200
shared_buffers=8GB
max_prepared_transactions=1000
synchronous_commit=local
huge_pages=on
fsync=off # to make it more likely to see scalability bottlenecks

Independent of the effects of this patch (i.e. including master) I had a
fairly hard time getting reproducible numbers for *low* client cases. I
found the numbers to be more reproducible if I pinned server/pgbench onto
the same core :(. I chose to do that for the -c1 cases, to benchmark the
optimal behaviour, as that seemed to have the biggest potential for
regressions.

All numbers are best of three. Each test starts in a freshly created
cluster.

On 2020-03-30 17:04:00 +0300, Alexander Korotkov wrote:
> The following pgbench scripts come first to my mind:
> 1) SELECT txid_current(); (artificial but good for checking corner case)

-M prepared -T 180
(did a few longer runs, but doesn't seem to matter much)

clients   tps master   tps pgxact
1         46118        46027
16        377357       440233
40        373304       410142
198       103912       105579

Btw, there's some pretty horrible cacheline bouncing in txid_current(),
because backends first call ReadNextFullTransactionId() (acquires
XidGenLock in shared mode, reads ShmemVariableCache->nextFullXid), and
then separately cause GetNewTransactionId() (acquires XidGenLock
exclusively, reads & writes nextFullXid).

With fsync=off (and also with synchronous_commit=off) the numbers are, at
lower client counts, severely depressed and variable due to walwriter
going completely nuts (using more CPU than the backend doing the
queries). Because WAL writes are so fast on my storage, individual
XLogBackgroundFlush() calls are very quick. This leads to a *lot* of
kill()s from the backend, from within XLogSetAsyncXactLSN(). There's got
to be a bug here. But unrelated.

> 2) Single insert statement (as example of very short transaction)

CREATE TABLE testinsert(c1 int not null, c2 int not null, c3 int not null, c4 int not null);
INSERT INTO testinsert VALUES(1, 2, 3, 4);

-M prepared -T 360

fsync on:
clients   tps master   tps pgxact
1         653          658
16        5687         5668
40        14212        14229
198       60483        62420

fsync off:
clients   tps master   tps pgxact
1         59356        59891
16        290626       299991
40        348210       355669
198       289182       291529

clients   tps master   tps pgxact
1024      47586        52135

-M simple, fsync off:
clients   tps master   tps pgxact
40        289077       326699
198       286011       299928

> 3) Plain pgbench read-write (you already did it for sure)

-s 100 -M prepared -T 700

autovacuum=off, fsync on:
clients   tps master   tps pgxact
1         474          479
16        4356         4476
40        8591         9309
198       20045        20261
1024      17986        18545

autovacuum=off, fsync off:
clients   tps master   tps pgxact
1         7828         7719
16        49069        50482
40        68241        73081
198       73464        77801
1024      25621        28410

I chose autovacuum off because otherwise the results vary much more
widely, and autovacuum isn't really needed for the workload.

> 4) pgbench read-write script with increased amount of SELECTs. Repeat
> select from pgbench_accounts say 10 times with different aids.

I interspersed all server-side statements in the script with two selects
of other pgbench_accounts rows each (roughly as in the sketch below).
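The modified script itself isn't included here; a minimal sketch of such a
script, assuming pgbench's builtin tpcb-like statements and made-up
variable names (:aid2 .. :aid5) for the extra pgbench_accounts lookups,
could look like this:

-- sketch only: builtin tpcb-like statements, each followed by two
-- SELECTs of other pgbench_accounts rows (the :aid2..:aid5 variables
-- and the exact placement of the extra SELECTs are assumptions)
\set aid random(1, 100000 * :scale)
\set bid random(1, 1 * :scale)
\set tid random(1, 10 * :scale)
\set delta random(-5000, 5000)
\set aid2 random(1, 100000 * :scale)
\set aid3 random(1, 100000 * :scale)
\set aid4 random(1, 100000 * :scale)
\set aid5 random(1, 100000 * :scale)
BEGIN;
UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid2;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid3;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid4;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid5;
UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid2;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid4;
UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid3;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid5;
INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
SELECT abalance FROM pgbench_accounts WHERE aid = :aid2;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid5;
END;

Such a script would be passed via -f <scriptfile> in addition to the
options shown with the results below.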
-s 100 -M prepared -T 700

autovacuum=off, fsync on:
clients   tps master   tps pgxact
1         365          367
198       20065        21391

-s 1000 -M prepared -T 700

autovacuum=off, fsync on:
clients   tps master   tps pgxact
16        2757         2880
40        4734         4996
198       16950        19998
1024      22423        24935

> 5) 10% pgbench read-write, 90% of pgbench read-only

-s 100 -M prepared -T 100 -bselect-only@9 -btpcb-like@1

autovacuum=off, fsync on:
clients   tps master   tps pgxact
16        37289        38656
40        81284        81260
198       189002       189357
1024      143986       164762

> > That definitely needs to be measured, due to the locking changes around procarrayadd/remove.
> >
> > I don't think regressions besides perhaps 2pc are likely - there's nothing really getting more expensive but procarrayadd/remove.
>
> I agree that ProcArrayAdd()/Remove() should be the first subject of
> investigation, but other cases should be checked as well IMHO.

I'm not sure I really see the point. If a simple prepared tx doesn't show
up as a negative difference, a more complex one won't either, since the
ProcArrayAdd()/Remove() related bottlenecks will play a smaller and
smaller role.

> Regarding 2pc, the following scenarios come to my mind:
> 1) pgbench read-write modified so that every transaction is prepared
> first, then commit prepared.

The numbers here are -M simple, because I wanted to use
PREPARE TRANSACTION 'ptx_:client_id';
COMMIT PREPARED 'ptx_:client_id';
(see the sketch at the end of this mail)

-s 100 -M prepared -T 700 -f ~/tmp/pgbench-write-2pc.sql

autovacuum=off, fsync on:
clients   tps master   tps pgxact
1         251          249
16        2134         2174
40        3984         4089
198       6677         7522
1024      3641         3617

> 2) 10% of 2pc pgbench read-write, 90% normal pgbench read-write

-s 100 -M prepared -T 100 -f ~/tmp/pgbench-write-2pc.sql@1 -btpcb-like@9

clients   tps master   tps pgxact
198       18625        18906

> 3) 10% of 2pc pgbench read-write, 90% normal pgbench read-only

-s 100 -M prepared -T 100 -f ~/tmp/pgbench-write-2pc.sql@1 -bselect-only@9

clients   tps master   tps pgxact
198       84817        84350

I also benchmarked connection overhead, by using pgbench with -C
executing SELECT 1.

-T 10
clients   tps master   tps pgxact
1         572          587
16        2109         2140
40        2127         2136
198       2097         2129
1024      2101         2118

These numbers seem pretty decent to me. The regressions seem mostly
within noise. The one possible exception is plain pgbench read/write with
fsync=off and only a single session. I'll run more benchmarks around that
tomorrow (but now it's 6am :()

Greetings,

Andres Freund
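The ~/tmp/pgbench-write-2pc.sql file isn't included in the mail. A
plausible sketch, assuming pgbench's builtin tpcb-like statements with the
final END; replaced by the PREPARE TRANSACTION / COMMIT PREPARED pair
quoted above:

-- sketch of a 2PC variant of the tpcb-like script (not the actual file
-- from ~/tmp; only the PREPARE/COMMIT PREPARED lines are quoted above)
\set aid random(1, 100000 * :scale)
\set bid random(1, 1 * :scale)
\set tid random(1, 10 * :scale)
\set delta random(-5000, 5000)
BEGIN;
UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
PREPARE TRANSACTION 'ptx_:client_id';
COMMIT PREPARED 'ptx_:client_id';

As noted above, -M simple is used because of the :client_id substitution
inside the quoted literals, and the script needs max_prepared_transactions
> 0 (set to 1000 in the configuration at the top).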