Thread: Insert Performance with WAL and Fsync
Hi all,

I've posted this to -admin with no responses -- looks like I should have
posted it to -general, so here goes:

Our application needs to do a moderate level of inserts into the database as
it runs. We can hit 100-200 inserts a second under certain conditions, and it
is not something we can batch up for later processing. While testing out
Postgres I noticed some troubling performance data under this sort of
workload.

I've got a run of pgbench which exhibits the behavior. This variant of
pgbench is a good approximation of our application and the results mirror
those of tests I did with our app, so changing the scaling, client factors,
etc. of pgbench is not relevant. I tried changing most other parameters of
postgresql.conf (log files, buffers, etc.) and none have such dramatic
effects as illustrated below. It appears the CPU utilization on both machines
is very low (<15%), so I'm guessing it is mostly I/O overhead.

At the bottom of the message is the detailed result of pgbench -t 500 under
the following conditions:

                wal_sync_method = open_sync      wal_sync_method = fsync
  OS            FSYNC OFF       FSYNC ON         FSYNC ON
  Solaris       74/75           22/22            30/30
  Win2k         112/113         124/127          21/21

So the following questions arise:

1) How can I improve the Solaris performance without disabling fsync?
2) What are others getting for TPS on Solaris machines with fsync on? Are my
   numbers low/high/right in the ballpark?
3) Why does the Solaris performance with fsync on/off differ by a factor of
   3.4x while the Windows fsync on/off differs by only 1.1x? I thought WAL
   was supposed to dramatically reduce the cost of fsyncs?
4) Why does the Win2k behavior with fsync and open_sync differ so greatly?
   Is fsync on cygwin slow, or does open_sync not work properly (i.e. is not
   really syncing)?
5) Assuming the Win2k open_sync implementation is working, why is my Win2k
   laptop, with a much slower hard drive, 5.6x faster with fsync on than the
   Solaris box?
6) Is there anything I could have done in building Postgres from source to
   cause problems on Solaris?

Any help or insight would be greatly appreciated.

Mike

Here are the specs:

Win2k SP2
  PIII 1.2 GHz / 512 KB cache
  512 MB RAM
  1 4400 RPM IDE disk (it's a laptop)
  Postgres 7.1.3 on cygwin
  postgresql.conf default except where noted below

Solaris 8 01/01 with relatively recent patch sets
  E220R, 2 x 450 MHz UltraSPARC II
  2 GB RAM
  1 36 GB 10000 RPM SCSI disk
  Postgres 7.1.2 compiled using gcc
  postgresql.conf default except where noted below

Here are the results of pgbench:

Solaris: wal_sync_method = open_sync, fsync = false

  starting vacuum...end.
  transaction type: TPC-B (sort of)
  scaling factor: 1
  number of clients: 1
  number of transactions per client: 500
  number of transactions actually processed: 500/500
  tps = 74.825791 (including connections establishing)
  tps = 75.418729 (excluding connections establishing)

Solaris: wal_sync_method = open_sync, fsync = true

  transaction type: TPC-B (sort of)
  scaling factor: 1
  number of clients: 1
  number of transactions per client: 500
  number of transactions actually processed: 500/500
  tps = 22.691402 (including connections establishing)
  tps = 22.745163 (excluding connections establishing)

Solaris: wal_sync_method = fsync, fsync = true

  transaction type: TPC-B (sort of)
  scaling factor: 1
  number of clients: 1
  number of transactions per client: 500
  number of transactions actually processed: 500/500
  tps = 30.489359 (including connections establishing)
  tps = 30.591695 (excluding connections establishing)

Windows: wal_sync_method = open_sync, fsync = true

  starting vacuum...end.
  transaction type: TPC-B (sort of)
  scaling factor: 1
  number of clients: 1
  number of transactions per client: 500
  number of transactions actually processed: 500/500
  tps = 124.409521 (including connections establishing)
  tps = 127.591055 (excluding connections establishing)

Windows: wal_sync_method = open_sync, fsync = false

  transaction type: TPC-B (sort of)
  scaling factor: 1
  number of clients: 1
  number of transactions per client: 500
  number of transactions actually processed: 500/500
  tps = 112.103652 (including connections establishing)
  tps = 113.250950 (excluding connections establishing)

Windows: wal_sync_method = fsync, fsync = true

  scaling factor: 1
  number of clients: 1
  number of transactions per client: 500
  number of transactions actually processed: 500/500
  tps = 21.191655 (including connections establishing)
  tps = 21.232517 (excluding connections establishing)
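For anyone wanting to reproduce the runs above, they boil down to something
like the following sketch. The initialization step and the database name
"bench" are assumptions; the client and transaction counts match the output
shown:

  # build the pgbench tables at scaling factor 1 (assumed init step)
  pgbench -i -s 1 bench

  # single client, 500 transactions, as in every run above
  pgbench -c 1 -t 500 bench

The settings varied between runs live in postgresql.conf (7.1-era names) and
take effect after restarting the postmaster:

  fsync = true                  # or false
  wal_sync_method = open_sync   # or fsync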
Mike Schroepfer <mike@raplix.com> writes:
> It appears the CPU utilization on both machines is very low (<15%), so I'm
> guessing it is mostly I/O overhead.

It looks like your Solaris box is a dual CPU machine?  PG 7.1.* suffers
from pretty awful performance on multiprocessors, due to a rather
braindead implementation of spinlocks.  If vmstat shows that neither the
CPUs nor the disks are working real hard, then I'd suspect this to be
the problem --- cf
http://archives.postgresql.org/pgsql-hackers/2002-01/msg00449.php
and other recent pghackers threads.

You might care to try your tests on current development sources (*not*
7.2b4; pull from CVS or use a nightly-snapshot tarball).  I think we've
improved the SMP performance considerably since 7.1, though more could
probably be done in future.  BTW, don't put a production database on
current sources; there are at least two unpleasant known bugs.

> 3) Why does the Solaris performance with fsync on/off differ by a factor
>    of 3.4x while the Windows fsync on/off differs by only 1.1x? I thought
>    WAL was supposed to dramatically reduce the cost of fsyncs?
> 4) Why does the Win2k behavior with fsync and open_sync differ so greatly?
>    Is fsync on cygwin slow, or does open_sync not work properly (i.e. is
>    not really syncing)?

I don't know the innards of cygwin, but it would not surprise me in the
least to hear that it doesn't implement fsync & OPEN_SYNC efficiently
and/or correctly.  It has to sit atop Windows, which probably doesn't
have compatible APIs to support these behaviors reasonably.  The results
you show sure make it look like OPEN_SYNC is a no-op on cygwin...

			regards, tom lane
> It appears the CPU utilization on both machines is very low (<15%), so I'm
> guessing it is mostly I/O overhead.

Are you seeing anything due to swapping, &c?  What do memstat and
friends tell you?

> At the bottom of the message is the detailed result of pgbench -t 500
> under the following conditions:
>
>                 wal_sync_method = open_sync      wal_sync_method = fsync
>   OS            FSYNC OFF       FSYNC ON         FSYNC ON
>   Solaris       74/75           22/22            30/30
>   Win2k         112/113         124/127          21/21

We're using Solaris 7 and seeing considerably better performance than
that (mind you, we've got some honkin' big hardware underneath, and a
big RAID array with internal battery-backed smart caches, so I might
not be the right person to ask).

I note that you have IDE disks on your Win machine.  Are you sure you're
really getting an fsync?  Win2000, AFAIR, is pretty aggressive about
write-caching.  I may be wrong though -- I'm no Windows guy.

One thing I noticed is that the WAL commit_delay and commit_siblings
settings were a big help.  My theory was that WAL was costing us too
much, because we had such volume that WAL became a bottleneck -- it was
firing too quickly.  My answer was to increase those settings, and I
noticed an immediate improvement.  I had to increase the segments as
well in order to keep up; this takes slightly longer to recover in case
of a crash, of course, but not so long as to make the difference worth
worrying about.

> 1 36 GB 10000 RPM SCSI disk

I'd also worry about this 1-disk setup.  I'd be inclined to add a second
disk in order to put the WAL file on another spindle (or use RAID to
speed things up, but that's a lot more disk).

> Postgres 7.1.2 compiled using gcc

There are a couple of issues that make it worth the upgrade to 7.1.3.
See the archives.  Nothing to help your perf, though, if I recall.

If I can think of anything else, I'll let you know.

A
--
----
Andrew Sullivan                               87 Mowat Avenue
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M6K 3E3
                                         +1 416 646 3304 x110
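A quick sanity check on the write-caching suspicion, assuming roughly one
synchronous WAL flush per commit and no write cache (back-of-the-envelope
only):

  4400 RPM IDE laptop drive:   4400 / 60 ≈  73 platter revolutions/sec
  10000 RPM SCSI drive:       10000 / 60 ≈ 167 revolutions/sec

If every commit really waited on the platter, the laptop could not sustain
much more than ~73 commits/sec, yet it reports 124-127 tps with fsync on --
which is only plausible if the IDE drive (or the Win2k/cygwin layer) is
acknowledging writes out of a cache. The Solaris figures of 22-30 tps are at
least physically believable for one uncached flush per transaction on a
single spindle.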
Andrew,

Thanks for the reply -- responses below:

> > It appears the CPU utilization on both machines is very low (<15%), so
> > I'm guessing it is mostly I/O overhead.
>
> Are you seeing anything due to swapping, &c?  What do memstat and
> friends tell you?

No swapping problems at all -- lots of CPU and memory to go around (vmstat 5):

  procs     memory            page            disk          faults      cpu
  r b w   swap    free   re  mf pi po fr de sr s0 s1 s6 --   in   sy   cs us sy id
  0 0 0 2140608 1820520   2  15  1  1  1  0  0  2  0  0  0  430  130   71  1  1 99
  0 0 0 2091848 1674728  21  65  0  0  0  0  0 67  0  0  0  806 2151  633 18 15 67
  0 0 0 2091848 1673928  21  52  0  0  0  0  0 65  0  0  0  806 2081  611 20 15 64
  0 0 0 2092472 1674008  12  19  0  8  8  0  0 39  0  0  0  634 1073  333 10  6 84

> We're using Solaris 7 and seeing considerably better performance than
> that (mind you, we've got some honkin' big hardware underneath, and a
> big RAID array with internal battery-backed smart caches, so I might
> not be the right person to ask).

What kind of numbers are you getting, so I can get an idea of where I am
relative to that?

> One thing I noticed is that the WAL commit_delay and commit_siblings
> settings were a big help.  My theory was that WAL was costing us too
> much, because we had such volume that WAL became a bottleneck -- it was
> firing too quickly.  My answer was to increase those settings, and I
> noticed an immediate improvement.  I had to increase the segments as
> well in order to keep up; this takes slightly longer to recover in case
> of a crash, of course, but not so long as to make the difference worth
> worrying about.

Can you share your config?  I've played with it without much help -- I'm
probably missing something.

> > 1 36 GB 10000 RPM SCSI disk
>
> I'd also worry about this 1-disk setup.  I'd be inclined to add a second
> disk in order to put the WAL file on another spindle (or use RAID to
> speed things up, but that's a lot more disk).

Yeah -- this is not a production system; I'm trying to do some testing to
understand our app's characteristics and hardware requirements.  I certainly
would at least mirror for safety and would love to get the WAL onto another
spindle.

> > Postgres 7.1.2 compiled using gcc
>
> There are a couple of issues that make it worth the upgrade to 7.1.3.
> See the archives.  Nothing to help your perf, though, if I recall.

Hmm -- I looked through the logs a good while ago and didn't see anything
that I was too concerned about.  I'll double check though -- thanks!

> If I can think of anything else, I'll let you know.

Appreciate your help!

Mike
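Since Andrew doesn't quote actual values, here is the general shape of the
knobs he is describing -- a sketch only, with made-up numbers, for
postgresql.conf on 7.1:

  commit_delay = 100         # usec to wait before flushing a commit record,
                             # hoping to piggyback other commits on one flush
  commit_siblings = 5        # only delay if at least this many other
                             # transactions are currently active
  checkpoint_segments = 8    # more WAL segments between checkpoints
                             # (fewer checkpoints, longer crash recovery)

Note that commit_delay/commit_siblings only pay off with several concurrent
committers; a single-client pgbench run will not see much benefit. The
WAL-on-another-spindle suggestion is normally just a symlink (again a sketch:
/wal stands in for a hypothetical second disk, and the postmaster must be
stopped first):

  mv $PGDATA/pg_xlog /wal/pg_xlog
  ln -s /wal/pg_xlog $PGDATA/pg_xlog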
Tom,

Thanks for the prompt reply.  I have some more info for you.

Results:
a.1) 69 tps -- 7.1.2 (same build as before) on a uniprocessor Ultra 10 box with fsync
a.2) 52 tps -- 7.1.2 (same build as before) on a uniprocessor Ultra 10 box without fsync
b.1) 32 tps -- 7.1.2 on the dual-processor box again
b.2) 31 tps -- 7.2 tip of CVS on the dual-processor box
b.3) Output of vmstat 5 during b.1

So 7.2 doesn't appear to help.  For fun I also disabled one of the processors
and re-ran the tests.  Overall the numbers went down, not up.  So I do not
think it is the locking/SMP problem.  Any other thoughts?  I'm happy to run
tests/collect more info if it helps.

The other curious thing is that my little Ultra 10 is also beating the 220R.
Can anyone else chime in with comparative numbers so I know how good/bad
these are?

Detailed results below.  Thanks!

Mike

a.1)
  transaction type: TPC-B (sort of)
  scaling factor: 1
  number of clients: 1
  number of transactions per client: 500
  number of transactions actually processed: 500/500
  tps = 69.109284 (including connections establishing)
  tps = 69.580949 (excluding connections establishing)

a.2)
  scaling factor: 1
  number of clients: 1
  number of transactions per client: 500
  number of transactions actually processed: 500/500
  tps = 52.001209 (including connections establishing)
  tps = 52.283831 (excluding connections establishing)

b.1)
  transaction type: TPC-B (sort of)
  scaling factor: 1
  number of clients: 1
  number of transactions per client: 500
  number of transactions actually processed: 500/500
  tps = 32.517491 (including connections establishing)
  tps = 32.635141 (excluding connections establishing)

b.2)
  scaling factor: 1
  number of clients: 1
  number of transactions per client: 500
  number of transactions actually processed: 500/500
  tps = 31.648947 (including connections establishing)
  tps = 31.723237 (excluding connections establishing)

b.3) output of vmstat 5 during b.1:

  procs     memory            page            disk          faults      cpu
  r b w   swap    free   re  mf pi po fr de sr  s0 s1 s6 --   in   sy   cs us sy id
  0 0 0 2140760 1820952   2  15  1  1  1  0  0   2  0  0  0  430  130   71  1  1 99
  0 0 0 2093696 1690392  15 128  0 91 91  0  0 128  0  0  0 1232 1886  894 11  3 86
  0 0 0 2093440 1690048   5   0  0  0  0  0  0 168  0  0  0 1475 2213 1181 17  4 79
  0 0 0 2093440 1689824   3   0  0  1  1  0  0 167  0  0  0 1462 2228 1169 20  3 77
  0 0 0 2094472 1690568  27  42  0 12 12  0  0  36  0  0  0  630  646  286  4  1 94

> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Thursday, January 10, 2002 1:42 PM
> To: Mike Schroepfer
> Cc: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Insert Performance with WAL and Fsync
>
> Mike Schroepfer <mike@raplix.com> writes:
> > It appears the CPU utilization on both machines is very low (<15%), so
> > I'm guessing it is mostly I/O overhead.
>
> It looks like your Solaris box is a dual CPU machine?  PG 7.1.* suffers
> from pretty awful performance on multiprocessors, due to a rather
> braindead implementation of spinlocks.  If vmstat shows that neither the
> CPUs nor the disks are working real hard, then I'd suspect this to be
> the problem --- cf
> http://archives.postgresql.org/pgsql-hackers/2002-01/msg00449.php
> and other recent pghackers threads.
>
> You might care to try your tests on current development sources (*not*
> 7.2b4; pull from CVS or use a nightly-snapshot tarball).  I think we've
> improved the SMP performance considerably since 7.1, though more could
> probably be done in future.  BTW, don't put a production database on
> current sources; there are at least two unpleasant known bugs.
>
> > 3) Why does the Solaris performance with fsync on/off differ by a factor
> >    of 3.4x while the Windows fsync on/off differs by only 1.1x? I thought
> >    WAL was supposed to dramatically reduce the cost of fsyncs?
> > 4) Why does the Win2k behavior with fsync and open_sync differ so
> >    greatly? Is fsync on cygwin slow, or does open_sync not work properly
> >    (i.e. is not really syncing)?
>
> I don't know the innards of cygwin, but it would not surprise me in the
> least to hear that it doesn't implement fsync & OPEN_SYNC efficiently
> and/or correctly.  It has to sit atop Windows, which probably doesn't
> have compatible APIs to support these behaviors reasonably.  The results
> you show sure make it look like OPEN_SYNC is a no-op on cygwin...
>
> 			regards, tom lane
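A note for anyone repeating the single-CPU test: the thread doesn't show how
the second processor was disabled.  On Solaris this is normally done with
psrinfo/psradm -- an assumption about how it was done here, and the processor
id is just an example:

  psrinfo          # list processor ids and their on-line/off-line state
  psradm -f 1      # take processor 1 off-line
  psradm -n 1      # bring it back on-line afterwards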
Mike Schroepfer <mike@raplix.com> writes:
> b.1) 32 tps -- 7.1.2 on the dual-processor box again
> b.2) 31 tps -- 7.2 tip of CVS on the dual-processor box

> So 7.2 doesn't appear to help.

Mmm, yeah, I missed that you were only looking at the single-client case;
so the extra CPU has nothing to do anyway, and there's no spinlock
contention to cause a problem.  Most of the work we've been doing for 7.2
was concerned with improving multi-client performance.  But still, seems
odd that you can't do better than 32 tps on this setup.

> b.3) output of vmstat 5 during b.1:

>  procs     memory            page            disk          faults      cpu
>  r b w   swap    free   re  mf pi po fr de sr  s0 s1 s6 --   in   sy   cs us sy id
>  0 0 0 2140760 1820952   2  15  1  1  1  0  0   2  0  0  0  430  130   71  1  1 99
>  0 0 0 2093696 1690392  15 128  0 91 91  0  0 128  0  0  0 1232 1886  894 11  3 86

What are the "disk" numbers in this variant of vmstat?  Given the low
CPU usage, one would hope that your disk is going flat out ... but if
128 is page transfers per second, it's not very impressive...

			regards, tom lane
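(For reference: in Solaris vmstat the s0/s1/s6 columns under "disk" are disk
operations per second for each device, so 128 here means roughly 128 I/Os per
second on sd0.  A fuller per-device picture comes from extended iostat, e.g.

  iostat -x 5

which is essentially what the next message shows; the exact flags used there
are not stated.)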
> > b.3) output of vmstat 5 during b.1:
> >
> >  procs     memory            page            disk          faults      cpu
> >  r b w   swap    free   re  mf pi po fr de sr  s0 s1 s6 --   in   sy   cs us sy id
> >  0 0 0 2140760 1820952   2  15  1  1  1  0  0   2  0  0  0  430  130   71  1  1 99
> >  0 0 0 2093696 1690392  15 128  0 91 91  0  0 128  0  0  0 1232 1886  894 11  3 86
>
> What are the "disk" numbers in this variant of vmstat?  Given the low
> CPU usage, one would hope that your disk is going flat out ... but if
> 128 is page transfers per second, it's not very impressive...

Here's iostat: it's 184.8 I/Os per second (8K pages) = 1.1 MB/s, at 92% busy.
Not too impressive.  I'm following up to see if something is misconfigured.

  device  r/s    w/s   kr/s    kw/s  wait actv svc_t  %w  %b   tin tout  us sy wt id
  sd0     0.0  184.6    0.0  1145.3   0.0  1.7   9.0   0  92     0  116  13  3 20 64

Thanks again,

Mike
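Working those numbers against the ~32 tps from run b.1 (rough arithmetic,
assuming iostat was sampled during that run):

  184.6 writes/sec / ~32 transactions/sec  ≈ 6 writes per transaction
  1145.3 KB/sec    / 184.6 writes/sec      ≈ 6 KB per write

So the disk is ~92% busy doing many small synchronous writes at only about
1.1 MB/s: it is bound by per-write latency, not bandwidth.  That is
consistent with every commit forcing its own flush to a single spindle, and
it is exactly the pattern that commit_delay (with concurrent clients), a
separate WAL spindle, or a battery-backed write cache is meant to relieve.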