Re: Intel SSDs that may not suck - Mailing list pgsql-performance
From:           Greg Smith
Subject:        Re: Intel SSDs that may not suck
Msg-id:         4D9D1FC3.4020207@2ndQuadrant.com
In response to: Re: Intel SSDs that may not suck (Greg Smith <greg@2ndQuadrant.com>)
List:           pgsql-performance
Here's the new Intel 3rd generation 320 series drive:

$ sudo smartctl -i /dev/sdc
Device Model:     INTEL SSDSA2CW120G3
Firmware Version: 4PC10302
User Capacity:    120,034,123,776 bytes
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4

Since I have to go chant at the unbelievers next week (MySQL Con), I don't
have time for a really thorough look here.  But I made a first pass through
my usual benchmarks without any surprises.  bonnie++ meets expectations
with 253MB/s reads, 147MB/s writes, and 3935 seeks/second:

Version 1.03e       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
toy         32144M           147180   7 77644   3           253893   5  3935  15

Using sysbench to generate a 100GB file and randomly seek around it gives a
similar figure:

Extra file open flags: 0
100 files, 1Gb each
100Gb total file size
Block size 8Kb
Number of random requests for random IO: 10000
Read/Write ratio for combined random IO test: 1.50
Using synchronous I/O mode
Doing random read test
Threads started!
Done.

Operations performed:  10000 reads, 0 writes, 0 Other = 10000 Total
Read 78.125Mb  Written 0b  Total transferred 78.125Mb  (26.698Mb/sec)
 3417.37 Requests/sec executed

So that's the basic range of performance: up to 250MB/s on reads, but
potentially as low as 3400 IOPS = 27MB/s on really random workloads.  I can
make it do worse than that, as you'll see in a minute.
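For reference, output in that form comes from sysbench's fileio test.  A
run matching the reported parameters would look something like the below;
the flags are my reconstruction from the output above (sysbench 0.4-era
syntax), not a transcript of the original session:

# Prepare 100 x 1GB files, then issue 10000 synchronous 8KB random reads;
# every flag value here is taken from the parameters echoed in the output.
$ sysbench --test=fileio --file-num=100 --file-total-size=100G prepare
$ sysbench --test=fileio --file-num=100 --file-total-size=100G \
    --file-test-mode=rndrd --file-block-size=8K --max-requests=10000 run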
At a database scale of 500, I can get 2357 TPS:

postgres@toy:~$ /usr/lib/postgresql/8.4/bin/pgbench -c 64 -T 300 pgbench
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 500
query mode: simple
number of clients: 64
duration: 300 s
number of transactions actually processed: 707793
tps = 2357.497195 (including connections establishing)
tps = 2357.943894 (excluding connections establishing)

This is basically the same performance as the 4-disk setup with a 256MB
battery-backed write controller I profiled at
http://www.2ndquadrant.us/pgbench-results/index.htm ; there XFS got as high
as 2332 TPS, albeit with a PostgreSQL patched for better performance than I
used here.  This system has 16GB of RAM, so this is exercising write speed
only, without needing to read anything from disk; not too hard for regular
drives to do.  Performance holds at a scale of 1000, however:

postgres@toy:~$ /usr/lib/postgresql/8.4/bin/pgbench -c 64 -T 300 -l pgbench
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 1000
query mode: simple
number of clients: 64
duration: 300 s
number of transactions actually processed: 586043
tps = 1953.006031 (including connections establishing)
tps = 1953.399065 (excluding connections establishing)

Whereas my regular drives are lucky to hit 350 TPS here.  So this is the
typical sweet spot for SSD: the workload is bigger than RAM, but not so
much bigger than RAM that reads & writes become completely random.
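An aside for anyone repeating these runs: a pgbench scale unit works out to
roughly 15MB of database, which is how the scale of 4000 below lands near
58GB.  Only the run commands show up in these results, so the
initialization step here is a sketch of how such a database gets built, not
a capture from this session:

# Build the test database at a given scale (-s); roughly 15MB per unit,
# so -s 1000 is ~15GB.  The database name "pgbench" is illustrative.
$ /usr/lib/postgresql/8.4/bin/pgbench -i -s 1000 pgbench

# Then run as in the tests above: 64 clients for 300 seconds, adding
# per-transaction logging (-l) when latency detail is wanted afterward.
$ /usr/lib/postgresql/8.4/bin/pgbench -c 64 -T 300 -l pgbench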
If I crank the scale way up, to 4000 = 58GB, I'm now solidly in seek-bound
behavior, which runs about twice as fast as my regular drive array does
here (that's around 200 TPS on this test):

postgres@toy:~$ /usr/lib/postgresql/8.4/bin/pgbench -T 1800 -c 64 -l pgbench
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 4000
query mode: simple
number of clients: 64
duration: 1800 s
number of transactions actually processed: 731568
tps = 406.417254 (including connections establishing)
tps = 406.430713 (excluding connections establishing)

Here's a snapshot of typical drive activity when running this:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.29    0.00    1.30   54.80    0.00   41.61

Device: rrqm/s  wrqm/s    r/s    w/s  rMB/s  wMB/s avgrq-sz avgqu-sz  await  svctm  %util
sdc       0.00  676.67 443.63 884.00   7.90  12.25    31.09    41.77  31.45   0.75  99.93

So we're down to around 20MB/s, just as sysbench predicted a seek-bound
workload would be on these drives.  I can still see checkpoint spikes here,
where sync times go upward:

2011-04-06 20:40:58.969 EDT: LOG: checkpoint complete: wrote 2959 buffers (9.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=147.300 s, sync=32.885 s, total=181.758 s

But the drive seems to never become unresponsive for longer than a second:

postgres@toy:~$ cat pgbench_log.4585 | cut -d" " -f 6 | sort -n | tail
999941
999952
999956
999959
999960
999970
999977
999984
999992
999994

Power-plug pull tests with diskchecker.pl and a write-heavy database load
didn't notice anything funny about the write cache:

[witness]
$ wget http://code.sixapart.com/svn/tools/trunk/diskchecker.pl
$ chmod +x ./diskchecker.pl
$ ./diskchecker.pl -l

[server with SSD]
$ wget http://code.sixapart.com/svn/tools/trunk/diskchecker.pl
$ chmod +x ./diskchecker.pl
$ diskchecker.pl -s grace create test_file 500
diskchecker: running 20 sec, 69.67% coverage of 500 MB (38456 writes; 1922/s)
diskchecker: running 21 sec, 71.59% coverage of 500 MB (40551 writes; 1931/s)
diskchecker: running 22 sec, 73.52% coverage of 500 MB (42771 writes; 1944/s)
diskchecker: running 23 sec, 75.17% coverage of 500 MB (44925 writes; 1953/s)
[pull plug]

/home/gsmith/diskchecker.pl -s grace verify test_file
 verifying: 0.00%
 verifying: 0.73%
 verifying: 7.83%
 verifying: 14.98%
 verifying: 22.10%
 verifying: 29.23%
 verifying: 36.39%
 verifying: 43.50%
 verifying: 50.65%
 verifying: 57.70%
 verifying: 64.81%
 verifying: 71.86%
 verifying: 79.02%
 verifying: 86.11%
 verifying: 93.15%
 verifying: 100.00%
Total errors: 0

2011-04-06 21:43:09.377 EDT: LOG: database system was interrupted; last known up at 2011-04-06 21:30:27 EDT
2011-04-06 21:43:09.392 EDT: LOG: database system was not properly shut down; automatic recovery in progress
2011-04-06 21:43:09.394 EDT: LOG: redo starts at 6/BF7B2880
2011-04-06 21:43:10.687 EDT: LOG: unexpected pageaddr 5/C2786000 in log file 6, segment 205, offset 7888896
2011-04-06 21:43:10.687 EDT: LOG: redo done at 6/CD784400
2011-04-06 21:43:10.687 EDT: LOG: last completed transaction was at log time 2011-04-06 21:39:00.551065-04
2011-04-06 21:43:10.705 EDT: LOG: checkpoint starting: end-of-recovery immediate
2011-04-06 21:43:14.766 EDT: LOG: checkpoint complete: wrote 29915 buffers (91.3%); 0 transaction log file(s) added, 0 removed, 106 recycled; write=0.146 s, sync=3.904 s, total=4.078 s
2011-04-06 21:43:14.777 EDT: LOG: database system is ready to accept connections

So far, this drive is living up to expectations, without doing anything
unexpectedly good or bad.  When doing the things where SSD has the biggest
advantage over mechanical drives, it's more than 5X as fast as a 4-disk
array (3-disk DB + WAL) with a BBWC.  But on really huge workloads, where
the worst-case behavior of the drive is being hit, that falls to closer to
a 2X advantage.  And if you're doing work that isn't very random at all,
the drive only matches regular disk.  I like not having surprises in this
sort of thing, though.  The Intel 320 series gets a preliminary thumbs-up
from me.  I'll be happy when these are mainstream enough that I can finally
exit the anti-Intel SSD pulpit I've been standing on for the last two
years.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books