Thread: Hardware advice for scalable warehouse db
Hi list,

My employer will receive a donated NetApp FAS 3040 SAN [1] and we want to run our warehouse DB on it. The pg9.0 DB currently comprises ~1.5TB of tables, 200GB of indexes, and grows ~5%/month. The DB is not update critical, but undergoes large read and insert operations frequently.

My employer is a university with little funds and we have to find a cheap way to scale for the next 3 years, so the SAN seems a good opportunity to us. We are now looking for the remaining server parts to maximize DB performance at a cost <= $4000. I dug out the following configuration with the discount we receive from Dell:

1 x Intel Xeon X5670, 6C, 2.93GHz, 12M Cache
16 GB (4x4GB) Low Volt DDR3 1066MHz
PERC H700 SAS RAID controller
4 x 300 GB 10k SAS 6Gbps 2.5" in RAID 10

I was thinking of putting the WAL and the indexes on the local disks, and the rest on the SAN. If funds allow, we might downgrade the disks to SATA and add a 50 GB SATA SSD for the WAL (mixing SAS and SATA is not possible).

Any comments on the configuration? Any experiences with iSCSI vs. Fibre Channel for SANs and PostgreSQL? If the SAN setup sucks, do you see a cheap alternative for connecting as many as 16 x 2TB disks as DAS?

Thanks so much!

Best,
Chris

[1]: http://www.b2net.co.uk/netapp/fas3000.pdf
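The WAL-and-indexes-local, data-on-SAN split Chris describes maps directly onto PostgreSQL tablespaces. A minimal sketch (the mount points /local and /san and the example table are assumptions, not from the thread):

```sql
-- Assumed mounts: /local = the RAID-10 SAS array, /san = a NetApp volume.
-- WAL lives under $PGDATA/pg_xlog; with the cluster stopped it can be
-- moved to the local array via a symlink:
--   mv $PGDATA/pg_xlog /local/pg_xlog
--   ln -s /local/pg_xlog $PGDATA/pg_xlog

CREATE TABLESPACE fast_local LOCATION '/local/pg_tblspc';  -- indexes
CREATE TABLESPACE bulk_san   LOCATION '/san/pg_tblspc';    -- table data

-- Heap on the SAN, index on the local disks:
CREATE TABLE measurements (ts timestamptz, val numeric) TABLESPACE bulk_san;
CREATE INDEX measurements_ts_idx ON measurements (ts) TABLESPACE fast_local;
```

Note that an index does not follow its table's tablespace automatically: each index has to name TABLESPACE explicitly, or be built with default_tablespace pointed at the local array.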
chris wrote:
> My employer is a university with little funds and we have to find a
> cheap way to scale for the next 3 years, so the SAN seems a good chance
> to us.

A SAN is rarely ever the cheapest way to scale anything; you're paying extra for reliability instead.

> I was thinking to put the WAL and the indexes on the local disks, and
> the rest on the SAN. If funds allow, we might downgrade the disks to
> SATA and add a 50 GB SATA SSD for the WAL (SAS/SATA mixup not possible).

If you want to keep the bulk of the data on the SAN, this is a reasonable way to go, performance-wise. But be aware that losing the WAL means your database is likely corrupted. That means much of the reliability benefit of the SAN is lost in this configuration.

> Any experiences with iSCSI vs. Fibre
> Channel for SANs and PostgreSQL? If the SAN setup sucks, do you see a
> cheap alternative how to connect as many as 16 x 2TB disks as DAS?

I've never heard anyone recommend iSCSI if you care at all about performance, while FC works fine for this sort of job. The physical dimensions of 3.5" drives make getting 16 of them into one reasonably sized enclosure normally just out of reach. But a Dell PowerVault MD1000 will give you 15 x 2TB as inexpensively as possible in a single 3U space (well, as cheaply as you'd want to go--you might build your own giant box for less, but I wouldn't recommend it). I've tested MD1000, MD1200, and MD1220 arrays before, and always gotten seriously good performance relative to the dollars spent with that series. The only one of these Dell storage arrays I've heard disappointing results about (but haven't tested directly yet) is the MD3220.

--
Greg Smith   2ndQuadrant US   greg@2ndQuadrant.com   Baltimore, MD
> 1 x Intel Xeon X5670, 6C, 2.93GHz, 12M Cache
> 16 GB (4x4GB) Low Volt DDR3 1066Mhz
> PERC H700 SAS RAID controller
> 4 x 300 GB 10k SAS 6Gbps 2.5" in RAID 10

Apart from Greg's excellent recommendations, I would strongly suggest more memory. 16GB in 2011 is really on the low side. PG uses memory (either shared_buffers or the OS cache) to keep frequently accessed data in. Good recommendations are hard without knowledge of the data and access patterns, but 64, 128, and 256GB systems are quite common when you have data that can't all fit in memory at once.

SANs are nice, but I think you could buy a good DAS box each year for just the support cost of a NetApp--though you may have gotten a really good deal there too. On the other hand, you do get a huge amount of advanced configuration features and potential ways of sharing storage--just see the specs. If you need those, the SAN is a good way to go, but they come with a huge price tag.

Jesper
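To illustrate where the extra memory would go, here are the usual postgresql.conf knobs. The numbers are assumptions for a hypothetical dedicated 64GB box (one of the sizes Jesper mentions), not a tuned recommendation for Chris's workload:

```
# Hypothetical dedicated 64GB warehouse box -- illustrative values only
shared_buffers = 8GB             # PG's own cache; a common starting
                                 # point is 1/8 to 1/4 of RAM
effective_cache_size = 48GB      # planner hint: how much data the OS
                                 # page cache is likely to hold
work_mem = 256MB                 # per sort/hash node; DW queries
                                 # benefit from larger values
maintenance_work_mem = 2GB       # index builds, VACUUM
```

effective_cache_size allocates nothing itself; it only tells the planner how much caching to assume, which is why it can be set near total RAM minus shared_buffers.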
On 7/15/2011 2:10 AM, Greg Smith wrote:
> chris wrote:
>> My employer is a university with little funds and we have to find a
>> cheap way to scale for the next 3 years, so the SAN seems a good chance
>> to us.
> A SAN is rarely ever the cheapest way to scale anything; you're paying
> extra for reliability instead.
>
>> I was thinking to put the WAL and the indexes on the local disks, and
>> the rest on the SAN. If funds allow, we might downgrade the disks to
>> SATA and add a 50 GB SATA SSD for the WAL (SAS/SATA mixup not possible).
> If you want to keep the bulk of the data on the SAN, this is a
> reasonable way to go, performance-wise. But be aware that losing the
> WAL means your database is likely corrupted. That means that much of
> the reliability benefit of the SAN is lost in this configuration.
>
>> Any experiences with iSCSI vs. Fibre
>> Channel for SANs and PostgreSQL? If the SAN setup sucks, do you see a
>> cheap alternative how to connect as many as 16 x 2TB disks as DAS?
> I've never heard anyone recommend iSCSI if you care at all about
> performance, while FC works fine for this sort of job. The physical
> dimensions of 3.5" drives makes getting 16 of them in one reasonably
> sized enclosure normally just out of reach. But a Dell PowerVault
> MD1000 will give you 15 x 2TB as inexpensively as possible in a single
> 3U space (well, as cheaply as you want to go--you might build your own
> giant box cheaper but I wouldn't recommend it).

I'm curious what people think of these:
http://www.pc-pitstop.com/sas_cables_enclosures/scsase166g.asp

I currently have my database on two of these, and for my purposes they seem to be fine; they're also quite a bit less expensive than the Dell MD1000. I have three more of the 3G versions with expanders for mass storage arrays (RAID0) and haven't had any issues with them in the three years I've had them.

Bob
On Fri, Jul 15, 2011 at 12:34 AM, chris <chricki@gmx.net> wrote:
> I was thinking to put the WAL and the indexes on the local disks, and
> the rest on the SAN. If funds allow, we might downgrade the disks to
> SATA and add a 50 GB SATA SSD for the WAL (SAS/SATA mixup not possible).

Just to add to the conversation, there's no real advantage to putting WAL on SSD. Indexes can benefit from them, but WAL is mostly sequential throughput, and for that a pair of 7200RPM 1TB SATA drives works just fine for most folks. For example, on one big server we run, we have 24 drives in a RAID-10 for the /data/base dir and 4 drives in a RAID-10 for pg_xlog, and those 4 drives tend to show the same %util under iostat as the 24 drives under normal usage. It takes a special kind of load (lots of inserts happening in large transactions quickly) for the 4-drive RAID-10 ever to exceed 50% util.
On Fri, Jul 15, 2011 at 10:39 AM, Robert Schnabel <schnabelr@missouri.edu> wrote:
> I'm curious what people think of these:
> http://www.pc-pitstop.com/sas_cables_enclosures/scsase166g.asp
>
> I currently have my database on two of these and for my purpose they seem to
> be fine and are quite a bit less expensive than the Dell MD1000. I actually
> have three more of the 3G versions with expanders for mass storage arrays
> (RAID0) and haven't had any issues with them in the three years I've had
> them.

I have a co-worker who's familiar with them, and they seem a lot like the 16-drive units we use from Aberdeen, which, fully outfitted with 15k SAS drives, run $5k to $8k depending on the drives, etc.
> Just to add to the conversation, there's no real advantage to putting > WAL on SSD. Indexes can benefit from them, but WAL is mosty > seqwuential throughput and for that a pair of SATA 1TB drives at > 7200RPM work just fine for most folks. Actually, there's a strong disadvantage to putting WAL on SSD. SSD is very prone to fragmentation if you're doing a lot of deleting and replacing files. I've implemented data warehouses where the database was on SSD but WAL was still on HDD. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Hi list,

Thanks a lot for your very helpful feedback!

> I've tested MD1000, MD1200, and MD1220 arrays before, and always gotten
> seriously good performance relative to the dollars spent

Great hint, but I'm afraid that's too expensive for us. Still, it's a great way to scale over the years, so I'll keep it in mind.

I had a look at other server vendors who offer 4U servers with slots for 16 disks for $4k in total (w/o disks); maybe that's an even cheaper/better solution for us. If you had the choice between 16 x 2TB SATA vs. a server with some SSDs for WAL/indexes plus a SAN (with SATA disks) for data, what would you choose performance-wise?

Again, thanks so much for your help.

Best,
Chris
On Fri, Jul 15, 2011 at 11:49 AM, chris r. <chricki@gmx.net> wrote:
> Hi list,
>
> Thanks a lot for your very helpful feedback!
>
>> I've tested MD1000, MD1200, and MD1220 arrays before, and always gotten
>> seriously good performance relative to the dollars spent
> Great hint, but I'm afraid that's too expensive for us. But it's a great
> way to scale over the years, I'll keep that in mind.
>
> I had a look at other server vendors who offer 4U servers with slots for
> 16 disks for 4k in total (w/o disks), maybe that's an even
> cheaper/better solution for us. If you had the choice between 16 x 2TB
> SATA vs. a server with some SSDs for WAL/indexes and a SAN (with SATA
> disk) for data, what would you choose performance-wise?

SATA drives can easily flip bits, and Postgres does not checksum data, so it will not automatically detect corruption for you. I would steer well clear of SATA unless you are going to use a filesystem like ZFS, which checksums data. I would hope that a SAN would detect this for you, but I have no idea.

--
Rob Wultsch
wultsch@gmail.com
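For reference, the ZFS arrangement Rob alludes to amounts to only a few commands. This is a sketch; the pool name and device names are assumptions:

```
# Mirrored pool: block checksums are on by default, so a bit flip on a
# SATA drive is detected (and, given the mirror, repaired) on read.
zpool create tank mirror /dev/sdb /dev/sdc

# A periodic scrub walks every allocated block and verifies checksums:
zpool scrub tank
zpool status -v tank   # reports any checksum errors found
```

Without ZFS-level redundancy (a single-device pool), a scrub can still detect corruption but cannot repair it.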
On 7/14/11 11:34 PM, chris wrote:
> Any comments on the configuration? Any experiences with iSCSI vs. Fibre
> Channel for SANs and PostgreSQL? If the SAN setup sucks, do you see a
> cheap alternative how to connect as many as 16 x 2TB disks as DAS?

Here's the problem with iSCSI: on gigabit ethernet, your maximum possible throughput is about 100MB/s, which means that your likely maximum database throughput (for a seq scan or vacuum, for example) is 30MB/s. That's about a third of what you can get with good internal RAID. While multichannel iSCSI is possible, it's hard to configure, and it doesn't really let you spread a *single* request across multiple channels. So: go with fibre channel if you're using a SAN.

iSCSI also has horrible lag times, but you don't care about that so much for DW.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
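The back-of-envelope arithmetic behind those figures (a sketch; the ~30MB/s database number is Josh's rule-of-thumb estimate of protocol and query-execution overhead, not an exact derivation):

```shell
# Gigabit Ethernet: 1000 Mbit/s over 8 bits per byte = 125 MB/s raw.
wire_mb_s=$(( 1000 / 8 ))
# TCP/iSCSI framing overhead leaves roughly 100 MB/s of usable payload.
usable_mb_s=100
# Josh's estimate: a seq scan actually achieves ~30% of the usable rate.
scan_mb_s=$(( usable_mb_s * 3 / 10 ))
echo "wire=${wire_mb_s}MB/s usable=${usable_mb_s}MB/s scan~${scan_mb_s}MB/s"
```

By comparison, even a modest 4-disk internal RAID-10 of 10k drives can stream well past 100MB/s, which is where the "about a third" comparison comes from.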
Hi Chris,
A couple comments on the NetApp SAN.
We use NetApp, primarily with Fibre Channel connectivity and FC drives. All of the Postgres files are located on the SAN, and this configuration works well.
We have tried iSCSI, but performance is horrible. Same with SATA drives.
The SAN will definitely be more costly than local drives. It really depends on what your needs are.
The biggest benefit for me in using a SAN is the special features it offers. We use snapshots and flex clones, which are a great way to back up and clone large databases.
Cheers,
Terry