Thread: Very large database
I need some help here. We need to implement a 180+GB database with 120+MB of updates every evening. Rather than purchasing the big iron, we would like to use postgres running over Linux 2.4.x as the data server. Is this even possible? Who has the largest database out there and what does it run on? How should we implement the disk array? Should we purchase a hardware RAID card or should we use the software RAID capabilities in Linux 2.4? Should we consider a SMP system? Should we use an outboard RAID box (like RaidZone)? If anyone out there has implemented a database of this size then I would like to correspond with you. Thanks for your help, Mike
Don't expect Postgresql to outperform Oracle. If Oracle needs "big iron", so will Postgresql. I've done some testing and found that it's the number of transactions that matters more than the amount of data being dumped in. Our application was astronomy: I would batch load a few nights of observational data every few days. If all you are doing is loading data, you can use COPY and it will move fast: just a few minutes on even a low-end machine. But if that 120MB is in one million INSERTs, each with lots of processing, constraint checks, index updates and so on, then you will need some high-end hardware to finish in only 24 hours. I wrote my application twice. The first version took __days__ to complete a run. My second version was 100x faster. I did much of the processing outside of the DBMS in standard "C" and then just COPYed the data in. So, the answer depends on what you need to do. Simply inputting that much data is easy. Also, how will it be used once it is in the database? Do you have many active users looking at it? What kind of searches are they doing? In any case, SCSI drives are the way to go: get a stack of them with a couple of on-line spares. That, and LOTS of RAM: 1GB at a minimum. Solaris has very good RAID support built in, I think better than Linux's. Both OSes are free, although Solaris 8 will be the last PC version. Prototype your application with faked data, then try a test where you pull out the power connection on a drive while the DBMS is updating data. Pulling the power should have NO effect if the RAID is set up right. Solaris found my spare drive and swapped it in automatically. Do this a few times before you depend on it. Likely either Solaris, Linux or BSD would work and pass this test. The big question is the transaction rate; table size is the second question. --- Michael Welter <mike@introspect.com> wrote: > I need some help here. We need to implement a 180+GB database with > 120+MB of updates every evening. 
> <snip> ===== Chris Albertson Home: 310-376-1029 chrisalbertson90278@yahoo.com Cell: 310-990-7550 Office: 310-336-5189 Christopher.J.Albertson@aero.org
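Chris's faster version did the heavy processing in plain C and then loaded the result with a single COPY instead of a million INSERTs. A minimal sketch of that output step, assuming PostgreSQL's default COPY text format (tab-separated columns, one row per line, `\N` for NULL, backslash escapes for embedded tabs, newlines and backslashes). The `observation` record, its columns and all function names are hypothetical, chosen only to echo the astronomy example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write one column value in COPY text format.
   A NULL pointer becomes the \N null marker; backslash, tab, newline and
   carriage return are backslash-escaped so they cannot be mistaken for
   column or row delimiters. */
static void copy_write_field(FILE *out, const char *field)
{
    if (field == NULL) {
        fputs("\\N", out);
        return;
    }
    /* Fast path: nothing in this value needs escaping. */
    if (strpbrk(field, "\\\t\n\r") == NULL) {
        fputs(field, out);
        return;
    }
    for (const char *p = field; *p != '\0'; p++) {
        switch (*p) {
        case '\\': fputs("\\\\", out); break;
        case '\t': fputs("\\t", out);  break;
        case '\n': fputs("\\n", out);  break;
        case '\r': fputs("\\r", out);  break;
        default:   fputc(*p, out);     break;
        }
    }
}

/* Write one row: columns separated by tabs, terminated by a newline. */
static void copy_write_row(FILE *out, const char *const fields[], size_t nfields)
{
    assert(out != NULL);
    for (size_t i = 0; i < nfields; i++) {
        if (i > 0)
            fputc('\t', out);
        copy_write_field(out, fields[i]);
    }
    fputc('\n', out);
}

/* Hypothetical record produced by the preprocessing done outside the DBMS. */
struct observation {
    long id;
    double ra_deg;
    double dec_deg;
    const char *note;   /* may be NULL */
};

/* Format one hypothetical observation as a COPY row. */
static void copy_write_observation(FILE *out, const struct observation *obs)
{
    char id[32], ra[32], dec[32];
    snprintf(id, sizeof id, "%ld", obs->id);
    snprintf(ra, sizeof ra, "%.6f", obs->ra_deg);
    snprintf(dec, sizeof dec, "%.6f", obs->dec_deg);
    const char *fields[] = { id, ra, dec, obs->note };
    copy_write_row(out, fields, 4);
}
```

The file this produces could then be loaded in one statement, for example with psql's `\copy observations FROM 'obs.tsv'` (table and file names hypothetical). Because all constraint and index work then happens inside a single bulk load, this is the pattern Chris credits with the 100x speedup; `%.6f` is an arbitrary precision choice here and would need widening if the data demands it.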
Not enough info. How many tables? Is the nightly run a bulk insert or update of data, or more complicated than that? What sort of queries (quantity and complexity) is the database supposed to handle, and what is the acceptable performance (how many simultaneous users, how many queries per second, and what is the acceptable response time to a query)? Other things being equal, hardware RAID beats software RAID, as it keeps the processor free for your programs. As a general rule, more spindles is better and more memory is better, but the specifics of your project will point to the area of maximum benefit. If lots of queries will hit the same data, cache memory on the RAID card or external RAID subsystem will help. If you have lots of scattered writes, a RAID with a battery-backed cache that can safely optimize disk writes (i.e., writes don't have to be sent to disk right away to protect them; they can be made when the disks are available) will help. In other words, depending on what you are trying to do, you may need anything from a couple of 100GB IDE drives in your Linux box to an external Winchester Systems Flash Disk. I can't speak with authority on the SMP issue, but I have run across items in the newsgroups indicating that SMP performance in Postgresql needs work and that you may be better off with a screaming single-CPU machine. Don't overlook the effects of on-chip cache size, bus speed and memory speed. Given that you can get four 70+GB IDE drives without a huge investment, I'd start there and build a machine with software RAID. Do some testing and development in that environment and use the tools available to see whether your bottlenecks are driven more by disk I/O, memory, CPU or something else. Develop, test, experiment, and you will be in a much better position to spec a production system. -Steve On Tuesday 08 January 2002 18:34, Michael Welter wrote: > I need some help here. We need to implement a 180+GB database with > 120+MB of updates every evening. 
> <snip>
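Steve's four 70+GB IDE drives and Chris's "couple of on-line spares" both trade raw disk for redundancy, so it is worth checking that the usable space still covers a 180+GB database. A rough sketch of the standard usable-capacity arithmetic, assuming identical drives and ignoring filesystem overhead; the enum, function name and parameters are hypothetical.

```c
#include <assert.h>

/* RAID layouts considered in this sketch. */
enum raid_level { RAID0, RAID1, RAID10, RAID5 };

/* Usable capacity in GB for n_disks identical drives of disk_gb each,
   after setting aside hot_spares drives as on-line spares.
   Returns 0.0 when the remaining drives cannot form that layout. */
static double raid_usable_gb(enum raid_level level, int n_disks,
                             int hot_spares, double disk_gb)
{
    assert(n_disks >= 0 && hot_spares >= 0 && disk_gb > 0.0);
    int active = n_disks - hot_spares;

    switch (level) {
    case RAID0:   /* striping only: all space, no redundancy */
        return active >= 1 ? active * disk_gb : 0.0;
    case RAID1:   /* every active drive mirrors the same data */
        return active >= 2 ? disk_gb : 0.0;
    case RAID10:  /* striped mirror pairs: half the drives hold copies */
        return (active >= 4 && active % 2 == 0) ? (active / 2) * disk_gb : 0.0;
    case RAID5:   /* one drive's worth of space goes to parity */
        return active >= 3 ? (active - 1) * disk_gb : 0.0;
    }
    return 0.0;
}
```

With four 70GB drives and no spare, RAID 5 gives 210GB and RAID 1+0 gives 140GB, so only RAID 5 holds a 180+GB database, and with little headroom; reserving one of the four as a hot spare drops RAID 5 to 140GB. That is one concrete reason the "more spindles" advice matters at this size.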
Hi Chris, Chris Albertson wrote: > <snip> > Solaris has very good RAID support built in. I think better > than Linux's. Both OSes are free although Solaris 8 will be the > last PC version. Where did you hear that Solaris 8 will be the last PC version? By "PC", do you mean the "Intel Platform" version? ??? I had half downloaded the "Intel Platform" CDs for the Solaris 9 Early Access program about a month ago, and now when I go to the Sun site the rest are not available. This makes me greatly concerned. :( Regards and best wishes, Justin Clift
-- "My grandfather once told me that there are two kinds of people: those who work and those who take the credit. He told me to try to be in the first group; there was less competition there." - Indira Gandhi
Justin Clift <justin@postgresql.org> writes: > Hi Chris, > > Chris Albertson wrote: > > > <snip> > > Solaris has very good RAID support built in. I think better > > than Linux's. Both OSes are free although Solaris 8 will be the > > last PC version. > > Where did you hear that Solaris 8 will be the last PC version? By "PC", > do you mean "Intel Platform" version? Yes. Sun dropped all support for x86. Slashdot covered it a few days ago. > I half downloaded the "Intel Platform" cd's for the Solaris 9 Early > Access program about a month ago, and now when I go to the Sun site the > rest are not available. > > This makes me greatly concerned. :( Good reason to use Free operating systems. ;) -Doug -- Let us cross over the river, and rest under the shade of the trees. --T. J. Jackson, 1863
Personally, I find the concept of Sun: a - spending large investments of time and energy to make their products work on both Intel and Sparc; b - allowing and encouraging users to download Solaris Intel and Sparc for free, along with quite a few products and extensions for them; c - recognising and then publishing that the vast majority of people who downloaded Solaris for free were downloading the Intel version; ...and then deciding to cease the Intel product line, quite braindamaged. All those people who downloaded the Intel version (the vast majority of 1.5 million people, apparently) and thought it was good will remember this. After so much good effort spent expanding the market, abandoning it like that will leave a bad impression. Not just scaled down. Abandoned. Gone. Morte. Kaput. Personally, being a Solaris specialist (and using Sun OSes since '93), I feel quite let down. I've got a few PCs around running Solaris Intel for various things, so I'm unhappy. Since they can pull the plug on an entire OS architecture this easily, even if they bring it back at some point I won't trust them not to do it again. My recommending Solaris as a platform just stopped. :( Yep Doug. It's a good reason to use Free Operating Systems. + Justin Doug McNaught wrote: > <snip>