Re: One large v. many small - Mailing list pgsql-performance

From: Josh Berkus
Subject: Re: One large v. many small
Date:
Msg-id: 200301300956.56041.josh@agliodbs.com
In response to: One large v. many small (Noah Silverman <noah@allresearch.com>)
Responses: Re: One large v. many small; Re: One large v. many small
List: pgsql-performance
Noah,

> As we continue our evaluation of Postgres, another interesting topic
> has come up that I want to run by the group.
>
> In our current model, we have about 3,000 small tables that we use to
> track data for our clients. Each table is an identical structure, and
> holds the data for one client.

I'd list what's wrong with this structure, but frankly it would take me long enough that I'd need a consulting fee. Suffice it to say that the above is a very, very bad (or at least antiquated) design idea and you need to transition out of it as soon as possible.

> Another idea that we are considering is one big table instead of 3,000
> smaller ones. We could simply add a numeric field to indicate which
> client a particular record was for.

Yes. Absolutely. Although I'd suggest an Integer field.

> Each table has between 500 and 50,000 records, so the big table could
> have up to 10 million rows if we combined everything.

Sure.

> A query on our current system is (for client #4)
>
> Select (*) from client_4 where foo=2;
>
> A query from the new, proposed system would be
>
> Select (*) from big_results where client=4 and foo=2.
>
> The big question is, WHICH WILL BE FASTER with Postgres. Is there any
> performance improvement or cost to switching to this new structure?

Oh, no question query 1 will be faster ... FOR THAT QUERY. You are asking the wrong question. However, explain to me how, under the current system, you can find the client who ordered $3000 worth of widgets on January 12th if you don't already know who it is? I'm not sure a 3000-table UNION query is even *possible*. Or how about giving me the average number of customer transactions in a month, across all clients?

<rant>
You've enslaved your application design to performance considerations ... an approach which was valid in 1990, because processing power was so limited then.
But now that dual-processor servers with RAID can be had for less than $3000, there's simply no excuse for violating the principles of good relational database design just to speed up a query. Buying more RAM is much cheaper than having an engineer spend 3 weeks fixing data integrity problems.

The proper way to go about application design is to build your application on paper or in a modelling program according to the best principles of software design available, and *then* to discuss performance issues -- addressing them *first* by buying hardware, and only compromising your application design when no other alternative is available.
</rant>

I strongly suggest that you purchase Pascal's "Practical Issues in Database Management" and give it a read.

--
Josh Berkus
Aglio Database Solutions
San Francisco
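[A sketch of the consolidated design discussed above. The table name `big_results` and column `foo` come from the original post; the other column names (`client_id`, `amount`, `ordered_on`) and the index name are illustrative assumptions, since the actual schema wasn't shown.]

```sql
-- One table for all clients, with an integer client column
-- (column names other than foo are hypothetical):
CREATE TABLE big_results (
    client_id   integer NOT NULL,
    foo         integer,
    amount      numeric(10,2),
    ordered_on  date
    -- ... whatever other columns the 3,000 tables share ...
);

-- A composite index keeps the per-client query competitive
-- with the old per-client tables:
CREATE INDEX big_results_client_foo_idx
    ON big_results (client_id, foo);

-- The old per-client query becomes:
SELECT count(*) FROM big_results WHERE client_id = 4 AND foo = 2;

-- And the cross-client questions that were impractical with 3,000
-- tables become a single query, e.g. "which client ordered $3000
-- worth of widgets on January 12th?":
SELECT client_id
  FROM big_results
 WHERE amount = 3000 AND ordered_on = '2003-01-12';
```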