Thread: inheritance vs performance
Hi,

I'm wondering if there could be problems related to inheritance in the following scenario (with PostgreSQL 7.4.1)...

1 A-table, abstract.

Max 10 B-tables that inherit from A, with sometimes some more columns than A. These are also abstracts.

"n" C-tables that inherit from 1 B-table, without more columns.
Each C-table could contain quite a lot of rows (500K, 1M, ...).

Could there be problems, or performance issues, related to inheritance if there is "too much" C-tables (in combination with the number of rows)? And what would be that "too much"?

Remarks:
A-table could be removed as it's not that important/relevant.
The purpose of this structure is not to be able to easily select through the parent in all children tables, though it would be appreciated.
The purpose of this is just to be able to easily create C-tables, and maybe also to easily handle structure changes of A or B-tables.

The master words here are "performance" and "reliability".

Thanks,
Pascal
On Friday 13 February 2004 09:01, Pascal Polleunus wrote:
> Hi,
>
> I'm wondering if there could be problems related to inheritance in the
> following scenario (with PostgreSQL 7.4.1)...
>
> 1 A-table, abstract.
>
> Max 10 B-tables that inherit from A, with sometimes some more columns
> than A. These are also abstracts.
>
> "n" C-tables that inherit from 1 B-table, without more columns.
> Each C-table could contain quite a lot of rows (500K, 1M, ...).

What is the point of having multiple C tables with the same structure?

> Could there be problems, or performance issues, related to inheritance
> if there is "too much" C-tables (in combination with the number of
> rows)? And what would be that "too much"?

Well, thousands of tables is probably "too much", but a hundred tables or two in a database shouldn't cause problems. Don't see why you'd want them though.

> Remarks:
> A-table could be removed as it's not that important/relevant.
> The purpose of this structure is not to be able to easily select through
> the parent in all children tables, though it would be appreciated.
> The purpose of this is just to be able to easily create C-tables, and
> maybe also to easily handle structure changes of A or B-tables.

I don't see how inheritance makes it easier to create C tables.

> The master words here are "performance" and "reliability".

Don't see how either of these are affected by what you're talking about doing here. Can you explain more closely what it is you're trying to do?

--
Richard Huxton
Archonet Ltd
> Well, thousands of tables is probably "too much", but a hundred tables or two
> in a database shouldn't cause problems. Don't see why you'd want them though.

If that's your general advice (a hundred or more tables in a database not making sense) I should like to learn why. Is that a sure sign of overdesign? Excess normalization? Bad separation of duty? I am asking since our schema is at about 200 relations and growing.

Karsten
--
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
On Friday 13 February 2004 10:59, Karsten Hilbert wrote:
> > Well, thousands of tables is probably "too much", but a hundred tables or
> > two in a database shouldn't cause problems. Don't see why you'd want them
> > though.
>
> If that's your general advice (a hundred or more tables in a
> database not making sense) I should like to learn why. Is that
> a sure sign of overdesign ? Excess normalization ? Bad
> separation of duty ? I am asking since our schema is at
> about 200 relations and growing.

The original mail mentioned many "C tables" all with the same columns. Obviously you need as many different tables as required to model your data, but many tables all with identical schema?

--
Richard Huxton
Archonet Ltd
Hi Pascal,

As other answers to this topic pointed out, it's kind of pointless to use multiple tables with the same structure. In the long run it will become a PITA to manage them; I'm talking from experience here.

In our company we adopted a solution with dynamically created tables (with dynamic schema), thinking it would be more performant (which actually might be true in our case). The alternative would have been some kind of generic "param_name", "param_value" table, holding all the data from all these dynamic tables.

While performance might have been gained using the dynamic tables, a lot of flexibility was lost, and a maintenance nightmare was created (just think about migrating all those tables between versions of the system). Not to mention that you can't easily create queries which take a table name as a parameter... (actually you can, but I think it's not really recommended). In our case, however, the schema is different for all those tables, so it makes sense in a way, but from a maintenance POV I wouldn't choose dynamic tables again; they are more trouble than they're worth.

Anyway, I think you would be better off adding a column to your B tables which holds the "table name" the C tables would have had. In fact that would be an ID of some sort for efficiency reasons (and I bet you already have those IDs there ;-). Then you can select the content of a C table based on those IDs, and have a lot more flexibility. Performance-wise I think the one-table solution is actually better, but that's just a guess on my part.

And also reconsider using separate B tables if that means they are dynamically created... or be prepared for some hard times later with maintenance ;-)

Just my 2c,
Csaba.
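A minimal sketch of the single-table layout Csaba suggests, with a customer ID column taking the place of the per-customer table name. It uses SQLite from Python's standard library purely as a stand-in so that it runs without a server; the thread is about PostgreSQL, and the table name `b_records`, its columns, and the helper `records_for` are all hypothetical.

```python
import sqlite3

# SQLite (in memory) is only a stand-in so the sketch runs without a server;
# the table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE b_records (
        customer_id INTEGER NOT NULL,  -- replaces the per-customer table name
        recorded_on TEXT    NOT NULL,  -- ISO date, e.g. '2004-02-13'
        payload     TEXT
    )
    """
)
# One composite index serves "records of customer X for day Y" lookups.
conn.execute(
    "CREATE INDEX b_records_customer_day ON b_records (customer_id, recorded_on)"
)

conn.executemany(
    "INSERT INTO b_records VALUES (?, ?, ?)",
    [
        (1, "2004-02-12", "a"),
        (1, "2004-02-13", "b"),
        (2, "2004-02-13", "c"),
    ],
)


def records_for(customer_id, day):
    """Return the payloads one customer recorded on a given day."""
    rows = conn.execute(
        "SELECT payload FROM b_records"
        " WHERE customer_id = ? AND recorded_on = ?"
        " ORDER BY payload",
        (customer_id, day),
    )
    return [payload for (payload,) in rows]
```

Because the customer is a bound parameter rather than part of a table name, the same query text works for every customer, which is the flexibility the message refers to.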
On Fri, Feb 13, 2004 at 01:51:24PM +0000, Richard Huxton wrote:
> On Friday 13 February 2004 10:59, Karsten Hilbert wrote:
> > > Well, thousands of tables is probably "too much", but a hundred tables or
> > > two in a database shouldn't cause problems. Don't see why you'd want them
> > > though.
> >
> > If that's your general advice (a hundred or more tables in a
> > database not making sense) I should like to learn why. Is that
> > a sure sign of overdesign ? Excess normalization ? Bad
> > separation of duty ? I am asking since our schema is at
> > about 200 relations and growing.
>
> The original mail mentioned many "C tables" all with the same columns.
> Obviously you need as many different tables as required to model your data,
> but many tables all with identical schema?

Poor man's tablespaces.

It's a trick I've had to resort to a few times when, on a write-heavy steady-state system, there's just not enough I/O bandwidth to delete and vacuum old data, or not enough to maintain an index.

Segregate the incoming data by, say, day and put one day's worth of data into each 'C' table. At the end of each day, index the day's table. If you're maintaining six months of data, drop the 180th table.

If most of the queries on the data are constrained by date, it's reasonably efficient to search too. And if you have rare queries which aren't constrained by date you can just apply them to the parent table: not terribly efficient, but quite workable.

Hideous hack, but it works.

Cheers,
  Steve
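The end-of-day routine Steve describes can be sketched as a small generator of SQL statements. This is only an illustration under assumptions: the parent table name `events`, the indexed column `ts`, the `events_YYYYMMDD` naming scheme, and the choice to create the next day's table in the same step are all hypothetical and not from the message. The statements are returned as strings rather than executed.

```python
from datetime import date, timedelta

PARENT = "events"  # hypothetical parent table name


def table_for(day):
    """Name of the child table holding one day's rows, e.g. events_20040213."""
    return f"{PARENT}_{day:%Y%m%d}"


def rollover_statements(day, keep_days=180):
    """SQL to run at the end of `day`.

    Indexes the table that just finished filling, creates tomorrow's child
    table, and drops the table that falls out of the retention window, so
    that `keep_days` tables remain (counting tomorrow's empty one).
    """
    next_day = day + timedelta(days=1)
    expired = next_day - timedelta(days=keep_days)
    return [
        # "ts" is a hypothetical timestamp column inherited from the parent.
        f"CREATE INDEX {table_for(day)}_ts_idx ON {table_for(day)} (ts);",
        f"CREATE TABLE {table_for(next_day)} () INHERITS ({PARENT});",
        f"DROP TABLE {table_for(expired)};",
    ]
```

Dropping a whole table sidesteps the delete-and-vacuum cost mentioned above, and building the index once the day's table is full avoids maintaining it during the write-heavy period.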
Hi,

Here's a deeper explanation about what I'm trying to achieve...

> I'm wondering if there could be problems related to inheritance in the
> following scenario (with PostgreSQL 7.4.1)...

In fact, the main concern is not really about inheritance but more about how to handle large amounts of data.

> 1 A-table, abstract.

A-table contains the common columns for each type of customer.

> Max 10 B-tables that inherit from A, with sometimes some more columns
> than A. These are also abstracts.

A B-table is created for each type of customer; some of them need more columns (currently there are only 3 types, but there may be more, which is why I said max 10. These 3 types have different structures).

> "n" C-tables that inherit from 1 B-table, without more columns.
> Each C-table could contain quite a lot of rows (500K, 1M, ...).

A C-table is created for each customer, and inherits from the B-table of their "customer type". I hope there will be some hundreds of customers/C-tables.
My concern is that each C-table would contain around 500K records per year... and I hope more ;-) (1M or more was probably aiming too high)

> Could there be problems, or performance issues, related to inheritance
> if there is "too much" C-tables (in combination with the number of
> rows)? And what would be that "too much"?

Let's take for example 100 customers with 500K records per year. Grouping them together will lead to having, at the end of the year, 50M records in a single table. Isn't that too much?
A solution could be to break out the data per month and to keep only the current & previous months... 50M / 6 = 8.3M records at the end of each month.
Or to do the same weekly, so 50M / 26 = 1.9M records at the end of each week, for the last 2 weeks.

With these data, I need to be able to:
- generate daily, monthly and yearly reports per customer.
- provide a list of records per customer for a given period (for a given day could be enough).

Inheritance is maybe not really what is needed here.
To handle table creation, I could store the structures somewhere instead of using inheritance. And to handle hypothetical structure changes, I could create bulk procedures.

> Remarks:
> A-table could be removed as it's not that important/relevant.
> The purpose of this structure is not to be able to easily select through
> the parent in all children tables, though it would be appreciated.

I don't really need to do cross-customer queries, though that would be appreciated for generating global reports. Without inheritance, how to handle that will depend on the reports that need to be generated.

> The purpose of this is just to be able to easily create C-tables, and
> maybe also to easily handle structure changes of A or B-tables.

Sorry, that was stupid :-/
What I wanted to say is that the purpose was mainly to distribute the data across several tables and, secondly, to easily handle structure changes.

Thanks again for your advice,
Pascal
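The row-count estimates in this message (50M per year, about 8.3M when keeping two months, about 1.9M when keeping two weeks) can be reproduced with a few lines of arithmetic. The sketch below assumes records arrive evenly over the year, which the message does not state; the constant and function names are hypothetical.

```python
CUSTOMERS = 100
RECORDS_PER_CUSTOMER_PER_YEAR = 500_000


def rows_retained(periods_per_year, periods_kept):
    """Rows held when only the last `periods_kept` periods are kept.

    Assumes records are spread evenly over the year.
    """
    yearly_total = CUSTOMERS * RECORDS_PER_CUSTOMER_PER_YEAR
    return yearly_total * periods_kept / periods_per_year


full_year = rows_retained(1, 1)     # everything in one table for a year
two_months = rows_retained(12, 2)   # current + previous month (50M / 6)
two_weeks = rows_retained(52, 2)    # current + previous week (50M / 26)
```

The divisors 6 and 26 in the message come from keeping two periods out of 12 months or 52 weeks respectively.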