Scale, Normalization, and Table Count - Mailing list pgsql-general
From | Tom Cross |
---|---|
Subject | Scale, Normalization, and Table Count |
Date | |
Msg-id | Pine.GSO.4.40.0210060210140.28070-100000@apogee.whack.org |
Responses | Re: Scale, Normalization, and Table Count · Re: Scale, Normalization, and Table Count · Re: Scale, Normalization, and Table Count |
List | pgsql-general |
Much to my horror, I got an error message while attempting to back up my database tonight:

pg_dump: NOTICE: ShmemAlloc: out of memory
pg_dump: Attempt to lock table "thread_204" failed.
ERROR: LockAcquire: lock table 1 is out of memory

Obviously this is related to my shared memory allocations. I'm not sure how much shared memory my kernel is allowing; I'm running a fairly default Red Hat 7.3 installation. I plan on looking into this.

However, in investigating this problem I've found that it also seems to imply that I have too many tables in my database. I found a post on the Postgres discussion boards from someone who encountered this with 200,000 tables. He was told that this was too many, and that Postgres was not designed to handle databases that were so poorly designed. While I can't imagine that I have more than 10,000 tables at this time, I think that as this system scales I will easily surpass 200,000. I don't think this is due to poor database design, but rather to the requirements of my application. The standard disclaimer applies, though: I'm an engineer, but I'm not a DB specialist. This is why I'm asking.

Essentially, I have a discussion system. There could be millions of messages in the system; let's pretend there are 10 million for the sake of argument. These messages are all on different topics, with, say, 1,000 messages on each topic, and users will only access one topic at a time. It seems intuitive that it would be less expensive to create an individual table for each topic than to put all of the messages in a single table. If I can directly access the 1,000 messages I need by loading the table they are in, I save myself from having to search a table which contains 10 million messages. In general, disk space is less expensive than memory or processor power, so this seems to be a reasonable approach.

It seems clear to me that there are some things I need to be careful about in pursuing this:

1. The table names, as you'll note from my error message, are automatically generated. There are limits on table name length, and I need to make sure that my generated names stay well within that limit. This is tunable in Postgres, and a 128-character table name length, for example, could handle an astronomical number of table names...

2. I've learned this evening that there is a problem with shared memory allocation. There also appears to be a concern about the number of files in a UNIX directory; right now the directory for this database has some 20,000 files in it. This seems easy to solve: I'll just create dynamically generated databases which in turn hold my dynamically generated tables, with each database allowed only, say, 5,000 tables. (Is this too many?) I wonder what the limits on database names are... I haven't researched it.

3. At some point I'm clearly going to have to contend with the number of inodes available on my file system. This is also easy to solve, as I can put each database on its own partition. I think it will eventually make sense, in fact, to put different databases on different database servers.

But I have to wonder... am I barking up the wrong tree with all these dynamically generated tables? Is it really more reasonable to search a table with 10 million entries for the 1,000 I want every time a user wants to read something? Are there other pitfalls I need to be aware of? Is there a key book that discusses this sort of thing that I could have on my desk?

Thanks,
Tom Cross
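
P.S. From my initial digging, the lock-table failure seems to point at the kernel's shared memory ceiling and at how Postgres sizes its lock table. Here is a rough sketch of what I plan to try, assuming a stock Red Hat 7.3 kernel and the default postgresql.conf; the numbers are guesses on my part, not recommendations:

```
# /etc/sysctl.conf -- raise the kernel's shared memory ceiling (bytes),
# then load it with `sysctl -p`
kernel.shmmax = 134217728

# postgresql.conf -- these take effect only after restarting the postmaster
max_locks_per_transaction = 256   # default is 64; pg_dump takes a lock on every table it dumps
shared_buffers = 4096             # in 8kB pages
```

If I understand it correctly, pg_dump holds a lock on every table for the duration of the dump, so with tens of thousands of tables the shared lock table fills up no matter how the rest is tuned.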
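P.P.S. For concreteness, here is roughly what I understand the single-table alternative to look like. The table and column names below are placeholders I made up, not my actual schema:

```sql
-- One table for every message, with an index on the topic.
CREATE TABLE messages (
    message_id  SERIAL PRIMARY KEY,
    topic_id    INTEGER NOT NULL,
    posted_at   TIMESTAMP DEFAULT now(),
    author      TEXT,
    body        TEXT
);

CREATE INDEX messages_topic_idx ON messages (topic_id);

-- Pulling the ~1,000 messages for one topic should hit the index
-- instead of scanning all 10 million rows.
SELECT message_id, author, posted_at, body
  FROM messages
 WHERE topic_id = 204
 ORDER BY posted_at;
```

My question is essentially whether that index lookup really costs more in practice than opening a dedicated thread_204 table.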