Re: select count() out of memory - Mailing list pgsql-general
From | Adrian Klaver |
---|---|
Subject | Re: select count() out of memory |
Date | |
Msg-id | 200710271434.08061.aklaver@comcast.net |
In response to | Re: select count() out of memory (tfinneid@student.matnat.uio.no) |
Responses | Re: select count() out of memory |
List | pgsql-general |
On Friday 26 October 2007 8:56 am, tfinneid@student.matnat.uio.no wrote:
>
> Serious engineering does not imply perfect engineering; I have analyzed it
> and made my tradeoffs. What you are forgetting here is that you clearly
> don't understand the entire solution, so I will try to explain it again. And
> if you still think it's bonkers, then I urge you to come up with a solution
> that works with the requirements.
>
> Every predefined X seconds (e.g. 3, 6, 9, 12, etc.) a bunch of data arrives,
> which must be stored by discrete time groups, e.g. second 3, 6, 9, 12. The
> data that arrives is approx 4 MB per second, so in this case it's 12 MB. This
> has to be processed by the server and written to the db within 1 second.
> There can be up to 5 writers at the same time. Within that same second, at
> least 16 readers should be able to read all the data, *each*. Writers and
> readers are only concerned with the latest data, i.e. data from the latest
> time group, e.g. second 9.
> This has to go on at every predefined second for the next 6-12 weeks,
> without stop, pause or any errors. These are the requirements.
>
> When I performed performance tests I found several unwanted effects from
> several test scenarios. Here are the most important ones:
>
> - single large table, with indexes created when the table is created
>   - this leads to the performance of an insert degrading as more data is
>     added; when I got to 1 billion rows it took 50 seconds to add the data.
>
> My lessons from this are:
> - single inserts can never be efficient enough
> - indexes cause a linear performance drop as data volume increases
>
> So I tried a different approach, which would address both issues:
>
> - separate tables for each bulk of data
> - use of bulk insert through JDBC COPY
> - add indexes to the newly created table after the copy is finished
>
> My lessons from this are:
> - inserts take constant time, no matter how much data is in the database
> - adding the indexes after the insert takes constant time, i.e. some
>   milliseconds.
>
> From this I realised that using either single tables or partitions is the
> way to go, since I only need to access the latest data, i.e. the newest
> table, in normal situations.
>
> After thinking about it and discussing with this group, I found that using
> partitions would be more practical for two reasons:
> - changes to the parent table are automatically propagated to all child
>   tables, so the schema remains consistent and the server won't break
>   because of differences between the tables.
> - it is more maintainable to keep the "create with inheritance" SQL in
>   source code than the entire DDL of the table.
>
> So now I have tested the server 24/7 for a week and a half, with 1 writer
> and 16 readers writing and reading all the mentioned data, and everything
> works fine. Except for the select on the parent table, which now runs out
> of memory. Which in itself is not a problem, since I will never use the
> parent table in production in any case.
>
> regards
>
> tom

I might be missing the point, but couldn't you do a COPY to a single table
instead of multiple inserts and avoid the index overhead?

--
Adrian Klaver
aklaver@comcast.net
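For concreteness, here is a minimal sketch of the per-time-group load path the quoted post describes: create a child table with inheritance, bulk-load it through the JDBC COPY API, and build the index only after the copy finishes. The parent table name `readings`, the column `sensor_id`, the sample rows, and the connection details are illustrative assumptions, not part of the original setup.

```java
// Illustrative sketch only: assumes a parent table "readings" already exists
// with two columns (sensor_id, value); names and connection details are made up.
import java.io.StringReader;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

public class TimeGroupLoader {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/mydb", "user", "secret");

        int timeGroup = 9; // e.g. the "second 9" bulk from the post

        try (Statement st = conn.createStatement()) {
            // Child table inherits the parent's columns, so schema changes to
            // the parent stay consistent across all time-group tables.
            st.execute("CREATE TABLE readings_" + timeGroup
                    + " () INHERITS (readings)");
        }

        // Bulk-load the whole batch with COPY instead of row-by-row INSERTs.
        CopyManager copy = conn.unwrap(PGConnection.class).getCopyAPI();
        String batch = "1\t42.0\n2\t43.5\n"; // placeholder rows, tab-separated
        copy.copyIn("COPY readings_" + timeGroup + " FROM STDIN",
                new StringReader(batch));

        try (Statement st = conn.createStatement()) {
            // Index only the freshly loaded child table, after the copy is done.
            st.execute("CREATE INDEX readings_" + timeGroup + "_sensor_idx"
                    + " ON readings_" + timeGroup + " (sensor_id)");
        }

        conn.close();
    }
}
```

The point of this structure, as in the post, is that both the COPY and the index build touch only the newest child table, so their cost stays roughly constant no matter how much older data has accumulated in the other partitions.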