Re: improvement suggestions for performance design - Mailing list pgsql-performance
From | Heikki Linnakangas
---|---
Subject | Re: improvement suggestions for performance design
Date | 
Msg-id | 468CEB32.30109@enterprisedb.com
In response to | improvement suggestions for performance design (tfinneid@ifi.uio.no)
Responses | Re: improvement suggestions for performance design
List | pgsql-performance
I would strongly suggest that you use a proper relational schema, instead of storing everything in two tables. I don't know your application, but a schema like that is called Entity-Attribute-Value (though your entity seems to be just posx and posy), and it should raise a big red flag in the mind of any database designer. In particular, constructing queries against an EAV schema is a major pain in the ass (a small example follows below, after the quoted message). This has been discussed before on the PostgreSQL lists as well; you might want to search and read the previous discussions.

Ignoring the EAV issue for a moment, it's hard to give advice without knowing what kind of queries are going to be executed. Are the lookups always going to be by id? By posX/posY perhaps? By attribute?

tfinneid@ifi.uio.no wrote:
> Hi
>
> I have the following scenario for a database that I need to design, and
> would like some hints on what to improve or do differently to achieve the
> desired performance goal, disregarding hardware and postgres tuning.
>
> The premise is an attribute database that stores about 100 different
> attribute types as attribute values. Every X seconds, Y number of new
> attribute values are stored in the database. X is constant and currently
> between 6 and 20 seconds, depending on the setup. In the future X could
> become as low as 3 seconds. Y can, within the next 5-10 years, become as
> high as 200 000.
>
> That means that, for example, every 6 seconds 100 000 attributes need to
> be written to the database.
>
> At the same time, somewhere between 5 and 20 users need to read parts of
> those newly written attributes, maybe 30 000 attributes in total.
>
> This continues for the duration of the field operation, which could be
> 18 hrs a day for 6 weeks. So the total db size is up towards 200 GB.
>
> Now here is how I suggest doing this:
>
> 1- the tables
>
> table attribute_values:
>     id int
>     attr_type int ( references attribute_types(id) )
>     posX int
>     posY int
>     data_type int
>     value varchar(50)
>
> table attribute_types:
>     id int
>     name varchar(200)
>
> 2- function
>
> a function that receives an array of data and inserts each attribute;
> perhaps one array per attribute field (type, posX, posY, data_type,
> value), so five arrays as in-parameters to the function.
>
> 3- java client
>
> the client receives the data from a corba request and splits it
> into, say, 4 equally sized blocks and executes 4 threads that insert
> each block (this seems to be more efficient than just using one
> thread).
>
> Now I am wondering if this is the most efficient way of doing it?
>
> - I know that I could group the attributes so that each type of attribute
> gets its own table, with all attributes in one row. But I am not sure if
> that is any more efficient than one attribute per row, since I pass
> everything to the function as an array.
> With the above design, a change in attribute types only requires changing
> the data in a table, instead of having to modify the client, the function
> and the tables.
>
> - I am also wondering if writing the client and the function in C would
> create a more efficient solution.
>
> any comments?
>
> ps, I am currently running postgres 8.1, but could probably use 8.2 if it
> is needed for functionality or performance reasons. It will run on a sparc
> machine with solaris 10 and perhaps 4-6 processors, as many GB of RAM as
> necessary, and SCSI disks ( perhaps in raid 0 ).

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
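To make the EAV pain concrete, here is a minimal sketch. The attribute type names (temperature as attr_type 1, pressure as attr_type 2) and the wide "observations" table are assumptions for illustration only, not something from the thread: fetching two attributes for one position from the proposed attribute_values table needs one self-join per attribute, while the same lookup against a conventional wide table is a single-row read.

-- EAV: one join per attribute you want back
SELECT t.value AS temperature,
       p.value AS pressure
FROM   attribute_values t
JOIN   attribute_values p
       ON  p.posx = t.posx AND p.posy = t.posy
WHERE  t.posx = 10 AND t.posy = 20
AND    t.attr_type = 1    -- temperature (hypothetical)
AND    p.attr_type = 2;   -- pressure (hypothetical)

-- Hypothetical wide table: one column per attribute
-- CREATE TABLE observations (
--     posx        int,
--     posy        int,
--     temperature numeric,
--     pressure    numeric,
--     PRIMARY KEY (posx, posy)
-- );
SELECT temperature, pressure
FROM   observations
WHERE  posx = 10 AND posy = 20;

The wide form also lets each column carry its natural data type and constraints, instead of forcing every value through varchar(50) plus a separate data_type column.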
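For point 2 of the quoted message, a minimal sketch of the five-array insert function, written to work on 8.1 (dollar quoting and array_upper are available there). The function and parameter names are assumptions, and the id column is assumed to be filled by a sequence default:

CREATE OR REPLACE FUNCTION insert_attributes(
    p_attr_type int[],
    p_posx      int[],
    p_posy      int[],
    p_data_type int[],
    p_value     varchar[]
) RETURNS void AS $$
DECLARE
    i int;
BEGIN
    -- all five arrays are assumed to be the same length
    FOR i IN 1 .. array_upper(p_attr_type, 1) LOOP
        INSERT INTO attribute_values (attr_type, posx, posy, data_type, value)
        VALUES (p_attr_type[i], p_posx[i], p_posy[i], p_data_type[i], p_value[i]);
    END LOOP;
END;
$$ LANGUAGE plpgsql;

-- example call with two attribute values:
-- SELECT insert_attributes(ARRAY[1,2], ARRAY[10,10], ARRAY[20,20],
--                          ARRAY[4,4], ARRAY['1.5','2.7']::varchar[]);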