Re: Grave performance issues... - Mailing list pgsql-general
From | steve boyle |
---|---|
Subject | Re: Grave performance issues... |
Date | |
Msg-id | a1fdi7$1r8f$1@news.tht.net Whole thread Raw |
In response to | Grave performance issues... ("Ztream" <ztream@highrad.org>) |
Responses |
Stored proceures in perl or Python
|
List | pgsql-general |
I might be missing something but have you considered either using a trigger to update the Distribution table each time the data table is amended OR wrap updates to the Distribution table using a function i.e. f_add_data(....) that would keep the two tables in sync. I think this would probably do away with the need to carry out the background process altogether. Also are you using the GroupMember table in the update statement? hih sb "Ztream" <ztream@highrad.org> wrote in message news:a0fr1a$1eil$1@news.tht.net... > I am porting a system that I have been developing on an MS platform > (including SQL Server) to a linux system with postgresql. Apart from the > usual difficulties (and not being very knowledgeable in the unix world), I > have now gotten most things to work. However, I am experiencing some pretty > serious performance issues. > The system collects data from users thru a web interface, and also has a > background process that does rather heavy work. I will here refer to but one > of the SQL statements, to keep things focused. The idea here is that the > background process is, among other things, constructing/updating a table > called Distribution from a table called Data, which then represents the > statistic distribution of the information found in Data (using user groups, > too). > The update-statement actually performing this looks like this: > > UPDATE Distribution > SET Value = > ( > SELECT COUNT(*) FROM Data INNER JOIN GroupMember ON Data.UserID = > GroupMember.UserID > WHERE ItemID = Distribution.ItemID AND Grade = Distribution.Grade AND > WeightGroupID = Distribution.WeightGroupID > ) > > Distribution takes the form: > CREATE TABLE Distribution ( > Grade int NOT NULL, > ItemID numeric(18, 0) NOT NULL, > WeightGroupID numeric(18, 0) NOT NULL, > Value float NOT NULL, > PRIMARY KEY (Grade, ItemID, WeightGroupID) > ); > > I am unsure if I need to post the DDL descriptions of the other tables; feel > free to ask. > > Anyway, there are two seemingly distinct problems with this: > > *1 > As you can see, this UPDATE touches the entire table upon each execution, > which in postgresql seems to mean that performance quickly deteriorates > unless you are constantly vacuuming. To answer an obvious question, I *do* > feel that this is a necessary thing to do - if I were to update only those > rows that would acquire a new value, it would most probably turn out to be > the majority of them as the system is expected to receive a steady flow of > input data when launched. > This operation also needs to be performed very often - ideally about every > 10th second. I tried leaving the background process running for a few days > without vacuuming during the holiday, and while testing today I found that > the above query takes aproximately 2 minutes to complete - for a > Distribution size of 600 rows! Also, neither of the other two tables > referenced contained more than 30 rows. > > *2 > Even when the involved tables are freshly vacuumed (and vacuum analyzed), > the above query using the data amount from (*1) takes about 4 seconds to > complete. That should be compared to the about 50ms it takes to complete for > the same amount of data on Windows 2000 against SQL Server on a comparable > machine. > > Needless to say, this is rather catastrophical, and I'm pretty much out of > ideas. The performance values given above are those I got while executing > the query directly in psql, with no other connections to the database, so > this does not seem to me to be a problem with the system in itself. A > configuration problem, perhaps? A bug? SQL ignorance on my part? I could > surely use some help. > As previously noted, I am not very comfortable with unix systems, and not > exactly an SQL guru either. > The linux distribution is some version of redhat, and the postgresql version > is 7.1.3. > > Any help or attempt to help would be greatly appreciated. > > / Ztream > >
pgsql-general by date: