Performance problem with row count trigger - Mailing list pgsql-sql
From | Tony Cebzanov |
---|---|
Subject | Performance problem with row count trigger |
Date | |
Msg-id | 49D4CA3A.3050005@andrew.cmu.edu Whole thread Raw |
Responses |
Re: Performance problem with row count trigger
Re: Performance problem with row count trigger Re: Performance problem with row count trigger |
List | pgsql-sql |
I was looking to speed up a count(*) query, as per the recommendations on the postgres wiki: http://wiki.postgresql.org/wiki/Slow_Counting I decided to use the trigger approach to get an accurate count that doesn't depend on VACUUM being run recently. I've got it working, but the addition of the trigger slows things down so bad that it's not a viable option. I was hoping for advice on how to speed things up, or at least an explanation of why it gets so slow. The relevant tables are as follows: --------------------------------------------------------------------------- CREATE TABLE dataset( dataset_id SERIAL PRIMARY KEY, catalog_id INTEGER REFERENCES catalog (catalog_id) ON DELETE CASCADE, t_begin TIMESTAMP WITHOUT TIME ZONE NULL, t_end TIMESTAMP WITHOUT TIME ZONE NULL, "ctime" TIMESTAMP WITHOUTTIME ZONE NOT NULL DEFAULT now(), "mtime" TIMESTAMP WITHOUT TIME ZONE NOT NULL DEFAULT now(), "assoc_count" BIGINTNOT NULL DEFAULT 0 ) CREATE TABLE assoc ( dataset_id INTEGER REFERENCES dataset (dataset_id) ON DELETE CASCADE, range ip4r NOT NULL, label_idINTEGER NULL, value BIGINT NULL, PRIMARY KEY (dataset_id, range), UNIQUE (dataset_id, range, label_id) ); --------------------------------------------------------------------------- What I want to do is update the assoc_count field in the dataset table to reflect the count of related records in the assoc field. To do so, I added the following trigger: --------------------------------------------------------------------------- CREATE OR REPLACE FUNCTION update_assoc_count_insert() RETURNS TRIGGER AS ' BEGIN UPDATE dataset SET assoc_count = assoc_count + 1 WHERE dataset_id = NEW.dataset_id; RETURN NEW; END ' LANGUAGE plpgsql; CREATE TRIGGER assoc_update_assoc_count_insert AFTER INSERT ON assoc FOR EACH ROW EXECUTE PROCEDURE update_assoc_count_insert(); --------------------------------------------------------------------------- (I also have triggers for UPDATE/DELETE, left out for brevity.) The slowness I'm talking about doesn't show up for a single insert, but arises when doing thousands of them in a transaction. Here are some test runs of 10,000 inserts without the trigger in place: --------------------------------------------------------------------------- 1000 (2231.540142/s) 2000 (2341.849077/s) 3000 (2234.332303/s) 4000 (2311.247629/s) 5000 (2366.171695/s) 6000 (2400.028800/s) 7000 (2407.147716/s) 8000 (2416.419084/s) 9000 (2401.476107/s) 10000 (2406.870943/s) --------------------------------------------------------------------------- The number in parens is the number of inserts per second for each batch of 1,000 inserts. As you can see, performance isn't too terrible, and is pretty constant from start to finish. Now I add the trigger, and here's what happens: --------------------------------------------------------------------------- 1000 (1723.216901/s) 2000 (1613.529119/s) 3000 (1526.081496/s) 4000 (1431.907261/s) 5000 (1340.159570/s) 6000 (1269.746140/s) 7000 (1191.374990/s) 8000 (1117.332012/s) 9000 (1056.309389/s) 10000 (1001.051003/s) --------------------------------------------------------------------------- The throughput of the first batch of 1,000 is diminished, but still tolerable, but after 10,000 inserts, it's gotten much worse. This pattern continues, to the point where performance is unacceptable after 20k or 30k inserts. To rule out the performance of the trigger mechanism itself, I swapped the trigger out for one that does nothing. The results were the same as without the trigger (the first set of numbers), which leads me to believe there's something about the UPDATE statement in the trigger that is causing this behavior. I then tried setting the assoc_count to a constant number instead of trying to increment it with assoc_count = assoc_count + 1, but performance was just as bad as with the proper UPDATE statement. I'm not really sure where to go next. I can update the assoc_count once at the end of the transaction (without a trigger), but this could lead to inconsistencies if another client does inserts without updating the count. I would really prefer to use the trigger solution recommended on the PGsql wiki, but can't do so until I solve this performance problem. I greatly appreciate any and all help. Thanks. -Tony