huge performance penalty from constraint triggers - Mailing list pgsql-general
From | Ben Liblit |
---|---|
Subject | huge performance penalty from constraint triggers |
Date | |
Msg-id | 3D49E5E9.5070903@eecs.berkeley.edu Whole thread Raw |
Responses |
Re: huge performance penalty from constraint triggers
|
List | pgsql-general |
I have a modest-sized PostgreSQL database, with about four thousand records across three tables. I am seeing an astonishing variance in the time it takes to initially populate the database versus the time it takes to restore it from a standard pg_dump archive. Initial population takes nineteen hours, while restoring the dump takes a mere twenty one seconds! The bulk data is the same in both cases. I am using the COPY command when initially populating the database, as does the pg_dump archive. The key difference appears to be when CONSTRAINT TRIGGERs are created. When I initially populate the database, I create one sequence and three tables. Those tables have various primary keys, foreign keys, references, and uniqueness constraints, from which PostgreSQL immediately induces several indexes and constraint triggers. I then use three COPY commands to populate the tables. Thus the COPY happens *after* the indexes and triggers have been created. The pg_dump archive orders things differently. It first creates the sequence and tables, then has three COPY commands, and lastly creates the indexes and triggers. Restoring from this dump takes twenty one seconds. If I reorder things manually so that the "CREATE UNIQUE INDEX" commands are before the COPYs, restoration time grows slightly, to twenty five seconds. But if I move the "CREATE CONSTRAINT TRIGGER" commands before the COPYs as well ... boom! Nineteen hours. Obviously, putting the triggers after the data means less work, because presumably those triggers will not be run for the bulk data. But twenty one seconds versus nineteen *hours*? That's a slowdown of more than three orders of magnitude! A crude calculation of records inserted per second gives us nineteen thousand records per second without triggers, or a bit less than six per second with triggers. I'm already using COPY instead of many INSERTs. I am aware of the performance tip regarding dropping indexes before uploading large amounts of data, and then regenerating the indexes after. But in this case, index maintenance appears to be a relatively small factor, while the triggers are killing me. Constraint triggers are sparsely documented, and what documentation exists states that they are "not intended for general use". So I'm not sure if dropping and recreating the constraint triggers is advisable. It also defeats the purpose of having them, of course, since they will not be present to maintain referential integrity as new data is added to the system. Attached below please find the SQL commands I use to create the tables (and implicitly, the unique indexes and constraint triggers). We have three tables. In brief, each session has a large collection of numbered sites, and each site has a handful of samples. So each row in the "sessions" table has an ID. Each row in the "sites" table gives a session ID and its own sample number within that session. Each row in the "samples" table cites a session ID and sample number. In my current test, the "sessions" table has one row; the "sites" table has 157,000 rows; and the "samples" table has 244,000 rows. I'd be most grateful if anyone has suggestions for how to improve things. I'm completely baffled. CREATE SEQUENCE session_seq; CREATE TABLE sessions ( session int NOT NULL CHECK (session > 0) PRIMARY KEY DEFAULT NEXTVAL('session_seq'), signal smallint NOT NULL CHECK (signal >= 0) ); CREATE TABLE sites ( session int NOT NULL CHECK (session > 0) REFERENCES sessions MATCH FULL, site bigint NOT NULL CHECK (site > 0), file text NOT NULL CHECK (file <> ''), line int NOT NULL CHECK (line > 0), UNIQUE (session, site) ); CREATE TABLE samples ( session int NOT NULL CHECK (session > 0), site int NOT NULL CHECK (site > 0), expression text NOT NULL CHECK (expression <> ''), type smallint NOT NULL CHECK (type BETWEEN 1 AND 15), value text NOT NULL CHECK (value <> ''), UNIQUE (session, site, expression), FOREIGN KEY (session, site) REFERENCES sites (session, site) MATCH FULL );
pgsql-general by date: