Thread: Delete performance
Could you please give me some guidance?
We are attempting many deletes in our production database for the first time, and we're getting nowhere fast.
The SQL runs for more than 12 hours to delete 2 million rows, and hasn't finished each time we've tried it as we've had to cancel it.
I have tried running queries for locks, current activity, and buffer hits. I can see row locks on the affected tables for the delete PID, but no significant buffer hits or changes in row counts while it is running.

We have fsync set to the default (true) with the default 8 buffers. Postgres 7.4.2 is running on Debian on a 4-processor server with 4 GB RAM. top shows cache increasing slowly, and postmaster using 100% of at least one CPU. pg_clog files swap about every 4 hours. We run VACUUM (no parameters) and ANALYZE daily, but have not run VACUUM FULL for months.

The delete is being performed on a parent table of 11 million rows, related to 5 child tables by foreign keys with ON DELETE CASCADE. We have followed previous advice in this forum and increased the "famous" performance parameters in v7 such as effective_cache_size, vacuum_mem and buffer size, with the associated SHMMAX increase.
Where to next please?
adey <adey11@gmail.com> writes:
> We are attempting many deletes in our production database for the first
> time, and we're getting nowhere fast.
> The SQL runs for more than 12 hours to delete 2 million rows, and hasn't
> finished each time we've tried it as we've had to cancel it.

The usual cause of slow deletes is that (a) the table is the target of some foreign key references from other large tables, and (b) the referencing columns in those tables aren't indexed, or (in older PG versions such as 7.4) aren't exactly the same datatype as the master column. This forces the FK actions to use inefficient sequential-scan plans.

Fix the index situation and then start a fresh session to ensure you have fresh FK-action plans.

Please also think *hard* about running something more modern than 7.4.2. That release series is at 7.4.12 --- you are missing nearly two years' worth of critical bug fixes.

regards, tom lane
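As a starting point for checking the index situation, the foreign keys that point at the parent table can be listed from the system catalog. This is only a minimal sketch: parent_table is a hypothetical name standing in for the real table, and it assumes the pg_constraint catalog and regclass casts behave on 7.4 as they do on later releases.

```sql
-- List every foreign key constraint that references the (hypothetical)
-- table parent_table, along with the table holding the referencing column.
SELECT conname,
       conrelid::regclass  AS referencing_table,
       confrelid::regclass AS referenced_table
FROM pg_constraint
WHERE contype = 'f'                          -- foreign key constraints only
  AND confrelid = 'parent_table'::regclass;  -- assumption: your parent table name
```

For each referencing table returned, \d in psql shows the column types and the existing indexes, which is enough to spot a missing index or a type that differs from the parent's key.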
Hi all,

> The usual cause of slow deletes is that (a) the table is the target of
> some foreign key references from other large tables, and (b) the
> referencing columns in those tables aren't indexed.

This is something I don't understand. As far as I know, foreign keys reference primary keys, and PostgreSQL itself creates an index on the primary key, so those columns should always be indexed. Given Tom's observation, I must be missing something. Could you explain it to all of us? :)

Thanks
--
Arnau
Arnau <arnaulist@andromeiberica.com> writes:
>> The usual cause of slow deletes is that (a) the table is the target of
>> some foreign key references from other large tables, and (b) the
>> referencing columns in those tables aren't indexed.

> This is something I don't understand. As far as I know, foreign keys
> reference primary keys, and PostgreSQL itself creates an index on the
> primary key, so those columns should always be indexed. Given Tom's
> observation, I must be missing something. Could you explain it to all
> of us? :)

The referencED column is forced to have an index. The referencING column is not. The cases where you need an index on the latter are precisely updates/deletes of the referencED column.

In the old version you are using you can also get burnt by datatype mismatches --- the foreign key mechanism will allow that as long as it can find an equality operator for the two types, but that equality operator might not be indexable.

regards, tom lane
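To make the datatype-mismatch case concrete, here is a small sketch. The tables orders and order_items are hypothetical and not from this thread. The foreign key is accepted because an equality operator exists between INT4 and INT8, but on an old release such as 7.4 the cross-type comparison may not be able to use the index on the referencing column.

```sql
-- Hypothetical tables: the key is INT8 in the parent but INT4 in the child.
CREATE TABLE orders
(
  id INT8 PRIMARY KEY
);

CREATE TABLE order_items
(
  id INT8 PRIMARY KEY,
  order_id INT4 NOT NULL REFERENCES orders(id)  -- type differs from orders.id
);

-- The referencing column is indexed, but on an old release the FK action
-- may still fall back to a sequential scan because of the type mismatch.
CREATE INDEX idx_order_items_order_id ON order_items(order_id);
```

Declaring order_id with the same type as orders.id (INT8 here) avoids the problem altogether.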
Hi all,

Maybe the direction this thread has taken is a bit outside the scope of this mailing list, but I think it's very interesting and can be useful for new users.

>>> The usual cause of slow deletes is that (a) the table is the target of
>>> some foreign key references from other large tables, and (b) the
>>> referencing columns in those tables aren't indexed.
>
>> This is something I don't understand. As far as I know, foreign keys
>> reference primary keys, and PostgreSQL itself creates an index on the
>> primary key, so those columns should always be indexed. Given Tom's
>> observation, I must be missing something. Could you explain it to all
>> of us? :)
>
> The referencED column is forced to have an index. The referencING
> column is not. The cases where you need an index on the latter are
> precisely updates/deletes of the referencED column.
>
> In the old version you are using you can also get burnt by datatype
> mismatches --- the foreign key mechanism will allow that as long as
> it can find an equality operator for the two types, but that equality
> operator might not be indexable.

Let's take an example:

CREATE TABLE departments
(
  id INT2
    CONSTRAINT pk_dept_id PRIMARY KEY,
  name VARCHAR(50)
    CONSTRAINT nn_dept_name NOT NULL
);

CREATE TABLE users
(
  id INT8
    CONSTRAINT pk_users_id PRIMARY KEY,
  name VARCHAR(50)
    CONSTRAINT nn_users_name NOT NULL,
  department_id INT2
    CONSTRAINT fk_users_deptid REFERENCES departments(id)
    CONSTRAINT nn_users_deptid NOT NULL
);

Should we create the following index?

CREATE INDEX idx_users_deptid ON users(department_id);

Could we state the following as a rule of thumb: "Create an index for each table's foreign key"?

Regards
--
Arnau
Arnau <arnaulist@andromeiberica.com> writes:
>> The referencED column is forced to have an index. The referencING
>> column is not. The cases where you need an index on the latter are
>> precisely updates/deletes of the referencED column.

> Let's take an example:
>
> CREATE TABLE departments
> (
>   id INT2
>     CONSTRAINT pk_dept_id PRIMARY KEY,
>   name VARCHAR(50)
>     CONSTRAINT nn_dept_name NOT NULL
> );
>
> CREATE TABLE users
> (
>   id INT8
>     CONSTRAINT pk_users_id PRIMARY KEY,
>   name VARCHAR(50)
>     CONSTRAINT nn_users_name NOT NULL,
>   department_id INT2
>     CONSTRAINT fk_users_deptid REFERENCES departments(id)
>     CONSTRAINT nn_users_deptid NOT NULL
> );
>
> Should we create the following index?
>
> CREATE INDEX idx_users_deptid ON users(department_id);

Yes, if you are concerned about the performance of updates/deletes on the departments table. The reason the system doesn't make such an index automatically is that there are common scenarios where you seldom or never update the master table, and so the index wouldn't repay the cost it creates for updates of the slave table.

regards, tom lane
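For anyone wanting to try this on the example schema, the following sketch creates the index and then looks at the plan for a lookup similar to the one a delete on departments has to perform against users. The SELECT is only an approximation of the internal FK check, and the value 3 is an arbitrary illustrative department id.

```sql
-- Index the referencing column of the example schema.
CREATE INDEX idx_users_deptid ON users(department_id);

-- Refresh planner statistics for the child table.
ANALYZE users;

-- Approximate the per-row lookup made when a departments row is deleted.
-- The explicit cast keeps the literal the same type as the INT2 column,
-- which can matter for index use on older releases.
EXPLAIN SELECT 1 FROM users WHERE department_id = 3::INT2;
```

On a very small users table the planner may still prefer a sequential scan, so the plan is best checked against realistic data volumes.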
> Yes, if you are concerned about the performance of updates/deletes on
> the departments table. The reason the system doesn't make such an index
> automatically is that there are common scenarios where you seldom or
> never update the master table, and so the index wouldn't repay the cost
> it creates for updates of the slave table.

Thanks for the explanations :-)

regards
--
Arnau