Thread: Delete performance

Delete performance

From: adey
Date: 21 February 2006, 03:02:56

Please give me some guidance?

We are attempting many deletes in our production database for the first time, and we're getting nowhere fast.

The SQL runs for more than 12 hours to delete 2 million rows, and hasn't finished each time we've tried it as we've had to cancel it.

I have tried running queries for locks, current activity, and buffer hits. I can see row locks on the affected tables for the delete PID, but no significant buffer hits or changes in row counts while it is running.

We have fsync set to the default (true) with the default 8 buffers. Postgres 7.4.2 is running on Debian on a 4-processor server with 4 GB of RAM. top shows cache increasing slowly, and postmaster using at least one CPU at 100%. pg_clog files swap about every 4 hours. We run VACUUM (no parameters) and ANALYZE daily, but have not run VACUUM FULL for months.

The delete is being performed on a parent table of 11 million rows, related to 5 child tables by foreign keys with ON DELETE CASCADE. We have followed previous advice in this forum and increased the "famous" v7 performance parameters such as effective_cache_size, vacuum_mem, and the buffer size, with the associated SHMMAX increase.

Where to next please?

Re: Delete performance

From: Tom Lane
Date: 21 February 2006, 11:01:27

adey <adey11@gmail.com> writes:
> We are attempting many deletes in our production database for the first
> time, and we're getting nowhere fast.
> The SQL runs for more than 12 hours to delete 2 million rows, and hasn't
> finished each time we've tried it as we've had to cancel it.

The usual cause of slow deletes is that (a) the table is the target of
some foreign key references from other large tables, and (b) the
referencing columns in those tables aren't indexed, or (in older PG
versions such as 7.4) aren't exactly the same datatype as the master
column.  This forces the FK actions to use inefficient sequential-scan
plans.  Fix the index situation and then start a fresh session to ensure
you have fresh FK-action plans.

Please also think *hard* about running something more modern than 7.4.2.
That release series is at 7.4.12 --- you are missing nearly two years'
worth of critical bug fixes.

            regards, tom lane
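
As a rough sketch of the advice above: the catalog query below lists the foreign keys that point at a parent table, so that each referencing column can be checked for an index. The names parent_table, child_a, parent_id and child_a_parent_id_idx are hypothetical placeholders, not taken from the original post, and the query assumes a server where pg_constraint and the regclass type are available.

```sql
-- List foreign-key constraints that reference parent_table (hypothetical name).
-- conkey holds the column numbers of the referencing columns.
SELECT conname,
       conrelid::regclass AS referencing_table,
       conkey             AS referencing_column_numbers
FROM pg_constraint
WHERE contype = 'f'
  AND confrelid = 'parent_table'::regclass;

-- For each referencing column that has no index, create one
-- (child_a and parent_id are hypothetical names).
CREATE INDEX child_a_parent_id_idx ON child_a (parent_id);

-- Refresh planner statistics for the child table.
ANALYZE child_a;
```

After creating the indexes, reconnect before retrying the delete, so that the session does not reuse FK-action plans built before the indexes existed.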

Re: Delete performance

From: Arnau
Date: 22 February 2006, 05:09:53

Hi all,

> The usual cause of slow deletes is that (a) the table is the target of
> some foreign key references from other large tables, and (b) the
> referencing columns in those tables aren't indexed.

   This is something I don't understand. As far as I know, foreign keys
reference primary keys, and PostgreSQL itself creates an index on
the primary key, so those columns should always be indexed. Given
Tom's observation, I must be missing something. Could you explain it to
all of us? :)

Thanks
--
Arnau

Re: Delete performance

From: Tom Lane
Date: 22 February 2006, 10:31:42

Arnau <arnaulist@andromeiberica.com> writes:
>> The usual cause of slow deletes is that (a) the table is the target of
>> some foreign key references from other large tables, and (b) the
>> referencing columns in those tables aren't indexed.

>    This is something I don't understand. As far as I know, foreign keys
> reference primary keys, and PostgreSQL itself creates an index on
> the primary key, so those columns should always be indexed. Given
> Tom's observation, I must be missing something. Could you explain it to
> all of us? :)

The referencED column is forced to have an index.  The referencING
column is not.  The cases where you need an index on the latter are
precisely updates/deletes of the referencED column.

In the old version you are using you can also get burnt by datatype
mismatches --- the foreign key mechanism will allow that as long as
it can find an equality operator for the two types, but that equality
operator might not be indexable.

            regards, tom lane
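
One way to look for the datatype mismatches described above is to compare the types on both sides of each foreign key in the system catalogs. This is only a sketch: it inspects just the first column of each constraint, so multi-column keys need a closer look, and it assumes a server where pg_constraint, regclass and format_type are available.

```sql
-- Foreign keys whose referencing and referenced columns differ in type.
-- Only the first column of each constraint is compared.
SELECT c.conname,
       c.conrelid::regclass                   AS referencing_table,
       a.attname                              AS referencing_column,
       format_type(a.atttypid, a.atttypmod)   AS referencing_type,
       c.confrelid::regclass                  AS referenced_table,
       fa.attname                             AS referenced_column,
       format_type(fa.atttypid, fa.atttypmod) AS referenced_type
FROM pg_constraint c
JOIN pg_attribute a
  ON a.attrelid = c.conrelid AND a.attnum = c.conkey[1]
JOIN pg_attribute fa
  ON fa.attrelid = c.confrelid AND fa.attnum = c.confkey[1]
WHERE c.contype = 'f'
  AND a.atttypid <> fa.atttypid;
```

An empty result means no mismatch was found among the first columns. Any row returned is a candidate for the non-indexable comparison described above.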

Re: Delete performance

From: Arnau
Date: 23 February 2006, 06:27:46

Hi all,

   Maybe the direction this thread has taken is a bit outside the scope
of this mailing list, but I think it's very interesting and could be
useful for new users.

>
>>>The usual cause of slow deletes is that (a) the table is the target of
>>>some foreign key references from other large tables, and (b) the
>>>referencing columns in those tables aren't indexed.
>
>
>>   This is something I don't understand. As far as I know, foreign keys
>>reference primary keys, and PostgreSQL itself creates an index on
>>the primary key, so those columns should always be indexed. Given
>>Tom's observation, I must be missing something. Could you explain it to
>>all of us? :)
>
>
> The referencED column is forced to have an index.  The referencING
> column is not.  The cases where you need an index on the latter are
> precisely updates/deletes of the referencED column.
>
> In the old version you are using you can also get burnt by datatype
> mismatches --- the foreign key mechanism will allow that as long as
> it can find an equality operator for the two types, but that equality
> operator might not be indexable.


   Let's take an example:

   CREATE TABLE departments
   (
     id   INT2
          CONSTRAINT pk_dept_id PRIMARY KEY,
     name VARCHAR(50)
          CONSTRAINT nn_dept_name NOT NULL
   );

   CREATE TABLE users
   (
     id            INT8
                   CONSTRAINT pk_users_id PRIMARY KEY,
     name          VARCHAR(50)
                   CONSTRAINT nn_users_name NOT NULL,
     department_id INT2
                   CONSTRAINT fk_users_deptid REFERENCES departments(id)
                   CONSTRAINT nn_users_deptid NOT NULL
   );

   Should we create the following index?

   CREATE INDEX idx_users_deptid ON users(department_id);

   Could we state the following as a rule of thumb: "Create an index for
each table's foreign key"?

Regards
--
Arnau

Re: Delete performance

From: Tom Lane
Date: 23 February 2006, 12:35:11

Arnau <arnaulist@andromeiberica.com> writes:
>> The referencED column is forced to have an index.  The referencING
>> column is not.  The cases where you need an index on the latter are
>> precisely updates/deletes of the referencED column.

>    Let's take an example:

>    CREATE TABLE departments
>    (
>      id   INT2
>           CONSTRAINT pk_dept_id PRIMARY KEY,
>      name VARCHAR(50)
>           CONSTRAINT nn_dept_name NOT NULL
>    );

>    CREATE TABLE users
>    (
>      id            INT8
>                    CONSTRAINT pk_users_id PRIMARY KEY,
>      name          VARCHAR(50)
>                    CONSTRAINT nn_users_name NOT NULL,
>      department_id INT2
>                    CONSTRAINT fk_users_deptid REFERENCES departments(id)
>                    CONSTRAINT nn_users_deptid NOT NULL
>    );

>    Should we create the following index?

>    CREATE INDEX idx_users_deptid ON users(department_id);

Yes, if you are concerned about the performance of updates/deletes on
the departments table.  The reason the system doesn't make such an index
automatically is that there are common scenarios where you seldom or
never update the master table, and so the index wouldn't repay the cost
it creates for updates of the slave table.

            regards, tom lane
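
For the example schema above, the effect of the index can be checked with something like the following sketch. The SELECT is only an approximation of the per-row lookup that a delete on departments causes in users, not the exact statement the foreign key machinery runs, and the department id 1 is an arbitrary value chosen for illustration.

```sql
-- Index the referencing column from the example schema.
CREATE INDEX idx_users_deptid ON users (department_id);
ANALYZE users;

-- Approximation of the lookup performed for each deleted departments row.
-- The literal is cast to INT2 so the comparison matches the column's type.
EXPLAIN SELECT 1 FROM users WHERE department_id = 1::INT2;
```

On a large users table the plan would be expected to show an index scan on idx_users_deptid. On a small table the planner may still prefer a sequential scan, which is harmless there.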

Re: Delete performance

From: Arnau
Date: 23 February 2006, 12:45:05

> Yes, if you are concerned about the performance of updates/deletes on
> the departments table.  The reason the system doesn't make such an index
> automatically is that there are common scenarios where you seldom or
> never update the master table, and so the index wouldn't repay the cost
> it creates for updates of the slave table.

Thanks for the explanations :-)

regards
--
Arnau