Home > mailing lists

Re: best way to do bulk delete? - Mailing list pgsql-novice

From	pg noob
Subject	Re: best way to do bulk delete?
Date	October 28, 2011 13:56:55
Msg-id	CAPNY-2VskCouya2aVOwQmT2BRhyd2VENe7ofz0Zt38Jjhnq2hg@mail.gmail.com Whole thread
In response to	Re: best way to do bulk delete? (Merlin Moncure <mmoncure@gmail.com>)
Responses	Re: best way to do bulk delete?
List	pgsql-novice

Tree view

Thank you. That approach looks promising.
When you say that it is not side effect free, are there any other side effects other than the composite types one you mentioned?
I don't believe these tables contain any composite types so it shouldn't be an issue.

I tested it and found that there is no ALTER TABLE <foo> RENAME to <bar> CASCADE.
It gives a syntax error on the CASCADE keyword. I am using postgres 8.4, maybe cascade is supported here in a newer version.
But I am able to work around that by renaming the table indexes and constraints individually.

Where this approach seems to break down is when the table has a lot of dependent objects (dependent constraints, functions, views, triggers, etc.).
I could use DROP CASCADE but then I would also need to save and recreate all those dependent objects as well.

But for simple tables that don't have a lot of dependent objects your suggested approach seems like it can work well.

On Fri, Oct 28, 2011 at 11:19 AM, Merlin Moncure <mmoncure@gmail.com> wrote:

On Fri, Oct 28, 2011 at 9:30 AM, pg noob <pgnube@gmail.com> wrote:
>
> Greetings,
>
> I have an occasional recurring use case where I need to delete a large
> number of rows from multiple tables as quickly as possible,
> something like a bulk-delete.
>
> A typical example would be deleting a few hundred thousand rows at once from
> a set of tables each containing 1 to 2 million rows,
> but in a worst case scenario it could be as large as 1 million rows from a
> set of tables each containing 4 to 5 million rows.
>
> These tables of course have indexes on them as well as foreign key
> constraints and cascade deletes to other tables.
>
> I can't simply truncate the tables because the rows being deleted are
> subsets of the total amount of data in the tables.
>
> These tables have heavy insert/update/delete activity going on at the same
> time but mostly not on the same set of rows that
> is being bulk-deleted (though there may be some update activity going on
> which accesses those rows and hasn't yet quiesced).
>
> What is the best way of going about this?
>
> I've considered a few options.
>
> One option is to open a single transaction, issue delete statements that
> delete huge numbers of rows from each
> table in question (probably with a DELETE USING SQL query) and then commit
> the transaction.
> My concern with this approach is that it will hold a huge number of row
> locks while in progress and may take a long
> time to complete and could introduce deadlocks if it competes with other
> updates that have acquired row locks out of order.
>
> Another option would be to delete one row per transaction or a smaller set
> of rows as part of a transaction and then
> commit the transaction, repeating in a loop.
> This has the advantage that if the transaction hits an error and has to be
> rolled back it doesn't have to redo the entire
> delete operation again, and it doesn't hold as many row locks for as long a
> time. The drawback is that I believe this approach
> would be a lot slower.
>
> And finally, I've considered the idea of using COPY to copy the data that
> needs to be kept to temporary tables,
> truncating the original tables and then copying the data back. I believe
> this would be the most efficient way to
> do the delete but the implementation is quite a bit more complicated than
> the first two options I described,
> and has implications for how to deal with error scenarios or database/system
> crashes while the operation is in progress.

you don't need to COPY -- just insert/select, like this:

CREATE TABLE keeprows(LIKE old_table INCLUDING INDEXES INCLUDING CONSTRAINTS);
INSERT INTO keeprows SELECT * FROM old_table WHERE ...
DROP TABLE old_table;
ALTER TABLE keeprows RENAME to old_table CASCADE;
<reset RI rules>

This procedure is not side effect free and has some caveats: for
example, if you have code that is dependent on the table's composite
type that will also drop and has to be reconstructed.

Perhaps a better way to go if you can structure your code around it is
to simply partition your table around when it gets loaded and dropped
-- then your deletion becomes 'DROP TABLE' with no extra processing.

merlin

pgsql-novice by date:

From: Detox
Date: 28 October 2011, 13:49:14
Subject: PgAdmin III shows removed PostgreSQL

From: Merlin Moncure
Date: 28 October 2011, 14:01:50
Subject: Re: best way to do bulk delete?

Re: best way to do bulk delete? - Mailing list pgsql-novice

Previous

Next