Re: best way to do bulk delete? - Mailing list pgsql-novice
From | pg noob |
---|---|
Subject | Re: best way to do bulk delete? |
Date | |
Msg-id | CAPNY-2VskCouya2aVOwQmT2BRhyd2VENe7ofz0Zt38Jjhnq2hg@mail.gmail.com Whole thread Raw |
In response to | Re: best way to do bulk delete? (Merlin Moncure <mmoncure@gmail.com>) |
Responses |
Re: best way to do bulk delete?
|
List | pgsql-novice |
Thank you. That approach looks promising.
When you say that it is not side effect free, are there any other side effects other than the composite types one you mentioned?
I don't believe these tables contain any composite types so it shouldn't be an issue.
I tested it and found that there is no ALTER TABLE <foo> RENAME to <bar> CASCADE.
It gives a syntax error on the CASCADE keyword. I am using postgres 8.4, maybe cascade is supported here in a newer version.
But I am able to work around that by renaming the table indexes and constraints individually.
Where this approach seems to break down is when the table has a lot of dependent objects (dependent constraints, functions, views, triggers, etc.).
I could use DROP CASCADE but then I would also need to save and recreate all those dependent objects as well.
But for simple tables that don't have a lot of dependent objects your suggested approach seems like it can work well.
On Fri, Oct 28, 2011 at 11:19 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
you don't need to COPY -- just insert/select, like this:On Fri, Oct 28, 2011 at 9:30 AM, pg noob <pgnube@gmail.com> wrote:
>
> Greetings,
>
> I have an occasional recurring use case where I need to delete a large
> number of rows from multiple tables as quickly as possible,
> something like a bulk-delete.
>
> A typical example would be deleting a few hundred thousand rows at once from
> a set of tables each containing 1 to 2 million rows,
> but in a worst case scenario it could be as large as 1 million rows from a
> set of tables each containing 4 to 5 million rows.
>
> These tables of course have indexes on them as well as foreign key
> constraints and cascade deletes to other tables.
>
> I can't simply truncate the tables because the rows being deleted are
> subsets of the total amount of data in the tables.
>
> These tables have heavy insert/update/delete activity going on at the same
> time but mostly not on the same set of rows that
> is being bulk-deleted (though there may be some update activity going on
> which accesses those rows and hasn't yet quiesced).
>
> What is the best way of going about this?
>
> I've considered a few options.
>
> One option is to open a single transaction, issue delete statements that
> delete huge numbers of rows from each
> table in question (probably with a DELETE USING SQL query) and then commit
> the transaction.
> My concern with this approach is that it will hold a huge number of row
> locks while in progress and may take a long
> time to complete and could introduce deadlocks if it competes with other
> updates that have acquired row locks out of order.
>
> Another option would be to delete one row per transaction or a smaller set
> of rows as part of a transaction and then
> commit the transaction, repeating in a loop.
> This has the advantage that if the transaction hits an error and has to be
> rolled back it doesn't have to redo the entire
> delete operation again, and it doesn't hold as many row locks for as long a
> time. The drawback is that I believe this approach
> would be a lot slower.
>
> And finally, I've considered the idea of using COPY to copy the data that
> needs to be kept to temporary tables,
> truncating the original tables and then copying the data back. I believe
> this would be the most efficient way to
> do the delete but the implementation is quite a bit more complicated than
> the first two options I described,
> and has implications for how to deal with error scenarios or database/system
> crashes while the operation is in progress.
CREATE TABLE keeprows(LIKE old_table INCLUDING INDEXES INCLUDING CONSTRAINTS);
INSERT INTO keeprows SELECT * FROM old_table WHERE ...
DROP TABLE old_table;
ALTER TABLE keeprows RENAME to old_table CASCADE;
<reset RI rules>
This procedure is not side effect free and has some caveats: for
example, if you have code that is dependent on the table's composite
type that will also drop and has to be reconstructed.
Perhaps a better way to go if you can structure your code around it is
to simply partition your table around when it gets loaded and dropped
-- then your deletion becomes 'DROP TABLE' with no extra processing.
merlin
pgsql-novice by date: