Thread: Time-consuming deletion on large tables
Hi,

The feed_posts table has over 50 million rows. I am deleting all rows of a
certain type that are over 60 days old. When I try to do a delete like the
one below, it hangs for an entire day, so I need to kill it with
pg_terminate_backend(pid):

DELETE FROM feed_posts
WHERE feed_definition_id = 'bf33573d-936e-4e55-8607-72b685d2cbae'
AND created_at > '2020-05-11 00:00:00'
AND created_at < '2020-05-12 00:00:00';

So I need help in figuring out how to do large deletes on a production
database during normal hours. The explain plan is given below:

"Delete on feed_posts  (cost=1156.57..195748.88 rows=15534 width=6)"
"  ->  Bitmap Heap Scan on feed_posts  (cost=1156.57..195748.88 rows=15534 width=6)"
"        Recheck Cond: ((created_at >= '2020-05-11 00:00:00'::timestamp without time zone) AND (created_at <= '2020-05-12 00:00:00'::timestamp without time zone))"
"        Filter: (feed_definition_id = 'bf33573d-936e-4e55-8607-72b685d2cbae'::uuid)"
"        ->  Bitmap Index Scan on feed_posts_created_at  (cost=0.00..1152.68 rows=54812 width=0)"
"              Index Cond: ((created_at >= '2020-05-11 00:00:00'::timestamp without time zone) AND (created_at <= '2020-05-12 00:00:00'::timestamp without time zone))"

Please help me with deleting these rows. Do I need to do anything in the
Postgres configuration, or in the table structure?

Regards,
Atul
Hi Atul,
Please try the code below. Execute all the statements in one transaction.
select * into new_table from old_table where type = 'abcz';
truncate table old_table;
insert into old_table select * from new_table;
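As a rough sketch only, using the table and column names from your query
below, and assuming the rows to keep are everything that does not match the
DELETE predicate and that nothing else writes to feed_posts while this runs
(feed_posts_keep is just an illustrative name):

BEGIN;

-- copy the rows you want to keep into a scratch table
-- (note: rows with NULLs in these columns would not be kept by this predicate)
SELECT * INTO feed_posts_keep
FROM feed_posts
WHERE NOT (feed_definition_id = 'bf33573d-936e-4e55-8607-72b685d2cbae'
           AND created_at > '2020-05-11 00:00:00'
           AND created_at < '2020-05-12 00:00:00');

-- empty the original table (this takes an ACCESS EXCLUSIVE lock)
TRUNCATE TABLE feed_posts;

-- put the kept rows back and drop the scratch table
INSERT INTO feed_posts SELECT * FROM feed_posts_keep;
DROP TABLE feed_posts_keep;

COMMIT;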
On Thu, Dec 3, 2020 at 8:16 PM Atul Kumar <akumar14871@gmail.com> wrote:
> Hi,
>
> The feed_posts table has over 50 million rows.
>
> I am deleting all rows of a certain type that are over 60 days old.
>
> When I try to do a delete like this, it hangs for an entire day, so I
> need to kill it with pg_terminate_backend(pid).
>
> DELETE FROM feed_posts
> WHERE feed_definition_id = 'bf33573d-936e-4e55-8607-72b685d2cbae'
> AND created_at > '2020-05-11 00:00:00'
> AND created_at < '2020-05-12 00:00:00';
>
> So I need help in figuring out how to do large deletes on a
> production database during normal hours.
>
> The explain plan is given below:
>
> "Delete on feed_posts  (cost=1156.57..195748.88 rows=15534 width=6)"
> "  ->  Bitmap Heap Scan on feed_posts  (cost=1156.57..195748.88 rows=15534 width=6)"
> "        Recheck Cond: ((created_at >= '2020-05-11 00:00:00'::timestamp without time zone) AND (created_at <= '2020-05-12 00:00:00'::timestamp without time zone))"
> "        Filter: (feed_definition_id = 'bf33573d-936e-4e55-8607-72b685d2cbae'::uuid)"
> "        ->  Bitmap Index Scan on feed_posts_created_at  (cost=0.00..1152.68 rows=54812 width=0)"
> "              Index Cond: ((created_at >= '2020-05-11 00:00:00'::timestamp without time zone) AND (created_at <= '2020-05-12 00:00:00'::timestamp without time zone))"
>
> Please help me with deleting these rows. Do I need to do anything in the
> Postgres configuration, or in the table structure?
>
> Regards,
> Atul
Regards,
Ravikumar S,
Ph: 8106741263
On Thu, Dec 03, 2020 at 08:43:57PM +0530, Ravikumar Reddy wrote:
> Please try the code below. Execute all the statements in one transaction.
>
> select * into new_table from old_table where type = 'abcz';
> truncate table old_table;
> insert into old_table select * from new_table;

This looks like advice for when most of the rows are being deleted, but I
don't think that's true here. It'd need to LOCK old_table first, right?
Also, truncate isn't MVCC safe.

Atul: What server version? Do you have an index on feed_definition_id?

https://wiki.postgresql.org/wiki/Slow_Query_Questions

If explain (analyze,buffers) SELECT runs in a reasonable time for that
query, include its output.

On Thu, Dec 3, 2020 at 8:16 PM Atul Kumar <akumar14871@gmail.com> wrote:
> The feed_posts table has over 50 million rows.
>
> I am deleting all rows of a certain type that are over 60 days old.
>
> When I try to do a delete like this, it hangs for an entire day, so I
> need to kill it with pg_terminate_backend(pid).
>
> DELETE FROM feed_posts
> WHERE feed_definition_id = 'bf33573d-936e-4e55-8607-72b685d2cbae'
> AND created_at > '2020-05-11 00:00:00'
> AND created_at < '2020-05-12 00:00:00';
>
> So I need help in figuring out how to do large deletes on a
> production database during normal hours.
>
> Please help me with deleting these rows. Do I need to do anything in the
> Postgres configuration, or in the table structure?
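Concretely, the explain (analyze,buffers) SELECT I have in mind looks
roughly like the following; this is only a sketch that reuses the predicate
from the DELETE above, so adjust as needed:

EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM feed_posts
WHERE feed_definition_id = 'bf33573d-936e-4e55-8607-72b685d2cbae'
  AND created_at > '2020-05-11 00:00:00'
  AND created_at < '2020-05-12 00:00:00';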
Justin Pryzby <pryzby@telsasoft.com> writes:
> On Thu, Dec 03, 2020 at 08:43:57PM +0530, Ravikumar Reddy wrote:
>> When I try to do a delete like this: it hangs for an entire day, so I
>> need to kill it with pg_terminate_backend(pid).
>>
>> DELETE FROM feed_posts
>> WHERE feed_definition_id = 'bf33573d-936e-4e55-8607-72b685d2cbae'
>> AND created_at > '2020-05-11 00:00:00'
>> AND created_at < '2020-05-12 00:00:00';

90% of the "delete takes forever" complaints that we hear trace down to
having a foreign key reference to the deletion-target table that's not
backed by an index on the referencing column. Then you end up getting
a seqscan on the referencing table to look for rows referencing a
row-to-be-deleted. And then another one for the next row. Etc.

You could try "explain analyze" on a query deleting just a single
one of these rows and see if an RI enforcement trigger is what's
eating the time.

			regards, tom lane
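That single-row test could look something like the sketch below, wrapped in
a transaction that is rolled back so nothing is actually removed (the ctid
subquery is just one way to pick a single matching row):

BEGIN;

-- EXPLAIN ANALYZE actually runs the DELETE, so roll it back afterwards
EXPLAIN ANALYZE
DELETE FROM feed_posts
WHERE ctid IN (
    -- pick one matching row; ctid avoids guessing the primary key column
    SELECT ctid
    FROM feed_posts
    WHERE feed_definition_id = 'bf33573d-936e-4e55-8607-72b685d2cbae'
      AND created_at > '2020-05-11 00:00:00'
      AND created_at < '2020-05-12 00:00:00'
    LIMIT 1
);

ROLLBACK;

If most of the reported time shows up under a "Trigger for constraint ..."
line in that output, the usual fix is an index on the referencing column of
the child table.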
On 12/3/20 11:16 AM, Tom Lane wrote:
> Justin Pryzby <pryzby@telsasoft.com> writes:
>> On Thu, Dec 03, 2020 at 08:43:57PM +0530, Ravikumar Reddy wrote:
>>> When I try to do a delete like this: it hangs for an entire day, so I
>>> need to kill it with pg_terminate_backend(pid).
>>>
>>> DELETE FROM feed_posts
>>> WHERE feed_definition_id = 'bf33573d-936e-4e55-8607-72b685d2cbae'
>>> AND created_at > '2020-05-11 00:00:00'
>>> AND created_at < '2020-05-12 00:00:00';
>
> 90% of the "delete takes forever" complaints that we hear trace down to
> having a foreign key reference to the deletion-target table that's not
> backed by an index on the referencing column. Then you end up getting
> a seqscan on the referencing table to look for rows referencing a
> row-to-be-deleted. And then another one for the next row. Etc.
>
> You could try "explain analyze" on a query deleting just a single
> one of these rows and see if an RI enforcement trigger is what's
> eating the time.

Yeah. IIRC some other RDBMS systems actually create such an index if it
doesn't already exist. Maybe we should have a warning when setting up an FK
constraint if the referencing fields aren't usefully indexed.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
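For anyone following along, a sketch of how to check for and fix the
missing-index case described above. The catalog query is standard; the
CREATE INDEX uses a made-up child table (feed_post_comments) and column
(feed_post_id) purely for illustration:

-- list foreign keys that reference feed_posts, so you can check whether
-- each referencing column is indexed
SELECT conname,
       conrelid::regclass AS referencing_table,
       pg_get_constraintdef(oid) AS definition
FROM pg_constraint
WHERE contype = 'f'
  AND confrelid = 'feed_posts'::regclass;

-- hypothetical fix: index the referencing column so the RI check done for
-- each deleted feed_posts row can use an index scan instead of a seqscan
CREATE INDEX CONCURRENTLY ON feed_post_comments (feed_post_id);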