Re: Help with large delete - Mailing list pgsql-general
From | Perry Smith |
---|---|
Subject | Re: Help with large delete |
Date | |
Msg-id | 57F41293-F307-4579-9459-79A0787E4644@easesoftware.com Whole thread Raw |
In response to | Re: Help with large delete (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Help with large delete
|
List | pgsql-general |
I’m sending this again. I don’t see that it made it to the list but there is also new info here.
On Apr 16, 2022, at 10:33, Tom Lane <tgl@sss.pgh.pa.us> wrote:Perry Smith <pedz@easesoftware.com> writes:Currently I have one table that mimics a file system. Each entry has a parent_id and a base name where parent_id is an id in the table that must exist in the table or be null with cascade on delete.
I’ve started a delete of a root entry with about 300,000 descendants. The table currently has about 22M entries and I’m adding about 1600 entries per minute still. Eventually there will not be massive amounts of entries being added and the table will be mostly static.
The most obvious question is do you have an index on the referencing
column. PG doesn't require one to exist to create an FK; but if you
don't, deletes of referenced rows had better be uninteresting to you
performance-wise, because each one will cause a seqscan.
To try to reply to Peter’s question, I jstarted:
psql -c "explain analyze delete from dateien where basename = '/mnt/pedz/Visual_Media'” find_dups
I did this last night at 10 p.m. and killed it just now at 6:30 without any response.
This is inside a BSD “jail” on a NAS. I don’t know how much CPU the jail is given.
For Tom’s question, here is the description of the table:
psql -c '\d dateien' find_dups
Table "public.dateien"
Column | Type | Collation | Nullable | Default
------------+--------------------------------+-----------+----------+-------------------------------------
id | bigint | | not null | nextval('dateien_id_seq'::regclass)
basename | character varying | | not null |
parent_id | bigint | | |
dev | bigint | | not null |
ftype | character varying | | not null |
uid | bigint | | not null |
gid | bigint | | not null |
ino | bigint | | not null |
mode | bigint | | not null |
mtime | timestamp without time zone | | not null |
nlink | bigint | | not null |
size | bigint | | not null |
sha1 | character varying | | |
created_at | timestamp(6) without time zone | | not null |
updated_at | timestamp(6) without time zone | | not null |
Indexes:
"dateien_pkey" PRIMARY KEY, btree (id)
"unique_dev_ino_for_dirs" UNIQUE, btree (dev, ino) WHERE ftype::text = 'directory'::text
"unique_parent_basename" UNIQUE, btree (COALESCE(parent_id, '-1'::integer::bigint), basename)
Foreign-key constraints:
"fk_rails_c01ebbd0bf" FOREIGN KEY (parent_id) REFERENCES dateien(id) ON DELETE CASCADE
Referenced by:
TABLE "dateien" CONSTRAINT "fk_rails_c01ebbd0bf" FOREIGN KEY (parent_id) REFERENCES dateien(id) ON DELETE CASCADE
To do a simple delete of a node that has no children takes about 11 seconds:
Table "public.dateien"
Column | Type | Collation | Nullable | Default
------------+--------------------------------+-----------+----------+-------------------------------------
id | bigint | | not null | nextval('dateien_id_seq'::regclass)
basename | character varying | | not null |
parent_id | bigint | | |
dev | bigint | | not null |
ftype | character varying | | not null |
uid | bigint | | not null |
gid | bigint | | not null |
ino | bigint | | not null |
mode | bigint | | not null |
mtime | timestamp without time zone | | not null |
nlink | bigint | | not null |
size | bigint | | not null |
sha1 | character varying | | |
created_at | timestamp(6) without time zone | | not null |
updated_at | timestamp(6) without time zone | | not null |
Indexes:
"dateien_pkey" PRIMARY KEY, btree (id)
"unique_dev_ino_for_dirs" UNIQUE, btree (dev, ino) WHERE ftype::text = 'directory'::text
"unique_parent_basename" UNIQUE, btree (COALESCE(parent_id, '-1'::integer::bigint), basename)
Foreign-key constraints:
"fk_rails_c01ebbd0bf" FOREIGN KEY (parent_id) REFERENCES dateien(id) ON DELETE CASCADE
Referenced by:
TABLE "dateien" CONSTRAINT "fk_rails_c01ebbd0bf" FOREIGN KEY (parent_id) REFERENCES dateien(id) ON DELETE CASCADE
To do a simple delete of a node that has no children takes about 11 seconds:
time psql -c "delete from dateien where id = 13498939;" find_dups
DELETE 1
psql -c "delete from dateien where id = 13498939;" find_dups 0.00s user 0.01s system 0% cpu 11.282 total
DELETE 1
psql -c "delete from dateien where id = 13498939;" find_dups 0.00s user 0.01s system 0% cpu 11.282 total
I’m implementing the suggestion that I do the recession myself but at this rate it will take about 38 days to delete 300K entries. I must be doing something horribly wrong. I hope you guys can enlighten me.
Thank you for your time,
Perry
Attachment
pgsql-general by date: