Thread: Re: ALTER TABLE ADD FOREIGN KEY to partitioned table, is not parallelized
On 6/3/25 17:34, Dimitrios Apostolou wrote: > The backend process for each of the above ALTER TABLE commands, does not > parallelize the foreign key checks for the different partitions. I > know, because in the logs I see gigabytes of temporary files being > written, with the CONTEXT showing queries issued incrementally on > all the different partitions: > > :LOG: temporary file: path "pg_tblspc/16390/PG_17_202406281/ > pgsql_tmp/pgsql_tmp3363462.579", size 1073741824 > :CONTEXT: SQL statement "SELECT fk."columnX" FROM ONLY > "public"."table_partition_214" fk > LEFT OUTER JOIN ONLY "public"."another_table" pk > ON ( pk."columnX" OPERATOR(pg_catalog.=) fk."columnX") > WHERE pk."columnX" IS NULL AND (fk."columnX" IS NOT NULL)" > > Why can't the backend issue these queries in parallel workers? This has been discussed here: https://www.postgresql.org/message-id/flat/0d21e3b4-dcde-290c-875e-6ed5013e8e52%40dalibo.com Perhaps we should exhume this patch, but I believe the optimal strategy is to perform a VACUUM between the data and post-data to build the visibility map. The anti-join can then use an efficient index-only scan. Best regards, Frédéric
Re: ALTER TABLE ADD FOREIGN KEY to partitioned table, is not parallelized
From
Dimitrios Apostolou
Date:
On Wed, 4 Jun 2025, Frédéric Yhuel wrote: > On 6/3/25 17:34, Dimitrios Apostolou wrote: >> The backend process for each of the above ALTER TABLE commands, does not >> parallelize the foreign key checks for the different partitions. I >> know, because in the logs I see gigabytes of temporary files being >> written, with the CONTEXT showing queries issued incrementally on >> all the different partitions: >> >> :LOG: temporary file: path "pg_tblspc/16390/PG_17_202406281/ >> pgsql_tmp/pgsql_tmp3363462.579", size 1073741824 >> :CONTEXT: SQL statement "SELECT fk."columnX" FROM ONLY >> "public"."table_partition_214" fk >> LEFT OUTER JOIN ONLY "public"."another_table" pk >> ON ( pk."columnX" OPERATOR(pg_catalog.=) fk."columnX") >> WHERE pk."columnX" IS NULL AND (fk."columnX" IS NOT NULL)" >> >> Why can't the backend issue these queries in parallel workers? > > This has been discussed here: > https://www.postgresql.org/message-id/flat/0d21e3b4-dcde-290c-875e-6ed5013e8e52%40dalibo.com > > Perhaps we should exhume this patch, but I believe the optimal strategy is to > perform a VACUUM between the data and post-data to build the visibility map. > The anti-join can then use an efficient index-only scan. Thanks for pointing to this patch. Since I run each of the pg_restore sections separately, I will try to manually do a VACUUM after the "data" and before the "post-data" section. In general I have noticed most operations are slower after a succesful pg_restore until VACUUM is complete, which is unfortunate as the database is huge and it takes days to run. Something I have on my list to try, is whether a COPY FREEZE would alleviate all this trouble, since all tuples are immediately visible then. Maybe a patch for a new pg_restore option --freeze is a better solution. Are my assumptions right? Thanks, Dimitris
On 6/4/25 16:12, Dimitrios Apostolou wrote: > In general I have noticed most operations are slower after a succesful > pg_restore until VACUUM is complete, which is unfortunate as the > database is huge and it takes days to run. Something I have on my list > to try, is whether a COPY FREEZE would alleviate all this trouble, since > all tuples are immediately visible then. Maybe a patch for a new > pg_restore option --freeze is a better solution. Are my assumptions right? It seems that the idea has already been discussed: https://www.postgresql.org/message-id/flat/CA%2BU5nM%2BXvkUu9ran%2B5cY%3DTWQquLTpvzte4KVMK%3DaDfbr-xfNXA%40mail.gmail.com#b61a7fee06e10e61afa68712bc0b3c5b I've CCed Bruce Mojman, in the hope that he can tell us more about it.
On 6/5/25 16:13, Frédéric Yhuel wrote: > > > On 6/4/25 16:12, Dimitrios Apostolou wrote: >> In general I have noticed most operations are slower after a succesful >> pg_restore until VACUUM is complete, which is unfortunate as the >> database is huge and it takes days to run. Something I have on my list >> to try, is whether a COPY FREEZE would alleviate all this trouble, >> since all tuples are immediately visible then. Maybe a patch for a new >> pg_restore option --freeze is a better solution. Are my assumptions >> right? > > It seems that the idea has already been discussed: https:// > www.postgresql.org/message-id/flat/ > CA%2BU5nM%2BXvkUu9ran%2B5cY%3DTWQquLTpvzte4KVMK%3DaDfbr- > xfNXA%40mail.gmail.com#b61a7fee06e10e61afa68712bc0b3c5b > > I've CCed Bruce Mojman, in the hope that he can tell us more about it. > > (It might be more interesting now than 12 years ago thanks to this patch: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=7db0cd2145f2bce84cac92402e205e4d2b045bf2)
[PATCH] ALTER TABLE ADD FOREIGN KEY to partitioned table, is not parallelized
From
Dimitrios Apostolou
Date:
On Thu, 5 Jun 2025, Frédéric Yhuel wrote: > > On 6/4/25 16:12, Dimitrios Apostolou wrote: >> In general I have noticed most operations are slower after a succesful >> pg_restore until VACUUM is complete, which is unfortunate as the database >> is huge and it takes days to run. Something I have on my list to try, is >> whether a COPY FREEZE would alleviate all this trouble, since all tuples >> are immediately visible then. Maybe a patch for a new pg_restore option >> --freeze is a better solution. Are my assumptions right? > > It seems that the idea has already been discussed: > https://www.postgresql.org/message-id/flat/CA%2BU5nM%2BXvkUu9ran%2B5cY%3DTWQquLTpvzte4KVMK%3DaDfbr-xfNXA%40mail.gmail.com#b61a7fee06e10e61afa68712bc0b3c5b > > I've CCed Bruce Mojman, in the hope that he can tell us more about it. Thanks for all the pointers, it shows that changes in postgres are harder than they appear. FWIW I implemented a pg_restore --freeze patch, see attached. It needs another patch of mine from [1] that implements pg_restore --data-only --clean, which for parallel restores encases each COPY in its own transaction and prepends it with a TRUNCATE. All feedback is welcome. [1] https://www.postgresql.org/message-id/c61263f2-7472-5dd8-703d-01e683421f61%40gmx.net It works really fast for the data, and I see that some, but not all items from section=post-data, start parallel plans. For example I see CREATE INDEX spawns parallel workers. But unfortunately the item in question (ADD FOREIGN KEY) is not parallel (probably because the discussion [2] you posted in your previous email never concluded). I /think/ though it's reading all the data faster than before, but still has to go through terabytes of data and this takes a long time, for each of the foreign keys it adds. [2] https://www.postgresql.org/message-id/flat/0d21e3b4-dcde-290c-875e-6ed5013e8e52@dalibo.com Still I wonder why pg_restore can't issue many ADD FOREIGN KEY for the same table in parallel. Regards, Dimitris
Attachment
Re: [PATCH] ALTER TABLE ADD FOREIGN KEY to partitioned table, is not parallelized
From
Dimitrios Apostolou
Date:
> > FWIW I implemented a pg_restore --freeze patch, see attached. It needs > another patch of mine from [1] that implements pg_restore --data-only > --clean, which for parallel restores encases each COPY in its own transaction > and prepends it with a TRUNCATE. All feedback is welcome. > > [1] https://www.postgresql.org/message-id/c61263f2-7472-5dd8-703d-01e683421f61%40gmx.net > > It works really fast for the data, and I see that some, but not all items > from section=post-data, start parallel plans. For example I see CREATE INDEX > spawns parallel workers. I added it to July's commitfest, mostly to trigger discussion around the issue. https://commitfest.postgresql.org/patch/5826/ Not sure how to mark on the commitfest page that the patch requires another patch from another commitfest entry. Dimitris