Thread: pgsql: Introduce framework for parallelizing various pg_upgrade tasks.
Introduce framework for parallelizing various pg_upgrade tasks.

A number of pg_upgrade steps require connecting to every database
in the cluster and running the same query in each one. When there
are many databases, these steps are particularly time-consuming,
especially since they are performed sequentially, i.e., we connect
to a database, run the query, and process the results before moving
on to the next database.

This commit introduces a new framework that makes it easy to
parallelize most of these once-in-each-database tasks by processing
multiple databases concurrently. This framework manages a set of
slots that follow a simple state machine, and it uses libpq's
asynchronous APIs to establish the connections and run the queries.
The --jobs option is used to determine the number of slots to use.
To use this new task framework, callers simply need to provide the
query and a callback function to process its results, and the
framework takes care of the rest. A more complete description is
provided at the top of the new task.c file.

None of the eligible once-in-each-database tasks are converted to
use this new framework in this commit. That will be done via
several follow-up commits.

Reviewed-by: Jeff Davis, Robert Haas, Daniel Gustafsson, Ilya Gladyshev, Corey Huinker
Discussion: https://postgr.es/m/20240516211638.GA1688936%40nathanxps13

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/40e2e5e92b7da358fb45802b53c735d25a51d23a

Modified Files
--------------
doc/src/sgml/ref/pgupgrade.sgml  |   6 +-
src/bin/pg_upgrade/Makefile      |   1 +
src/bin/pg_upgrade/meson.build   |   1 +
src/bin/pg_upgrade/pg_upgrade.h  |  21 ++
src/bin/pg_upgrade/task.c        | 443 +++++++++++++++++++++++++++++++++++++++
src/tools/pgindent/typedefs.list |   5 +
6 files changed, 474 insertions(+), 3 deletions(-)
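The slot-based design described in the commit message can be illustrated with a small, self-contained simulation. This is a sketch under stated assumptions, not the code in task.c: the state names (SLOT_FREE, SLOT_CONNECTING, SLOT_RUNNING_QUERY), the ProcessCB callback type, and the run_task() and count_and_print_cb() functions are all hypothetical. Tick counters stand in for libpq's nonblocking calls (PQconnectStart()/PQconnectPoll() for connecting; PQsendQuery(), PQconsumeInput(), PQisBusy(), and PQgetResult() for running queries) so the example compiles without a server. The jobs argument plays the role of --jobs.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical slot states, loosely following the commit's description. */
typedef enum
{
	SLOT_FREE,					/* available for the next database */
	SLOT_CONNECTING,			/* waiting for a (simulated) connection */
	SLOT_RUNNING_QUERY,			/* waiting for a (simulated) query result */
} SlotState;

/* Hypothetical callback type: processes one database's query result. */
typedef void (*ProcessCB) (const char *dbname, const char *result, void *arg);

typedef struct
{
	SlotState	state;
	int			db;				/* index of the database in this slot */
	int			ticks_left;		/* simulated latency of the current step */
} Slot;

#define MAX_SLOTS 16

/*
 * Run "query" once in every database, with at most "jobs" databases in
 * flight.  Each outer-loop iteration is one tick; a slot advances only when
 * its simulated I/O finishes, much as a real implementation would advance
 * a slot only when libpq reports progress.  Returns the largest number of
 * slots that were busy at the end of any tick.
 */
static int
run_task(const char **dbs, int ndbs, int jobs, const char *query,
		 ProcessCB cb, void *arg)
{
	Slot		slots[MAX_SLOTS];
	int			next_db = 0;
	int			done = 0;
	int			peak = 0;

	assert(ndbs >= 0 && jobs >= 1 && jobs <= MAX_SLOTS);
	for (int i = 0; i < jobs; i++)
		slots[i].state = SLOT_FREE;

	while (done < ndbs)
	{
		int			busy = 0;

		for (int i = 0; i < jobs; i++)
		{
			Slot	   *s = &slots[i];

			switch (s->state)
			{
				case SLOT_FREE:
					/* hand the next unprocessed database to this slot */
					if (next_db < ndbs)
					{
						s->db = next_db++;
						s->ticks_left = 1 + s->db % 3;	/* fake connect time */
						s->state = SLOT_CONNECTING;
					}
					break;
				case SLOT_CONNECTING:
					if (--s->ticks_left == 0)
					{
						/* connection "established": send the query */
						s->ticks_left = 2;	/* fake query time */
						s->state = SLOT_RUNNING_QUERY;
					}
					break;
				case SLOT_RUNNING_QUERY:
					if (--s->ticks_left == 0)
					{
						/* result "arrived": let the caller process it */
						char		result[128];

						snprintf(result, sizeof(result), "result of \"%s\"", query);
						cb(dbs[s->db], result, arg);
						done++;
						s->state = SLOT_FREE;	/* ready for another database */
					}
					break;
			}
			if (s->state != SLOT_FREE)
				busy++;
		}
		if (busy > peak)
			peak = busy;
	}
	return peak;
}

/* Example callback: counts processed databases and prints each result. */
static void
count_and_print_cb(const char *dbname, const char *result, void *arg)
{
	int		   *count = (int *) arg;

	(*count)++;
	printf("%s: %s\n", dbname, result);
}
```

For example, calling run_task() with five database names, jobs = 2, and count_and_print_cb as the callback invokes the callback once per database, and no more than two slots are ever busy at once. Because each slot advances only when its own step completes, a slow database ties up one slot without blocking progress on the others. That is the gain over the sequential connect, query, and process loop the commit message describes.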
Re: pgsql: Introduce framework for parallelizing various pg_upgrade tasks.
From: Alexander Korotkov
Hi!

On Tue, Sep 17, 2024 at 12:11 AM Nathan Bossart <nathan@postgresql.org> wrote:
> Introduce framework for parallelizing various pg_upgrade tasks.
> [...]

Should we add UpgradeTaskProcessCB to typedefs.list? I don't see it directly affecting indentation right now, but perhaps we should add it anyway for uniformity?

------
Regards,
Alexander Korotkov
Supabase