Thread: pgsql: Introduce framework for parallelizing various pg_upgrade tasks.

From: Nathan Bossart
Introduce framework for parallelizing various pg_upgrade tasks.

A number of pg_upgrade steps require connecting to every database
in the cluster and running the same query in each one.  When there
are many databases, these steps are particularly time-consuming,
especially since they are performed sequentially, i.e., we connect
to a database, run the query, and process the results before moving
on to the next database.

This commit introduces a new framework that makes it easy to
parallelize most of these once-in-each-database tasks by processing
multiple databases concurrently.  This framework manages a set of
slots that follow a simple state machine, and it uses libpq's
asynchronous APIs to establish the connections and run the queries.
The --jobs option is used to determine the number of slots to use.
To use this new task framework, callers simply need to provide the
query and a callback function to process its results, and the
framework takes care of the rest.  A more complete description is
provided at the top of the new task.c file.
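
As a rough illustration of the slot idea described above, here is a toy,
self-contained sketch.  It is not the code in task.c: the state names, the
Slot struct, the process_cb signature, and run_task() are all hypothetical,
and the libpq calls are replaced by a loop in which each pass advances every
slot by one state.  It only aims to show why more slots mean fewer passes
over the same set of databases.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SLOTS 64

/* Hypothetical slot states; the real names live in task.c. */
typedef enum { SLOT_FREE, SLOT_CONNECTING, SLOT_RUNNING } SlotState;

typedef struct
{
    SlotState state;
    int       db;       /* index of the database assigned to this slot */
} Slot;

/* Hypothetical callback: invoked once per database with a fake result. */
typedef void (*process_cb) (int db, const char *result, void *arg);

/*
 * Toy driver: handle ndbs databases with at most njobs slots.  Each pass of
 * the outer loop moves every slot forward one state, standing in for polling
 * asynchronous connections.  Returns the number of passes needed.
 */
int
run_task(int ndbs, int njobs, process_cb cb, void *arg)
{
    Slot slots[MAX_SLOTS];
    int  next_db = 0, done = 0, passes = 0;

    assert(cb != NULL);
    if (njobs < 1)
        njobs = 1;
    if (njobs > MAX_SLOTS)
        njobs = MAX_SLOTS;

    for (int i = 0; i < njobs; i++)
    {
        slots[i].state = SLOT_FREE;
        slots[i].db = -1;
    }

    while (done < ndbs)
    {
        for (int i = 0; i < njobs; i++)
        {
            Slot *s = &slots[i];

            switch (s->state)
            {
                case SLOT_FREE:         /* pick up the next database, if any */
                    if (next_db < ndbs)
                    {
                        s->db = next_db++;
                        s->state = SLOT_CONNECTING;
                    }
                    break;
                case SLOT_CONNECTING:   /* pretend the connection completed */
                    s->state = SLOT_RUNNING;
                    break;
                case SLOT_RUNNING:      /* pretend the query finished */
                    cb(s->db, "ok", arg);
                    done++;
                    s->state = SLOT_FREE;
                    break;
            }
        }
        passes++;
    }
    return passes;
}

/* Example callback: counts how many databases were processed. */
void
count_cb(int db, const char *result, void *arg)
{
    (void) db;
    (void) result;
    (*(int *) arg) += 1;
}
```

A caller would pass a counter and the callback, e.g.
run_task(4, 2, count_cb, &n), and compare the returned pass count for
different njobs values.  In the real framework the slot count comes from
--jobs and the state transitions are driven by libpq's asynchronous APIs.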

None of the eligible once-in-each-database tasks are converted to
use this new framework in this commit.  That will be done via
several follow-up commits.

Reviewed-by: Jeff Davis, Robert Haas, Daniel Gustafsson, Ilya Gladyshev, Corey Huinker
Discussion: https://postgr.es/m/20240516211638.GA1688936%40nathanxps13

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/40e2e5e92b7da358fb45802b53c735d25a51d23a

Modified Files
--------------
doc/src/sgml/ref/pgupgrade.sgml  |   6 +-
src/bin/pg_upgrade/Makefile      |   1 +
src/bin/pg_upgrade/meson.build   |   1 +
src/bin/pg_upgrade/pg_upgrade.h  |  21 ++
src/bin/pg_upgrade/task.c        | 443 +++++++++++++++++++++++++++++++++++++++
src/tools/pgindent/typedefs.list |   5 +
6 files changed, 474 insertions(+), 3 deletions(-)


Re: pgsql: Introduce framework for parallelizing various pg_upgrade tasks.

From: Alexander Korotkov
Hi!

On Tue, Sep 17, 2024 at 12:11 AM Nathan Bossart <nathan@postgresql.org> wrote:
> Introduce framework for parallelizing various pg_upgrade tasks.
>
> A number of pg_upgrade steps require connecting to every database
> in the cluster and running the same query in each one.  When there
> are many databases, these steps are particularly time-consuming,
> especially since they are performed sequentially, i.e., we connect
> to a database, run the query, and process the results before moving
> on to the next database.
>
> This commit introduces a new framework that makes it easy to
> parallelize most of these once-in-each-database tasks by processing
> multiple databases concurrently.  This framework manages a set of
> slots that follow a simple state machine, and it uses libpq's
> asynchronous APIs to establish the connections and run the queries.
> The --jobs option is used to determine the number of slots to use.
> To use this new task framework, callers simply need to provide the
> query and a callback function to process its results, and the
> framework takes care of the rest.  A more complete description is
> provided at the top of the new task.c file.
>
> None of the eligible once-in-each-database tasks are converted to
> use this new framework in this commit.  That will be done via
> several follow-up commits.
>
> Reviewed-by: Jeff Davis, Robert Haas, Daniel Gustafsson, Ilya Gladyshev, Corey Huinker
> Discussion: https://postgr.es/m/20240516211638.GA1688936%40nathanxps13

Should we add UpgradeTaskProcessCB to typedefs.list?  I don't see how
this would directly influence indentation right now, but we should
probably add it for uniformity.

------
Regards,
Alexander Korotkov
Supabase