Thread: Conflicting updates of command progress

Conflicting updates of command progress

From

Antonin Houska

Date:

11 April, 17:12:20

While working on [1] I realized that some field of pg_stat_progress_cluste has
weird value. On closer look I found out that cluster_rel() indirectly calls
index_build() and that overwrites the progress parameter. Following are the
parameter values in the master branch:

/* Phases of cluster (as advertised via PROGRESS_CLUSTER_PHASE) */
#define PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP    1
#define PROGRESS_CLUSTER_PHASE_INDEX_SCAN_HEAP    2
#define PROGRESS_CLUSTER_PHASE_SORT_TUPLES        3
#define PROGRESS_CLUSTER_PHASE_WRITE_NEW_HEAP    4
#define PROGRESS_CLUSTER_PHASE_SWAP_REL_FILES    5
#define PROGRESS_CLUSTER_PHASE_REBUILD_INDEX    6
#define PROGRESS_CLUSTER_PHASE_FINAL_CLEANUP    7

/* Progress parameters for CREATE INDEX */
/* 3, 4 and 5 reserved for "waitfor" metrics */
#define PROGRESS_CREATEIDX_COMMAND                0
#define PROGRESS_CREATEIDX_INDEX_OID            6
#define PROGRESS_CREATEIDX_ACCESS_METHOD_OID    8
#define PROGRESS_CREATEIDX_PHASE                9    /* AM-agnostic phase # */
#define PROGRESS_CREATEIDX_SUBPHASE                10    /* phase # filled by AM */
#define PROGRESS_CREATEIDX_TUPLES_TOTAL            11
#define PROGRESS_CREATEIDX_TUPLES_DONE            12
#define PROGRESS_CREATEIDX_PARTITIONS_TOTAL        13
#define PROGRESS_CREATEIDX_PARTITIONS_DONE        14
/* 15 and 16 reserved for "block number" metrics */

The only conflicting parameters I see here are
PROGRESS_CLUSTER_PHASE_REBUILD_INDEX vs PROGRESS_CREATEIDX_INDEX_OID (number
6). Fortunately, index_build() does not set PROGRESS_CREATEIDX_INDEX_OID so
there's no live bug. However [1] adds some PROGRESS_CLUSTER_ parameters, thus
making conflicts realistic.

AFAICS the current design does not consider that one progress-reporting
command can be called by another one. Not sure what the correct fix is. We can
either ignore update requests from the "nested" commands, or display the
progress of both. The latter approach is probably more invasive - is that
worth the effort?

[1] https://commitfest.postgresql.org/patch/5117/

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com

Re: Conflicting updates of command progress

From

Sami Imseih

Date:

14 April, 20:13:00

> While working on [1] I realized that some field of pg_stat_progress_cluste has
> weird value.

Is there a repro that you can share that shows the weird values? It sounds like
the repro is on top of [1]. Is that right?

> AFAICS the current design does not consider that one progress-reporting
> command can be called by another one.

pgstat_progress_start_command should only be called once by the entry
point for the
command. In theory, we could end up in a situation where start_command
is called multiple times during the same top-level command;

> Not sure what the correct fix is. We can
> either ignore update requests from the "nested" commands, or display the

There is a pattern where we do

```
if (progress)
      pgstat_progress_update_param
```

cluster_rel can pass down a flag to index_build or others that update progress
to not report progress. Therefore, only the top level command is
updating progress.
what do you think?


[1] https://commitfest.postgresql.org/patch/5117/

--
Sami Imseih
Amazon Web Services (AWS)

Re: Conflicting updates of command progress

From

Fujii Masao

Date:

15 April, 10:07:07

On 2025/04/15 2:13, Sami Imseih wrote:
>> While working on [1] I realized that some field of pg_stat_progress_cluste has
>> weird value.

I ran into the same issue while working on [2], and eventually had to
withdraw that patch because of it.

> Is there a repro that you can share that shows the weird values? It sounds like
> the repro is on top of [1]. Is that right?

You can reproduce the similar problem by creating a trigger function that
runs a progress-reporting command like COPY, and then COPY data into
a table that uses that trigger.

Regards,

[2] https://commitfest.postgresql.org/patch/5282/

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: Conflicting updates of command progress

From

Antonin Houska

Date:

23 April, 13:26:12

Sami Imseih <samimseih@gmail.com> wrote:

> > While working on [1] I realized that some field of pg_stat_progress_cluste has
> > weird value.
>
> Is there a repro that you can share that shows the weird values? It sounds like
> the repro is on top of [1]. Is that right?

Yes.

> > AFAICS the current design does not consider that one progress-reporting
> > command can be called by another one.
>
> pgstat_progress_start_command should only be called once by the entry
> point for the
> command. In theory, we could end up in a situation where start_command
> is called multiple times during the same top-level command;

Not only in theory - it actually happens when CLUSTER is rebuilding indexes.

> > Not sure what the correct fix is. We can
> > either ignore update requests from the "nested" commands, or display the
>
> There is a pattern where we do
>
> ```
> if (progress)
>       pgstat_progress_update_param
> ```
>
> cluster_rel can pass down a flag to index_build or others that update progress
> to not report progress. Therefore, only the top level command is
> updating progress.
> what do you think?

That's a possible approach. However, if the index build takes long time in the
CLUSTER case, the user will probably be interested in details about the index
build.

The other approach is that we report only on the index build, but in that case
the interesting row of pg_stat_progress_cluster would disappear
temporarily. That might also be confusing.

Ideally we should report on all the commands in progress, but that is not
trivial to implement.

> [1] https://commitfest.postgresql.org/patch/5117/

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

Re: Conflicting updates of command progress

From

Antonin Houska

Date:

23 April, 13:42:35

Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

> On 2025/04/15 2:13, Sami Imseih wrote:
> >> While working on [1] I realized that some field of pg_stat_progress_cluste has
> >> weird value.
>
> I ran into the same issue while working on [2], and eventually had to
> withdraw that patch because of it.

Have you considered reporting the progress of each command separately?

I think that can be implemented by moving the progress related fields from
PgBackendStatus into a new structure and by teaching the backend to insert a
new instance of that structure into a shared hash table (dshash.c) whenever a
command that needs progress reporting is being started.

pgstat.c uses such a hash table for various statistics objects, however it's
probably not a good idea to adjust this infrastructure by simply adding a new
object kind (e.g. PGSTAT_KIND_COMMAND). I think that pgstat.c is designed for
frequent updates of backend-local statistics and less frequent flushes
(e.g. at command completion) to the shared memory. That's not suitable for
progress reporting.

> [2] https://commitfest.postgresql.org/patch/5282/

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

Re: Conflicting updates of command progress

From

Sami Imseih

Date:

23 April, 23:20:27

>> pgstat_progress_start_command should only be called once by the entry
>> point for the
>> command. In theory, we could end up in a situation where start_command
>> is called multiple times during the same top-level command;

> Not only in theory - it actually happens when CLUSTER is rebuilding indexes.

In the case of CLUSTER, pgstat_progress_start_command is only called once,
but pgstat_progress_update_param is called in the context of both CLUSTER
and CREATE INDEX.

> That's a possible approach. However, if the index build takes long time in the
> CLUSTER case, the user will probably be interested in details about the index
> build.

I agree,

>> Is there a repro that you can share that shows the weird values? It sounds like
>> the repro is on top of [1]. Is that right?

>> You can reproduce the similar problem by creating a trigger function that
>> runs a progress-reporting command like COPY, and then COPY data into
>> a table that uses that trigger.

>> [2] https://commitfest.postgresql.org/patch/5282/

In this case, pgstat_progress_start_command is actually called
twice in the life of a single COPY command; the upper-level COPY
command calls pgstat_progress_start_command and then the nested COPY
command also does calls pgstat_progress_start_command.

> I think that can be implemented by moving the progress related fields from
> PgBackendStatus into a new structure and by teaching the backend to insert a
> new instance of that structure into a shared hash table (dshash.c)

I think this is a good idea in general to move the backend progress to
shared memory.
and with a new API that will deal with scenarios as described above.
1/ an (explicit) nested
command was started by a top-level command, such as the COPY case above.
2/ a top-level command triggered some other progress code implicitly, such as
CLUSTER triggering CREATE INDEX code.

I also like the shared memory approach because we can then not have to use
a message like the one introduced in f1889729dd3ab0 to support parallel index
vacuum progress 46ebdfe164c61.


-- 
Sami Imseih
Amazon Web Services (AWS)

Re: Conflicting updates of command progress

From

Antonin Houska

Date:

24 April, 12:33:45

Sami Imseih <samimseih@gmail.com> wrote:

> >> pgstat_progress_start_command should only be called once by the entry
> >> point for the
> >> command. In theory, we could end up in a situation where start_command
> >> is called multiple times during the same top-level command;
>
> > Not only in theory - it actually happens when CLUSTER is rebuilding indexes.
>
> In the case of CLUSTER, pgstat_progress_start_command is only called once,
> but pgstat_progress_update_param is called in the context of both CLUSTER
> and CREATE INDEX.

pgstat_progress_start_command() is called twice: First with
cmdtype=PROGRESS_COMMAND_CLUSTER, second with
PROGRESS_COMMAND_CREATE_INDEX. The first happens in cluster_rel() the second
in cluster_rel() -> rebuild_relation() -> finish_heap_swap() ->
reindex_relation() -> reindex_index().

It does not matter though, the current design only expects one command.

> > That's a possible approach. However, if the index build takes long time in the
> > CLUSTER case, the user will probably be interested in details about the index
> > build.
>
> I agree,
>
> >> Is there a repro that you can share that shows the weird values? It sounds like
> >> the repro is on top of [1]. Is that right?
>
> >> You can reproduce the similar problem by creating a trigger function that
> >> runs a progress-reporting command like COPY, and then COPY data into
> >> a table that uses that trigger.
>
> >> [2] https://commitfest.postgresql.org/patch/5282/
>
> In this case, pgstat_progress_start_command is actually called
> twice in the life of a single COPY command; the upper-level COPY
> command calls pgstat_progress_start_command and then the nested COPY
> command also does calls pgstat_progress_start_command.
>
> > I think that can be implemented by moving the progress related fields from
> > PgBackendStatus into a new structure and by teaching the backend to insert a
> > new instance of that structure into a shared hash table (dshash.c)
>
> I think this is a good idea in general to move the backend progress to
> shared memory.
> and with a new API that will deal with scenarios as described above.
> 1/ an (explicit) nested
> command was started by a top-level command, such as the COPY case above.
> 2/ a top-level command triggered some other progress code implicitly, such as
> CLUSTER triggering CREATE INDEX code.

Yes, I mean a new API. I imagine pgstat_progress_start_command() to initialize
the shared memory to track the "nested" command and to put the existing value
of MyBEEntry onto a stack. pgstat_progress_end_command() would then restore
the original value.

However, special care is needed for [2] because that's not necessarily
nesting: consider merge-joining two foreign tables, both using file_fdw. In
this case the pointers to the progress tracking shared memory would probably
have to be passed as function arguments.

(Note that the current progress tracking system also uses shared memory,
however in a less flexible way.)

> I also like the shared memory approach because we can then not have to use
> a message like the one introduced in f1889729dd3ab0 to support parallel index
> vacuum progress 46ebdfe164c61.

I didn't know about these patches. I'm not sure though if this needs to be
removed. Even if each worker updated the progress information separately
(would users appreciate that?), it should still send the progress information
to the leader before it (i.e. the worker) exits.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

Re: Conflicting updates of command progress

From

Sami Imseih

Date:

24 April, 21:15:06

> pgstat_progress_start_command() is called twice: First with
> cmdtype=PROGRESS_COMMAND_CLUSTER, second with
> PROGRESS_COMMAND_CREATE_INDEX. The first happens in cluster_rel() the second
> in cluster_rel() -> rebuild_relation() -> finish_heap_swap() ->
> reindex_relation() -> reindex_index().
>
> It does not matter though, the current design only expects one command.

When I set a breakpoint on pgstat_progress_start_command and ran a
CLUSTER on a table with a few indexes, I only saw start_command being
called once for PROGRESS_COMMAND_CLUSTER. Then I went back in the
code and realized that reindex_index when called via the CLUSTER command
has progress set to false. So as it stands now, only PROGRESS_COMMAND_CLUSTER
is effectively started when CLUSTER is called.

```
if (progress)
{
const int progress_cols[] = {
PROGRESS_CREATEIDX_COMMAND,
PROGRESS_CREATEIDX_INDEX_OID
};
const int64 progress_vals[] = {
PROGRESS_CREATEIDX_COMMAND_REINDEX,
indexId
};

pgstat_progress_start_command(PROGRESS_COMMAND_CREATE_INDEX,
heapId);
pgstat_progress_update_multi_param(2, progress_cols, progress_vals);
}
```

--
Sami

Re: Conflicting updates of command progress

From

Antonin Houska

Date:

25 April, 20:34:48

Sami Imseih <samimseih@gmail.com> wrote:

> > pgstat_progress_start_command() is called twice: First with
> > cmdtype=PROGRESS_COMMAND_CLUSTER, second with
> > PROGRESS_COMMAND_CREATE_INDEX. The first happens in cluster_rel() the second
> > in cluster_rel() -> rebuild_relation() -> finish_heap_swap() ->
> > reindex_relation() -> reindex_index().
> >
> > It does not matter though, the current design only expects one command.
>
> When I set a breakpoint on pgstat_progress_start_command and ran a
> CLUSTER on a table with a few indexes, I only saw start_command being
> called once for PROGRESS_COMMAND_CLUSTER. Then I went back in the
> code and realized that reindex_index when called via the CLUSTER command
> has progress set to false. So as it stands now, only PROGRESS_COMMAND_CLUSTER
> is effectively started when CLUSTER is called.
>
> ```
> if (progress)
> {
> const int progress_cols[] = {
> PROGRESS_CREATEIDX_COMMAND,
> PROGRESS_CREATEIDX_INDEX_OID
> };
> const int64 progress_vals[] = {
> PROGRESS_CREATEIDX_COMMAND_REINDEX,
> indexId
> };
>
> pgstat_progress_start_command(PROGRESS_COMMAND_CREATE_INDEX,
> heapId);
> pgstat_progress_update_multi_param(2, progress_cols, progress_vals);
> }

Ah, got it. So for now the REPACK CONCURRENTLY patch only needs to turn off
the progress reporting for PROGRESS_COMMAND_CREATE_INDEX properly. Thanks for
checking!

--
Antonin Houska
Web: https://www.cybertec-postgresql.com