Thread: Progress report of CREATE INDEX for nested partitioned tables

Progress report of CREATE INDEX for nested partitioned tables

From

Ilya Gladyshev

Date:

10 December 2022, 08:18:32

Hi,

I have noticed that progress reporting for CREATE INDEX of partitioned
tables seems to be working poorly for nested partitioned tables. In
particular, it overwrites total and done partitions count when it
recurses down to child partitioned tables and it only reports top-level
partitions. So it's not hard to see something like this during CREATE
INDEX now:

postgres=# select partitions_total, partitions_done from
pg_stat_progress_create_index ;
 partitions_total | partitions_done 
------------------+-----------------
                1 |               2
(1 row)


I changed current behaviour to report the total number of partitions in
the inheritance tree and fixed recursion in the attached patch. I used
a static variable to keep the counter to avoid ABI breakage of
DefineIndex, so that we could backpatch this to previous versions.

Thanks,
Ilya Gladyshev

Attachment

0001-report-top-parent-progress-for-CREATE-INDEX.patch

Re: Progress report of CREATE INDEX for nested partitioned tables

From

Justin Pryzby

Date:

11 December 2022, 06:33:34

On Sat, Dec 10, 2022 at 12:18:32PM +0400, Ilya Gladyshev wrote:
> Hi,
> 
> I have noticed that progress reporting for CREATE INDEX of partitioned
> tables seems to be working poorly for nested partitioned tables. In
> particular, it overwrites total and done partitions count when it
> recurses down to child partitioned tables and it only reports top-level
> partitions. So it's not hard to see something like this during CREATE
> INDEX now:
> 
> postgres=# select partitions_total, partitions_done from
> pg_stat_progress_create_index ;
>  partitions_total | partitions_done 
> ------------------+-----------------
>                 1 |               2
> (1 row)

Yeah.  I didn't verify, but it looks like this is a bug going to back to
v12.  As you said, when called recursively, DefineIndex() clobbers the
number of completed partitions.

Maybe DefineIndex() could flatten the list of partitions.  But I don't
think that can work easily with iteration rather than recursion.

Could you check what I've written as a counter-proposal ?

As long as we're changing partitions_done to include nested
sub-partitions, it seems to me like we should exclude intermediate
"catalog-only" partitioned indexes, and count only physical leaf
partitions.  Should it alo exclude any children with matching indexes,
which will also be catalog-only changes?  Probably not.

The docs say:
|When creating an index on a partitioned table, this column is set to the
|total number of partitions on which the index is to be created. This
|field is 0 during a REINDEX.

> I changed current behaviour to report the total number of partitions in
> the inheritance tree and fixed recursion in the attached patch. I used
> a static variable to keep the counter to avoid ABI breakage of
> DefineIndex, so that we could backpatch this to previous versions.

I wrote a bunch of assertions for this, which seems to have uncovered an
similar issue with COPY progress reporting, dating to 8a4f618e7.  I'm
not sure the assertions are okay.  I imagine they may break other
extensions, as with file_fdw.

-- 
Justin

Attachment

0001-fix-progress-reporting-of-nested-partitioned-indexes.patch

Re: Progress report of CREATE INDEX for nested partitioned tables

From

Ilya Gladyshev

Date:

12 December 2022, 19:39:23

Could you check what I've written as a counter-proposal ?

I think that this might be a good solution to start with, it gives us the opportunity to improve the granularity later without any surprising changes for the end user. We could use this patch for previous versions and make more granular output in the latest. What do you think?

As long as we're changing partitions_done to include nested
sub-partitions, it seems to me like we should exclude intermediate
"catalog-only" partitioned indexes, and count only physical leaf
partitions. Should it alo exclude any children with matching indexes,
which will also be catalog-only changes? Probably not.

The docs say:
|When creating an index on a partitioned table, this column is set to the
|total number of partitions on which the index is to be created. This
|field is 0 during a REINDEX.

I agree with you on catalog-only partitioned indexes, but I think that in the perfect world we should exclude all the relations where the index isn’t actually created, so that means excluding attached indexes as well. However, IMO doing it this way will require too much of a code rewrite for quite a minor feature (but we could do it, ofc). I actually think that the progress view would be better off without the total number of partitions, but I’m not sure we have this option now. With this in mind, I think your proposal to exclude catalog-only indexes sounds reasonable to me, but I feel like the docs are off in this case, because the attached indexes are not created, but we pretend like they are in this metric, so we should fix one or the other.

I changed current behaviour to report the total number of partitions in
the inheritance tree and fixed recursion in the attached patch. I used
a static variable to keep the counter to avoid ABI breakage of
DefineIndex, so that we could backpatch this to previous versions.

I wrote a bunch of assertions for this, which seems to have uncovered an
similar issue with COPY progress reporting, dating to 8a4f618e7. I'm
not sure the assertions are okay. I imagine they may break other
extensions, as with file_fdw.

--
Justin
<0001-fix-progress-reporting-of-nested-partitioned-indexes.patch>

Re: Progress report of CREATE INDEX for nested partitioned tables

From

Justin Pryzby

Date:

13 December 2022, 04:43:31

On Mon, Dec 12, 2022 at 11:39:23PM +0400, Ilya Gladyshev wrote:
> 
> > Could you check what I've written as a counter-proposal ?
> 
> I think that this might be a good solution to start with, it gives us the opportunity to improve the granularity
laterwithout any surprising changes for the end user. We could use this patch for previous versions and make more
granularoutput in the latest. What do you think?

Somehow, it hadn't occured to me that my patch "lost granularity" by
incrementing the progress bar by more than one...  Shoot.

> I actually think that the progress view would be better off without the total number of partitions, 

Just curious - why ?

> With this in mind, I think your proposal to exclude catalog-only indexes sounds reasonable to me, but I feel like the
docsare off in this case, because the attached indexes are not created, but we pretend like they are in this metric, so
weshould fix one or the other.

I agree that the docs should indicate whether we're counting "all
partitions", "direct partitions", and whether or not that includes
partitioned partitions, or just leaf partitions.

I have another proposal: since the original patch 3.5 years ago didn't
consider or account for sub-partitions, let's not start counting them
now.  It was never defined whether they were included or not (and I
guess that they're not common) so we can take this opportunity to
clarify the definition.

Alternately, if it's okay to add nparts_done to the IndexStmt, then
that's easy.

-- 
Justin

Attachment

Re: Progress report of CREATE INDEX for nested partitioned tables

From

Ilya Gladyshev

Date:

13 December 2022, 19:07:06

On Mon, 2022-12-12 at 22:43 -0600, Justin Pryzby wrote:
> On Mon, Dec 12, 2022 at 11:39:23PM +0400, Ilya Gladyshev wrote:
> > 
> > > Could you check what I've written as a counter-proposal ?
> > 
> > I think that this might be a good solution to start with, it gives
> > us the opportunity to improve the granularity later without any
> > surprising changes for the end user. We could use this patch for
> > previous versions and make more granular output in the latest. What
> > do you think?
> 
> Somehow, it hadn't occured to me that my patch "lost granularity" by
> incrementing the progress bar by more than one...  Shoot.
> 
> > I actually think that the progress view would be better off without
> > the total number of partitions, 
> 
> Just curious - why ?

We don't really know how many indexes we are going to create, unless we
have some kind of preliminary "planning" stage where we acumulate all
the relations that will need to have indexes created (rather than
attached). And if someone wants the total, it can be calculated
manually without this view, it's less user-friendly, but if we can't do
it well, I would leave it up to the user.

> 
> > With this in mind, I think your proposal to exclude catalog-only
> > indexes sounds reasonable to me, but I feel like the docs are off
> > in this case, because the attached indexes are not created, but we
> > pretend like they are in this metric, so we should fix one or the
> > other.
> 
> I agree that the docs should indicate whether we're counting "all
> partitions", "direct partitions", and whether or not that includes
> partitioned partitions, or just leaf partitions.

Agree. I think that docs should also be explicit about the attached
indexes, if we decide to count them in as "created".

> I have another proposal: since the original patch 3.5 years ago
> didn't
> consider or account for sub-partitions, let's not start counting them
> now.  It was never defined whether they were included or not (and I
> guess that they're not common) so we can take this opportunity to
> clarify the definition.

I have had this thought initially, but then I thought that it's not
what I would want, if I was to track progress of multi-level
partitioned tables (but yeah, I guess it's pretty uncommon). In this
respect, I like your initial counter-proposal more, because it leaves
us room to improve this in the future. Otherwise, if we commit to
reporting only top-level partitions now, I'm not sure we will have the
opportunity to change this.

> Alternately, if it's okay to add nparts_done to the IndexStmt, then
> that's easy.

Yeah, or we could add another argument to DefineIndex. I don't know if
it's ok, or which option is better here in terms of compatibility and
interface-wise, so I have tried both of them, see the attached patches.

Attachment

Re: Progress report of CREATE INDEX for nested partitioned tables

From

Justin Pryzby

Date:

17 December 2022, 14:30:02

On Tue, Dec 13, 2022 at 10:18:58PM +0400, Ilya Gladyshev wrote:
> > > I actually think that the progress view would be better off without
> > > the total number of partitions, 
> > 
> > Just curious - why ?
> 
> We don't really know how many indexes we are going to create, unless we
> have some kind of preliminary "planning" stage where we acumulate all
> the relations that will need to have indexes created (rather than
> attached). And if someone wants the total, it can be calculated
> manually without this view, it's less user-friendly, but if we can't do
> it well, I would leave it up to the user.

Thanks.  One other reason is that the partitions (and sub-partitions)
may not be equally sized.  Also, I've said before that it's weird to
report macroscopic progress about the number of partitions finihed in
the same place as reporting microscopic details like the number of
blocks done of the relation currently being processed.

> > I have another proposal: since the original patch 3.5 years ago
> > didn't
> > consider or account for sub-partitions, let's not start counting them
> > now.  It was never defined whether they were included or not (and I
> > guess that they're not common) so we can take this opportunity to
> > clarify the definition.
> 
> I have had this thought initially, but then I thought that it's not
> what I would want, if I was to track progress of multi-level
> partitioned tables (but yeah, I guess it's pretty uncommon). In this
> respect, I like your initial counter-proposal more, because it leaves
> us room to improve this in the future. Otherwise, if we commit to
> reporting only top-level partitions now, I'm not sure we will have the
> opportunity to change this.

We have the common problem of too many patches.

https://www.postgresql.org/message-id/a15f904a70924ffa4ca25c3c744cff31e0e6e143.camel%40gmail.com
This changes the progress reporting to show indirect children as
"total", and adds a global variable to track recursion into
DefineIndex(), allowing it to be incremented without the value being
lost to the caller.

https://www.postgresql.org/message-id/20221211063334.GB27893%40telsasoft.com
This also counts indirect children, but only increments the progress
reporting in the parent.  This has the disadvantage that when
intermediate partitions are in use, the done_partitions counter will
"jump" from (say) 20 to 30 without ever hitting 21-29.

https://www.postgresql.org/message-id/20221213044331.GJ27893%40telsasoft.com
This has two alternate patches:
- One patch changes to only update progress reporting of *direct*
  children.  This is minimal, but discourages any future plan to track
  progress involving intermediate partitions with finer granularity.
- A alternative patch adds IndexStmt.nparts_done, and allows reporting
  fine-grained progress involving intermediate partitions.

https://www.postgresql.org/message-id/flat/039564d234fc3d014c555a7ee98be69a9e724836.camel@gmail.com
This also reports progress of intermediate children.  The first patch
does it by adding an argument to DefineIndex() (which isn't okay to
backpatch).  And an alternate patch does it by adding to IndexStmt.

@committers: Is it okay to add nparts_done to IndexStmt ?

-- 
Justin

Re: Progress report of CREATE INDEX for nested partitioned tables

From

Justin Pryzby

Date:

08 January 2023, 16:48:49

On Sat, Dec 17, 2022 at 08:30:02AM -0600, Justin Pryzby wrote:
> We have the common problem of too many patches.
> 
> https://www.postgresql.org/message-id/a15f904a70924ffa4ca25c3c744cff31e0e6e143.camel%40gmail.com
> This changes the progress reporting to show indirect children as
> "total", and adds a global variable to track recursion into
> DefineIndex(), allowing it to be incremented without the value being
> lost to the caller.
> 
> https://www.postgresql.org/message-id/20221211063334.GB27893%40telsasoft.com
> This also counts indirect children, but only increments the progress
> reporting in the parent.  This has the disadvantage that when
> intermediate partitions are in use, the done_partitions counter will
> "jump" from (say) 20 to 30 without ever hitting 21-29.
> 
> https://www.postgresql.org/message-id/20221213044331.GJ27893%40telsasoft.com
> This has two alternate patches:
> - One patch changes to only update progress reporting of *direct*
>   children.  This is minimal, but discourages any future plan to track
>   progress involving intermediate partitions with finer granularity.
> - A alternative patch adds IndexStmt.nparts_done, and allows reporting
>   fine-grained progress involving intermediate partitions.
> 
> https://www.postgresql.org/message-id/flat/039564d234fc3d014c555a7ee98be69a9e724836.camel@gmail.com
> This also reports progress of intermediate children.  The first patch
> does it by adding an argument to DefineIndex() (which isn't okay to
> backpatch).  And an alternate patch does it by adding to IndexStmt.
> 
> @committers: Is it okay to add nparts_done to IndexStmt ?

Any hint about this ?

This should be resolved before the "CIC on partitioned tables" patch,
which I think is otherwise done.

Re: Progress report of CREATE INDEX for nested partitioned tables

From

Ilya Gladyshev

Date:

09 January 2023, 08:44:22

On Sun, 2023-01-08 at 10:48 -0600, Justin Pryzby wrote:
> On Sat, Dec 17, 2022 at 08:30:02AM -0600, Justin Pryzby wrote:
> > We have the common problem of too many patches.
> > 
> > https://www.postgresql.org/message-id/a15f904a70924ffa4ca25c3c744cff31e0e6e143.camel%40gmail.com
> > This changes the progress reporting to show indirect children as
> > "total", and adds a global variable to track recursion into
> > DefineIndex(), allowing it to be incremented without the value
> > being
> > lost to the caller.
> > 
> > https://www.postgresql.org/message-id/20221211063334.GB27893%40telsasoft.com
> > This also counts indirect children, but only increments the
> > progress
> > reporting in the parent.  This has the disadvantage that when
> > intermediate partitions are in use, the done_partitions counter
> > will
> > "jump" from (say) 20 to 30 without ever hitting 21-29.
> > 
> > https://www.postgresql.org/message-id/20221213044331.GJ27893%40telsasoft.com
> > This has two alternate patches:
> > - One patch changes to only update progress reporting of *direct*
> >   children.  This is minimal, but discourages any future plan to
> > track
> >   progress involving intermediate partitions with finer
> > granularity.
> > - A alternative patch adds IndexStmt.nparts_done, and allows
> > reporting
> >   fine-grained progress involving intermediate partitions.
> > 
> > https://www.postgresql.org/message-id/flat/039564d234fc3d014c555a7ee98be69a9e724836.camel@gmail.com
> > This also reports progress of intermediate children.  The first
> > patch
> > does it by adding an argument to DefineIndex() (which isn't okay to
> > backpatch).  And an alternate patch does it by adding to IndexStmt.
> > 
> > @committers: Is it okay to add nparts_done to IndexStmt ?
> 
> Any hint about this ?
> 
> This should be resolved before the "CIC on partitioned tables" patch,
> which I think is otherwise done.

I suggest that we move on with the IndexStmt patch and see what the
committers have to say about it. I have brushed the patch up a bit,
fixing TODOs and adding docs as per our discussion above.

Attachment

v2-0001-parts_done-via-IndexStmt.patch

Re: Progress report of CREATE INDEX for nested partitioned tables

From

Tomas Vondra

Date:

17 January 2023, 19:44:36

On 1/9/23 09:44, Ilya Gladyshev wrote:
> On Sun, 2023-01-08 at 10:48 -0600, Justin Pryzby wrote:
>> On Sat, Dec 17, 2022 at 08:30:02AM -0600, Justin Pryzby wrote:
>>> ...
>>>
>>> @committers: Is it okay to add nparts_done to IndexStmt ?
>>
>> Any hint about this ?
>>

AFAIK fields added at the end of a struct is seen as acceptable from the
ABI point of view. It's not risk-free, but we did that multiple times
when fixing bugs, IIRC.

The primary risk is old extensions (built on older minor version)
running on new server, getting confused by new fields (and implied
shifts in the structs). But fields at the end should be safe - the
extension simply ignores the stuff at the end. The one problem would be
arrays of structs, because even a field at the end changes the array
stride. But I don't think we do that with IndexStmt ...

Of course, if the "old" extension itself allocates the struct and passes
it to core code, that might still be an issue, because it'll allocate a
smaller struct, and core might see bogus data at the end.

On the other hand, new extensions on old server may get confused too,
because it may try setting a field that does not exist.

So ultimately it's about weighing risks vs. benefits - evaluating
whether fixing the issue is actually worth it.

The question is if/how many such extensions messing with IndexStmt in
this way actually exist. That is, allocate IndexStmt (or array of it). I
haven't found any, but maybe some extensions for index or partition
management do it? Not sure.

But ...

Do we actually need the new parts_done field? I mean, we already do
track the value - at PROGRESS_CREATEIDX_PARTITIONS_DONE index in the
st_progress_param array. Can't we simply read it from there? Then we
would not have ABI issues with the new field added to IndexStmt.

>> This should be resolved before the "CIC on partitioned tables" patch,
>> which I think is otherwise done.
> 
> I suggest that we move on with the IndexStmt patch and see what the
> committers have to say about it. I have brushed the patch up a bit,
> fixing TODOs and adding docs as per our discussion above.
> 

I did take a look at the patch, so here are my 2c:

1) num_leaf_partitions says it's "excluding foreign tables" but then it
uses RELKIND_HAS_STORAGE() which excludes various others relkinds, e.g.
partitioned tables etc. Minor, but perhaps a bit confusing.

2) I'd probably say count_leaf_partitions() instead.

3) The new part in DefineIndex counting leaf partitions should have a
comment before

    if (!OidIsValid(parentIndexId))
    { ... }

4) It's a bit confusing that one of the branches in DefineIndex just
sets stmt->parts_done without calling pgstat_progress_update_param
(while the other one does both). AFAICS the call is not needed because
we already updated it during the recursive DefineIndex call, but maybe
the comment should mention that?

As for the earlier discussion about the "correct" behavior for leaf vs.
non-leaf partitions and whether to calculate partitions in advance:

* I agree it's desirable to count partitions in advance, instead of
adding incrementally. The view is meant to provide "overview" of the
CREATE INDEX progress, and imagine you get

  partitions_total partitions_done
                10               9

so that you believe you're ~90% done. But then it jumps to the next
child and now you get

  partitions_total partitions_done
                20              10

which makes the view a bit useless for it's primary purpose, IMHO.

* I don't care very much about leaf vs. non-leaf partitions. If we
exclude non-leaf ones, fine with me. But the number of non-leaf ones
should be much smaller than leaf ones, and if the partition already has
a matching index that distorts the tracking too. Furthermore the
partitions may have different size etc. so the progress is only
approximate anyway.

I wonder if we could improve this to track the size of partitions for
total/done? That'd make leaf/non-leaf distinction unnecessary, because
non-leaf partitions have size 0.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Progress report of CREATE INDEX for nested partitioned tables

From

Justin Pryzby

Date:

17 January 2023, 20:59:04

On Tue, Jan 17, 2023 at 08:44:36PM +0100, Tomas Vondra wrote:
> On 1/9/23 09:44, Ilya Gladyshev wrote:
> > On Sun, 2023-01-08 at 10:48 -0600, Justin Pryzby wrote:
> >> On Sat, Dec 17, 2022 at 08:30:02AM -0600, Justin Pryzby wrote:
> >>> ...
> >>>
> >>> @committers: Is it okay to add nparts_done to IndexStmt ?
> >>
> >> Any hint about this ?
> >>
> 
> AFAIK fields added at the end of a struct is seen as acceptable from the
> ABI point of view. It's not risk-free, but we did that multiple times
> when fixing bugs, IIRC.

My question isn't whether it's okay to add a field at the end of a
struct in general, but rather whether it's acceptable to add this field
at the end of this struct, and not because it's unsafe to do in a minor
release, but whether someone is going to say that it's an abuse of the
data structure.

> Do we actually need the new parts_done field? I mean, we already do
> track the value - at PROGRESS_CREATEIDX_PARTITIONS_DONE index in the
> st_progress_param array. Can't we simply read it from there? Then we
> would not have ABI issues with the new field added to IndexStmt.

Good idea to try.

> As for the earlier discussion about the "correct" behavior for leaf vs.
> non-leaf partitions and whether to calculate partitions in advance:
> 
> * I agree it's desirable to count partitions in advance, instead of
> adding incrementally. The view is meant to provide "overview" of the
> CREATE INDEX progress, and imagine you get
> 
>   partitions_total partitions_done
>                 10               9
> 
> so that you believe you're ~90% done. But then it jumps to the next
> child and now you get
> 
>   partitions_total partitions_done
>                 20              10
> 
> which makes the view a bit useless for it's primary purpose, IMHO.

To be clear, that's the current, buggy behavior, and this thread is
about fixing it.  The proposed patches all ought to avoid that.

But the bug isn't caused by not "calculating partitions in advance".
Rather, the issue is that currently, the "total" is overwritten while
recursing.

That's a separate question from whether indirect partitions are counted
or not.

> I wonder if we could improve this to track the size of partitions for
> total/done? That'd make leaf/non-leaf distinction unnecessary, because
> non-leaf partitions have size 0.

Maybe, but it's out of scope for this patch.

Thanks for looking.

-- 
Justin

Re: Progress report of CREATE INDEX for nested partitioned tables

From

Tomas Vondra

Date:

18 January 2023, 00:04:06


On 1/17/23 21:59, Justin Pryzby wrote:
> On Tue, Jan 17, 2023 at 08:44:36PM +0100, Tomas Vondra wrote:
>> On 1/9/23 09:44, Ilya Gladyshev wrote:
>>> On Sun, 2023-01-08 at 10:48 -0600, Justin Pryzby wrote:
>>>> On Sat, Dec 17, 2022 at 08:30:02AM -0600, Justin Pryzby wrote:
>>>>> ...
>>>>>
>>>>> @committers: Is it okay to add nparts_done to IndexStmt ?
>>>>
>>>> Any hint about this ?
>>>>
>>
>> AFAIK fields added at the end of a struct is seen as acceptable from the
>> ABI point of view. It's not risk-free, but we did that multiple times
>> when fixing bugs, IIRC.
> 
> My question isn't whether it's okay to add a field at the end of a
> struct in general, but rather whether it's acceptable to add this field
> at the end of this struct, and not because it's unsafe to do in a minor
> release, but whether someone is going to say that it's an abuse of the
> data structure.
> 

Ah, you mean whether it's the right place for the parameter?

I don't think it is, really. IndexStmt is meant to be a description of
the CREATE INDEX statement, not something that includes info about how
it's processed. But it's the only struct we pass to the DefineIndex for
child indexes, so the only alternatives I can think of is a global
variable and the (existing) param array field.

Nevertheless, ABI compatibility is still relevant for backbranches.


>> Do we actually need the new parts_done field? I mean, we already do
>> track the value - at PROGRESS_CREATEIDX_PARTITIONS_DONE index in the
>> st_progress_param array. Can't we simply read it from there? Then we
>> would not have ABI issues with the new field added to IndexStmt.
> 
> Good idea to try.
> 

OK

>> As for the earlier discussion about the "correct" behavior for leaf vs.
>> non-leaf partitions and whether to calculate partitions in advance:
>>
>> * I agree it's desirable to count partitions in advance, instead of
>> adding incrementally. The view is meant to provide "overview" of the
>> CREATE INDEX progress, and imagine you get
>>
>>   partitions_total partitions_done
>>                 10               9
>>
>> so that you believe you're ~90% done. But then it jumps to the next
>> child and now you get
>>
>>   partitions_total partitions_done
>>                 20              10
>>
>> which makes the view a bit useless for it's primary purpose, IMHO.
> 
> To be clear, that's the current, buggy behavior, and this thread is
> about fixing it.  The proposed patches all ought to avoid that.
> 
> But the bug isn't caused by not "calculating partitions in advance".
> Rather, the issue is that currently, the "total" is overwritten while
> recursing.
> 

You're right the issue us about overwriting the total - not sure what I
was thinking about when writing this. I guess I got distracted by the
discussion about "preliminary planning" etc. Sorry for the confusion.

> That's a separate question from whether indirect partitions are counted
> or not.
> 
>> I wonder if we could improve this to track the size of partitions for
>> total/done? That'd make leaf/non-leaf distinction unnecessary, because
>> non-leaf partitions have size 0.
> 
> Maybe, but it's out of scope for this patch.
> 

+1, it was just an idea for future.


-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Progress report of CREATE INDEX for nested partitioned tables

From

Justin Pryzby

Date:

18 January 2023, 15:25:35

TBH, I think the best approach is what I did in:
0001-report-top-parent-progress-for-CREATE-INDEX.txt

That's a minimal patch, ideal for backpatching.

..which defines/clarifies that the progress reporting is only for
*direct* children.  That avoids the need to change any data structures,
and it's what was probably intended by the original patch, which doesn't
seem to have considered intermediate partitioned tables.

I think it'd be fine to re-define that in some future release, to allow
showing indirect children (probably only "leaves", and not intermediate
partitioned tables).  Or "total_bytes" or other global progress.

-- 
Justin

Re: Progress report of CREATE INDEX for nested partitioned tables

From

Dean Rasheed

Date:

25 January 2023, 09:51:03

On Wed, 18 Jan 2023 at 15:25, Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> TBH, I think the best approach is what I did in:
> 0001-report-top-parent-progress-for-CREATE-INDEX.txt
>
> That's a minimal patch, ideal for backpatching.
>
> ..which defines/clarifies that the progress reporting is only for
> *direct* children.  That avoids the need to change any data structures,
> and it's what was probably intended by the original patch, which doesn't
> seem to have considered intermediate partitioned tables.
>
> I think it'd be fine to re-define that in some future release, to allow
> showing indirect children (probably only "leaves", and not intermediate
> partitioned tables).  Or "total_bytes" or other global progress.
>

Hmm. My expectation as a user is that partitions_total includes both
direct and indirect (leaf) child partitions, that it is set just once
at the start of the process, and that partitions_done increases from
zero to partitions_total as the index-build proceeds. I think that
should be achievable with a minimally invasive patch that doesn't
change any data structures.

I agree with all the review comments Tomas posted. In particular, this
shouldn't need any changes to IndexStmt. I think the best approach
would be to just add a new function to backend_progress.c that offsets
a specified progress parameter by a specified amount, so that you can
just increment partitions_done by one or more, at the appropriate
points.

Regards,
Dean

Re: Progress report of CREATE INDEX for nested partitioned tables

From

Ilya Gladyshev

Date:

31 January 2023, 15:32:20

> 17 янв. 2023 г., в 23:44, Tomas Vondra <tomas.vondra@enterprisedb.com> написал(а):
> Do we actually need the new parts_done field? I mean, we already do
> track the value - at PROGRESS_CREATEIDX_PARTITIONS_DONE index in the
> st_progress_param array. Can't we simply read it from there? Then we
> would not have ABI issues with the new field added to IndexStmt.

I think it’s a good approach and it could be useful outside of scope of this patch too. So I have attached a patch,
thatintroduces pgstat_progress_incr_param function for this purpose. There’s one thing I am not sure about, IIUC, we
canassume that the only process that can write into MyBEEntry of the current backend is the current backend itself,
thereforelooping to get consistent reads from this array is not required. Please correct me, if I am wrong here.

Attachment

0001-create-index-progress-increment.patch

Re: Progress report of CREATE INDEX for nested partitioned tables

From

Justin Pryzby

Date:

01 February 2023, 04:29:49

On Tue, Jan 31, 2023 at 07:32:20PM +0400, Ilya Gladyshev wrote:
> > 17 янв. 2023 г., в 23:44, Tomas Vondra <tomas.vondra@enterprisedb.com> написал(а):
> > Do we actually need the new parts_done field? I mean, we already do
> > track the value - at PROGRESS_CREATEIDX_PARTITIONS_DONE index in the
> > st_progress_param array. Can't we simply read it from there? Then we
> > would not have ABI issues with the new field added to IndexStmt.
> 
> I think it’s a good approach and it could be useful outside of scope of this patch too. So I have attached a patch,
thatintroduces pgstat_progress_incr_param function for this purpose. There’s one thing I am not sure about, IIUC, we
canassume that the only process that can write into MyBEEntry of the current backend is the current backend itself,
thereforelooping to get consistent reads from this array is not required. Please correct me, if I am wrong here.

Thanks for the updated patch.

I think you're right - pgstat_begin_read_activity() is used for cases
involving other backends.  It ought to be safe for a process to read its
own status bits, since we know they're not also being written.

You changed DefineIndex() to update progress for the leaf indexes' when
called recursively.  The caller updates the progress for "attached"
indexes, but not created ones.  That allows providing fine-granularity
progress updates when using intermediate partitions, right ?  (Rather
than updating the progress by more than one at a time in the case of
intermediate partitioning).

If my understanding is right, that's subtle, and adds a bit of
complexity to the current code, so could use careful commentary.  I
suggest:

* If the index was attached, update progress for all its direct and
* indirect leaf indexes all at once.  If the index was built by calling
* DefineIndex() recursively, the called function is responsible for
* updating the progress report for built indexes.

...

* If this is the top-level index, we're done. When called recursively
* for child tables, the done partition counter is incremented now,
* rather than in the caller.

I guess you know that there were compiler warnings (related to your
question).
https://cirrus-ci.com/task/6571212386598912

pgstat_progress_incr_param() could call pgstat_progress_update_param()
rather than using its own Assert() and WRITE_ACTIVITY calls.  I'm not
sure which I prefer, though.

Also, there are whitespace/tab/style issues in
pgstat_progress_incr_param().

-- 
Justin

Re: Progress report of CREATE INDEX for nested partitioned tables

From

Ilya Gladyshev

Date:

01 February 2023, 07:21:35


> 1 февр. 2023 г., в 08:29, Justin Pryzby <pryzby@telsasoft.com> написал(а):
>
> On Tue, Jan 31, 2023 at 07:32:20PM +0400, Ilya Gladyshev wrote:
>>> 17 янв. 2023 г., в 23:44, Tomas Vondra <tomas.vondra@enterprisedb.com> написал(а):
>>> Do we actually need the new parts_done field? I mean, we already do
>>> track the value - at PROGRESS_CREATEIDX_PARTITIONS_DONE index in the
>>> st_progress_param array. Can't we simply read it from there? Then we
>>> would not have ABI issues with the new field added to IndexStmt.
>>
>> I think it’s a good approach and it could be useful outside of scope of this patch too. So I have attached a patch,
thatintroduces pgstat_progress_incr_param function for this purpose. There’s one thing I am not sure about, IIUC, we
canassume that the only process that can write into MyBEEntry of the current backend is the current backend itself,
thereforelooping to get consistent reads from this array is not required. Please correct me, if I am wrong here. 
>
> Thanks for the updated patch.
>
> I think you're right - pgstat_begin_read_activity() is used for cases
> involving other backends.  It ought to be safe for a process to read its
> own status bits, since we know they're not also being written.
>
> You changed DefineIndex() to update progress for the leaf indexes' when
> called recursively.  The caller updates the progress for "attached"
> indexes, but not created ones.  That allows providing fine-granularity
> progress updates when using intermediate partitions, right ?  (Rather
> than updating the progress by more than one at a time in the case of
> intermediate partitioning).
>
> If my understanding is right, that's subtle, and adds a bit of
> complexity to the current code, so could use careful commentary.  I
> suggest:
>
> * If the index was attached, update progress for all its direct and
> * indirect leaf indexes all at once.  If the index was built by calling
> * DefineIndex() recursively, the called function is responsible for
> * updating the progress report for built indexes.
>
> ...
>
> * If this is the top-level index, we're done. When called recursively
> * for child tables, the done partition counter is incremented now,
> * rather than in the caller.

Yes, you are correct about the intended behavior, I added your comments to the patch.

> I guess you know that there were compiler warnings (related to your
> question).
> https://cirrus-ci.com/task/6571212386598912
>
> pgstat_progress_incr_param() could call pgstat_progress_update_param()
> rather than using its own Assert() and WRITE_ACTIVITY calls.  I'm not
> sure which I prefer, though.
>
> Also, there are whitespace/tab/style issues in
> pgstat_progress_incr_param().
>
> --
> Justin

Thank you for the review, I fixed the aforementioned issues in the v2.

Attachment

v2-0001-create-index-progress-increment.patch

Re: Progress report of CREATE INDEX for nested partitioned tables

From

Alvaro Herrera

Date:

01 February 2023, 12:01:26

Hmm, count_leaf_partitions has to scan pg_inherits and do a syscache
lookup for every single element therein ... this sounds slow. 

In one of the callsites, we already have the partition descriptor
available.  We could just scan partdesc->is_leaf[] and add one for each
'true' value we see there.

In the other callsite, we had the table open just a few lines before the
place you call count_leaf_partitions.  Maybe we can rejigger things by
examining its state before closing it: if relkind is not partitioned we
know leaf_partitions=0, and only if partitioned we count leaf partitions.
I think that would save some work.  I also wonder if it's worth writing
a bespoke function for counting leaf partitions rather than relying on
find_all_inheritors.

I think there's probably not much point optimizing it further than that.
If there was, then we could think about creating a data representation
that we can build for the entire partitioning hierarchy in a single pass
with the count of leaf partitions that sit below each specific non-leaf;
but I think that's just over-engineering.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"After a quick R of TFM, all I can say is HOLY CR** THAT IS COOL! PostgreSQL was
amazing when I first started using it at 7.2, and I'm continually astounded by
learning new features and techniques made available by the continuing work of
the development team."
Berend Tober, http://archives.postgresql.org/pgsql-hackers/2007-08/msg01009.php

Re: Progress report of CREATE INDEX for nested partitioned tables

From

Ilya Gladyshev

Date:

01 February 2023, 14:21:00

> 1 февр. 2023 г., в 16:01, Alvaro Herrera <alvherre@alvh.no-ip.org> написал(а):
>
> Hmm, count_leaf_partitions has to scan pg_inherits and do a syscache
> lookup for every single element therein ... this sounds slow.
>
> In one of the callsites, we already have the partition descriptor
> available.  We could just scan partdesc->is_leaf[] and add one for each
> 'true' value we see there.

The problem is that partdesc contains only direct children of the table and we need all the children down the
inheritancetree to count the total number of leaf partitions in the first callsite. 

> In the other callsite, we had the table open just a few lines before the
> place you call count_leaf_partitions.  Maybe we can rejigger things by
> examining its state before closing it: if relkind is not partitioned we
> know leaf_partitions=0, and only if partitioned we count leaf partitions.
> I think that would save some work.  I also wonder if it's worth writing
> a bespoke function for counting leaf partitions rather than relying on
> find_all_inheritors.

Sure, added this condition to avoid the extra work here.

Attachment

v3-0001-create-index-progress-increment.patch

Re: Progress report of CREATE INDEX for nested partitioned tables

From

Matthias van de Meent

Date:

01 February 2023, 15:21:35

On Wed, 1 Feb 2023 at 15:21, Ilya Gladyshev <ilya.v.gladyshev@gmail.com> wrote:
>
> > 1 февр. 2023 г., в 16:01, Alvaro Herrera <alvherre@alvh.no-ip.org> написал(а):
> >
> > Hmm, count_leaf_partitions has to scan pg_inherits and do a syscache
> > lookup for every single element therein ... this sounds slow.
> >
> > In one of the callsites, we already have the partition descriptor
> > available.  We could just scan partdesc->is_leaf[] and add one for each
> > 'true' value we see there.
>
> The problem is that partdesc contains only direct children of the table and we need all the children down the
inheritancetree to count the total number of leaf partitions in the first callsite. 
>
> > In the other callsite, we had the table open just a few lines before the
> > place you call count_leaf_partitions.  Maybe we can rejigger things by
> > examining its state before closing it: if relkind is not partitioned we
> > know leaf_partitions=0, and only if partitioned we count leaf partitions.
> > I think that would save some work.  I also wonder if it's worth writing
> > a bespoke function for counting leaf partitions rather than relying on
> > find_all_inheritors.
>
> Sure, added this condition to avoid the extra work here.
>

>        When creating an index on a partitioned table, this column is set to
> -       the total number of partitions on which the index is to be created.
> +       the total number of leaf partitions on which the index is to be created or attached.

I think we should also add a note about the (now) non-constant nature
of the value, something along the lines of "This value is updated as
we're processing and discovering partitioned tables in the partition
hierarchy".

Kind regards,

Matthias van de Meent

Re: Progress report of CREATE INDEX for nested partitioned tables

From

Justin Pryzby

Date:

01 February 2023, 15:53:36

On Wed, Feb 01, 2023 at 04:21:35PM +0100, Matthias van de Meent wrote:
> On Wed, 1 Feb 2023 at 15:21, Ilya Gladyshev <ilya.v.gladyshev@gmail.com> wrote:
> > > 1 февр. 2023 г., в 16:01, Alvaro Herrera <alvherre@alvh.no-ip.org> написал(а):
> > > Hmm, count_leaf_partitions has to scan pg_inherits and do a syscache
> > > lookup for every single element therein ... this sounds slow.
> > >
> > > In one of the callsites, we already have the partition descriptor
> > > available.  We could just scan partdesc->is_leaf[] and add one for each
> > > 'true' value we see there.
> >
> > The problem is that partdesc contains only direct children of the table and we need all the children down the
inheritancetree to count the total number of leaf partitions in the first callsite.
 
> >
> > > In the other callsite, we had the table open just a few lines before the
> > > place you call count_leaf_partitions.  Maybe we can rejigger things by
> > > examining its state before closing it: if relkind is not partitioned we
> > > know leaf_partitions=0, and only if partitioned we count leaf partitions.
> > > I think that would save some work.  I also wonder if it's worth writing
> > > a bespoke function for counting leaf partitions rather than relying on
> > > find_all_inheritors.
> >
> > Sure, added this condition to avoid the extra work here.
> >
> 
> >        When creating an index on a partitioned table, this column is set to
> > -       the total number of partitions on which the index is to be created.
> > +       the total number of leaf partitions on which the index is to be created or attached.
> 
> I think we should also add a note about the (now) non-constant nature
> of the value, something along the lines of "This value is updated as
> we're processing and discovering partitioned tables in the partition
> hierarchy".

But the TOTAL is constant, right ?  Updating the total when being called
recursively is the problem these patches fix.

-- 
Justin

Re: Progress report of CREATE INDEX for nested partitioned tables

From

Matthias van de Meent

Date:

01 February 2023, 16:27:26

On Wed, 1 Feb 2023 at 16:53, Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> On Wed, Feb 01, 2023 at 04:21:35PM +0100, Matthias van de Meent wrote:
> > On Wed, 1 Feb 2023 at 15:21, Ilya Gladyshev <ilya.v.gladyshev@gmail.com> wrote:
> > > > 1 февр. 2023 г., в 16:01, Alvaro Herrera <alvherre@alvh.no-ip.org> написал(а):
> > > > Hmm, count_leaf_partitions has to scan pg_inherits and do a syscache
> > > > lookup for every single element therein ... this sounds slow.
> > > >
> > > > In one of the callsites, we already have the partition descriptor
> > > > available.  We could just scan partdesc->is_leaf[] and add one for each
> > > > 'true' value we see there.
> > >
> > > The problem is that partdesc contains only direct children of the table and we need all the children down the
inheritancetree to count the total number of leaf partitions in the first callsite. 
> > >
> > > > In the other callsite, we had the table open just a few lines before the
> > > > place you call count_leaf_partitions.  Maybe we can rejigger things by
> > > > examining its state before closing it: if relkind is not partitioned we
> > > > know leaf_partitions=0, and only if partitioned we count leaf partitions.
> > > > I think that would save some work.  I also wonder if it's worth writing
> > > > a bespoke function for counting leaf partitions rather than relying on
> > > > find_all_inheritors.
> > >
> > > Sure, added this condition to avoid the extra work here.
> > >
> >
> > >        When creating an index on a partitioned table, this column is set to
> > > -       the total number of partitions on which the index is to be created.
> > > +       the total number of leaf partitions on which the index is to be created or attached.
> >
> > I think we should also add a note about the (now) non-constant nature
> > of the value, something along the lines of "This value is updated as
> > we're processing and discovering partitioned tables in the partition
> > hierarchy".
>
> But the TOTAL is constant, right ?  Updating the total when being called
> recursively is the problem these patches fix.

If that's the case, then I'm not seeing the 'fix' part of the patch. I
thought this patch was fixing the provably incorrect TOTAL value where
DONE > TOTAL due to the recursive operation overwriting the DONE/TOTAL
values instead of updating them.

In HEAD we set TOTAL to whatever number partitioned table we're
currently processing has - regardless of whether we're the top level
statement.
With the patch we instead add the number of child relations to that
count, for which REL_HAS_STORAGE(child) -- or at least, in the v3
posted by Ilya. Approximately immediately after updating that count we
recurse to the child relations, and that only returns once it is done
creating the indexes, so both TOTAL and DONE go up as we process more
partitions in the hierarchy.

An example hierarchy:
CREATE TABLE parent (a, b) partition by list (a);
CREATE TABLE a1
    PARTITION OF parent FOR VALUES IN (1)
    PARTITION BY LIST (b);
CREATE TABLE a1bd
    PARTITION OF a1 DEFAULT;

CREATE TABLE a2
    PARTITION OF parent FOR VALUES IN (2)
    PARTITION BY LIST (b);
CREATE TABLE a2bd
    PARTITION OF a2 DEFAULT;

INSERT INTO parent (a, b) SELECT * from generate_series(1, 2) a(a)
cross join generate_series(1, 100000),b(b);
CREATE INDEX ON parent(a,b);

This will only discover that a2bd will need to be indexed after a1bd
is done (or vice versa, depending on which order a1 and a2 are
processed in the ).

Kind regards,

Matthias van de Meent

Re: Progress report of CREATE INDEX for nested partitioned tables

From

Ilya Gladyshev

Date:

01 February 2023, 17:51:06

1 февр. 2023 г., в 20:27, Matthias van de Meent <boekewurm+postgres@gmail.com> написал(а):

On Wed, 1 Feb 2023 at 16:53, Justin Pryzby <pryzby@telsasoft.com> wrote:

On Wed, Feb 01, 2023 at 04:21:35PM +0100, Matthias van de Meent wrote:
On Wed, 1 Feb 2023 at 15:21, Ilya Gladyshev <ilya.v.gladyshev@gmail.com> wrote:
1 февр. 2023 г., в 16:01, Alvaro Herrera <alvherre@alvh.no-ip.org> написал(а):
Hmm, count_leaf_partitions has to scan pg_inherits and do a syscache
lookup for every single element therein ... this sounds slow.

In one of the callsites, we already have the partition descriptor
available. We could just scan partdesc->is_leaf[] and add one for each
'true' value we see there.

The problem is that partdesc contains only direct children of the table and we need all the children down the inheritance tree to count the total number of leaf partitions in the first callsite.

In the other callsite, we had the table open just a few lines before the
place you call count_leaf_partitions. Maybe we can rejigger things by
examining its state before closing it: if relkind is not partitioned we
know leaf_partitions=0, and only if partitioned we count leaf partitions.
I think that would save some work. I also wonder if it's worth writing
a bespoke function for counting leaf partitions rather than relying on
find_all_inheritors.

Sure, added this condition to avoid the extra work here.

      When creating an index on a partitioned table, this column is set to
-       the total number of partitions on which the index is to be created.
+       the total number of leaf partitions on which the index is to be created or attached.

I think we should also add a note about the (now) non-constant nature
of the value, something along the lines of "This value is updated as
we're processing and discovering partitioned tables in the partition
hierarchy".

But the TOTAL is constant, right ? Updating the total when being called
recursively is the problem these patches fix.

If that's the case, then I'm not seeing the 'fix' part of the patch. I
thought this patch was fixing the provably incorrect TOTAL value where
DONE > TOTAL due to the recursive operation overwriting the DONE/TOTAL
values instead of updating them.

In HEAD we set TOTAL to whatever number partitioned table we're
currently processing has - regardless of whether we're the top level
statement.
With the patch we instead add the number of child relations to that
count, for which REL_HAS_STORAGE(child) -- or at least, in the v3
posted by Ilya. Approximately immediately after updating that count we
recurse to the child relations, and that only returns once it is done
creating the indexes, so both TOTAL and DONE go up as we process more
partitions in the hierarchy.

The TOTAL in the patch is set only when processing the top-level parent and it is not updated when we recurse, so yes, it is constant. From v3:

@@ -1219,8 +1243,14 @@ DefineIndex(Oid relationId,

Relation parentIndex;

TupleDesc parentDesc;

- pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_TOTAL,

- nparts);

+ if (!OidIsValid(parentIndexId))

+ {

+ int total_parts;

+ total_parts = count_leaf_partitions(relationId);

+ pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_TOTAL,

+ total_parts);

+ }

It is set to the total number of children on all levels of the hierarchy, not just the current one, so the total value doesn’t need to be updated later, because it is set to the correct value from the very beginning.

It is the DONE counter that is updated, and when we attach an index of a partition that is itself a partitioned table (like a2 in your example, if it already had an index created), it will be updated by the number of children of the partition.

@@ -1431,9 +1463,25 @@ DefineIndex(Oid relationId,

SetUserIdAndSecContext(child_save_userid,

child_save_sec_context);

}

+ else

+ {

+ int attached_parts = 1;

+ if (RELKIND_HAS_PARTITIONS(child_relkind))

+ attached_parts = count_leaf_partitions(childRelid);

+ /*

+ * If the index was attached, we need to update progress

+ * here, in its parent. For a partitioned index, we need

+ * to mark all of its children that were included in

+ * PROGRESS_CREATEIDX_PARTITIONS_TOTAL as done. If the

+ * index was built by calling DefineIndex() recursively,

+ * the called function is responsible for updating the

+ * progress report for built indexes.

+ */

+ pgstat_progress_incr_param(PROGRESS_CREATEIDX_PARTITIONS_DONE, attached_parts);

+ }

- pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_DONE,

- i + 1);

free_attrmap(attmap);

}

Re: Progress report of CREATE INDEX for nested partitioned tables

From

Matthias van de Meent

Date:

01 February 2023, 18:24:48

On Wed, 1 Feb 2023 at 18:51, Ilya Gladyshev <ilya.v.gladyshev@gmail.com> wrote:
>
> 1 февр. 2023 г., в 20:27, Matthias van de Meent <boekewurm+postgres@gmail.com> написал(а):
>
>> In HEAD we set TOTAL to whatever number partitioned table we're
>> currently processing has - regardless of whether we're the top level
>> statement.
>> With the patch we instead add the number of child relations to that
>> count, for which REL_HAS_STORAGE(child) -- or at least, in the v3
>> posted by Ilya. Approximately immediately after updating that count we
>> recurse to the child relations, and that only returns once it is done
>> creating the indexes, so both TOTAL and DONE go up as we process more
>> partitions in the hierarchy.
>
>
> The TOTAL in the patch is set only when processing the top-level parent and it is not updated when we recurse, so
yes,it is constant. From v3: 

Ugh, I misread the patch, more specifically count_leaf_partitions and
the !OidIsValid(parentIndexId) condition changes.

You are correct, sorry for the noise.

Kind regards,

Matthias van de Meent

Re: Progress report of CREATE INDEX for nested partitioned tables

From

Justin Pryzby

Date:

02 February 2023, 15:18:07

On Wed, Feb 01, 2023 at 07:24:48PM +0100, Matthias van de Meent wrote:
> On Wed, 1 Feb 2023 at 18:51, Ilya Gladyshev <ilya.v.gladyshev@gmail.com> wrote:
> >
> > 1 февр. 2023 г., в 20:27, Matthias van de Meent <boekewurm+postgres@gmail.com> написал(а):
> >
> >> In HEAD we set TOTAL to whatever number partitioned table we're
> >> currently processing has - regardless of whether we're the top level
> >> statement.
> >> With the patch we instead add the number of child relations to that
> >> count, for which REL_HAS_STORAGE(child) -- or at least, in the v3
> >> posted by Ilya. Approximately immediately after updating that count we
> >> recurse to the child relations, and that only returns once it is done
> >> creating the indexes, so both TOTAL and DONE go up as we process more
> >> partitions in the hierarchy.
> >
> >
> > The TOTAL in the patch is set only when processing the top-level parent and it is not updated when we recurse, so
yes,it is constant. From v3:
 
> 
> Ugh, I misread the patch, more specifically count_leaf_partitions and
> the !OidIsValid(parentIndexId) condition changes.
> 
> You are correct, sorry for the noise.

That suggests that the comments could've been more clear.  I added a
comment suggested by Tomas and adjusted some others and wrote a commit
message.  I even ran pgindent for about the 3rd time ever.

002 are my changes as a separate patch, which you could apply to your
local branch.

And 003/4 are assertions that I wrote to demonstrate the problem and the
verify the fixes, but not being proposed for commit.

-- 
Justin

Attachment

Re: Progress report of CREATE INDEX for nested partitioned tables

From

Justin Pryzby

Date:

08 February 2023, 22:40:49

On Thu, Feb 02, 2023 at 09:18:07AM -0600, Justin Pryzby wrote:
> On Wed, Feb 01, 2023 at 07:24:48PM +0100, Matthias van de Meent wrote:
> > On Wed, 1 Feb 2023 at 18:51, Ilya Gladyshev <ilya.v.gladyshev@gmail.com> wrote:
> > > 1 февр. 2023 г., в 20:27, Matthias van de Meent <boekewurm+postgres@gmail.com> написал(а):
> > >
> > >> In HEAD we set TOTAL to whatever number partitioned table we're
> > >> currently processing has - regardless of whether we're the top level
> > >> statement.
> > >> With the patch we instead add the number of child relations to that
> > >> count, for which REL_HAS_STORAGE(child) -- or at least, in the v3
> > >> posted by Ilya. Approximately immediately after updating that count we
> > >> recurse to the child relations, and that only returns once it is done
> > >> creating the indexes, so both TOTAL and DONE go up as we process more
> > >> partitions in the hierarchy.
> > >
> > > The TOTAL in the patch is set only when processing the top-level parent and it is not updated when we recurse, so
yes,it is constant. From v3:
 
> > 
> > Ugh, I misread the patch, more specifically count_leaf_partitions and
> > the !OidIsValid(parentIndexId) condition changes.
> > 
> > You are correct, sorry for the noise.
> 
> That suggests that the comments could've been more clear.  I added a
> comment suggested by Tomas and adjusted some others and wrote a commit
> message.  I even ran pgindent for about the 3rd time ever.
> 
> 002 are my changes as a separate patch, which you could apply to your
> local branch.
> 
> And 003/4 are assertions that I wrote to demonstrate the problem and the
> verify the fixes, but not being proposed for commit.

That was probably a confusing way to present it - I should've sent the
relative diff as a .txt rather than as patch 002.

This squishes together 001/2 as the main patch.
I believe it's ready.

-- 
Justin

Attachment

Re: Progress report of CREATE INDEX for nested partitioned tables

From

Justin Pryzby

Date:

17 February 2023, 00:26:51

On Wed, Feb 08, 2023 at 04:40:49PM -0600, Justin Pryzby wrote:
> This squishes together 001/2 as the main patch.
> I believe it's ready.

Update to address a compiler warning in the supplementary patches adding
assertions.

Attachment

Re: Progress report of CREATE INDEX for nested partitioned tables

From

Tom Lane

Date:

10 March 2023, 20:36:10

Justin Pryzby <pryzby@telsasoft.com> writes:
> Update to address a compiler warning in the supplementary patches adding
> assertions.

I took a look through this.  It seems like basically a good solution,
but the count_leaf_partitions() function is bothering me, for two
reasons:

1. It seems like a pretty expensive thing to do.  Don't we have the
info at hand somewhere already?

2. Is it really safe to do find_all_inheritors with NoLock?  If so,
a comment explaining why would be good.

            regards, tom lane

Re: Progress report of CREATE INDEX for nested partitioned tables

From

Justin Pryzby

Date:

12 March 2023, 20:09:57

On Fri, Mar 10, 2023 at 03:36:10PM -0500, Tom Lane wrote:
> Justin Pryzby <pryzby@telsasoft.com> writes:
> > Update to address a compiler warning in the supplementary patches adding
> > assertions.
> 
> I took a look through this.  It seems like basically a good solution,
> but the count_leaf_partitions() function is bothering me, for two
> reasons:
> 
> 1. It seems like a pretty expensive thing to do.  Don't we have the
> info at hand somewhere already?

I don't know where that would be.  We need the list of both direct *and*
indirect partitions.  See:
https://www.postgresql.org/message-id/5073D187-4200-4A2D-BAC0-91C657E3C22E%40gmail.com

If it would help to avoid the concern, then I might consider proposing
not to call get_rel_relkind() ...

> 2. Is it really safe to do find_all_inheritors with NoLock?  If so,
> a comment explaining why would be good.

In both cases (both for the parent and for case of a partitioned child
with pre-existing indexes being ATTACHed), the table itself is already
locked by DefineIndex():

    lockmode = concurrent ? ShareUpdateExclusiveLock : ShareLock;
    rel = table_open(relationId, lockmode);

and
                childrel = table_open(childRelid, lockmode);
                ...
                table_close(childrel, NoLock);

And, find_all_inheritors() will also have been called by
ProcessUtilitySlow().  Maybe it's sufficient to mention that ?

-- 
Justin

Re: Progress report of CREATE INDEX for nested partitioned tables

From

Tom Lane

Date:

12 March 2023, 20:14:06

Justin Pryzby <pryzby@telsasoft.com> writes:
> On Fri, Mar 10, 2023 at 03:36:10PM -0500, Tom Lane wrote:
>> I took a look through this.  It seems like basically a good solution,
>> but the count_leaf_partitions() function is bothering me, for two
>> reasons:

> ... find_all_inheritors() will also have been called by
> ProcessUtilitySlow().  Maybe it's sufficient to mention that ?

Hm.  Could we get rid of count_leaf_partitions by doing the work in
ProcessUtilitySlow?  Or at least passing that OID list forward instead
of recomputing it?

            regards, tom lane

Re: Progress report of CREATE INDEX for nested partitioned tables

From

Justin Pryzby

Date:

12 March 2023, 22:06:03

On Sun, Mar 12, 2023 at 04:14:06PM -0400, Tom Lane wrote:
> Justin Pryzby <pryzby@telsasoft.com> writes:
> > On Fri, Mar 10, 2023 at 03:36:10PM -0500, Tom Lane wrote:
> >> I took a look through this.  It seems like basically a good solution,
> >> but the count_leaf_partitions() function is bothering me, for two
> >> reasons:
> 
> > ... find_all_inheritors() will also have been called by
> > ProcessUtilitySlow().  Maybe it's sufficient to mention that ?
> 
> Hm.  Could we get rid of count_leaf_partitions by doing the work in
> ProcessUtilitySlow?  Or at least passing that OID list forward instead
> of recomputing it?

count_leaf_partitions() is called in two places:

Once to get PROGRESS_CREATEIDX_PARTITIONS_TOTAL.  It'd be easy enough to
pass an integer total via IndexStmt (but I think we wanted to avoid
adding anything there, since it's not a part of the statement).

count_leaf_partitions() is also called for sub-partitions, in the case
that a matching "partitioned index" already exists, and the progress
report needs to be incremented by the number of leaves for which indexes
were ATTACHED.  We'd need a mapping from OID => npartitions (or to
compile some data structure of all the partitioned partitions).  I guess
CreateIndex() could call CreatePartitionDirectory().  But it looks like
that would be *more* expensive.

-- 
Justin

Re: Progress report of CREATE INDEX for nested partitioned tables

From

Tom Lane

Date:

12 March 2023, 22:25:13

Justin Pryzby <pryzby@telsasoft.com> writes:
> On Sun, Mar 12, 2023 at 04:14:06PM -0400, Tom Lane wrote:
>> Hm.  Could we get rid of count_leaf_partitions by doing the work in
>> ProcessUtilitySlow?  Or at least passing that OID list forward instead
>> of recomputing it?

> count_leaf_partitions() is called in two places:

> Once to get PROGRESS_CREATEIDX_PARTITIONS_TOTAL.  It'd be easy enough to
> pass an integer total via IndexStmt (but I think we wanted to avoid
> adding anything there, since it's not a part of the statement).

I agree that adding such a field to IndexStmt would be a very bad idea.
However, adding another parameter to DefineIndex doesn't seem like a
problem.  Or could we just call pgstat_progress_update_param directly from
ProcessUtilitySlow, after counting the partitions in the existing loop?

> count_leaf_partitions() is also called for sub-partitions, in the case
> that a matching "partitioned index" already exists, and the progress
> report needs to be incremented by the number of leaves for which indexes
> were ATTACHED.

Can't you increment progress by one at the point where the actual attach
happens?

I also wonder whether leaving non-leaf partitions out of the total
is making things more complicated rather than simpler ...

> We'd need a mapping from OID => npartitions (or to
> compile some data structure of all the partitioned partitions).  I guess
> CreateIndex() could call CreatePartitionDirectory().  But it looks like
> that would be *more* expensive.

The reason I find this annoying is that the non-optional nature of the
progress reporting mechanism was sold on the basis that it would add
only negligible overhead.  Adding extra pass(es) over pg_inherits
breaks that promise.  Maybe it's cheap enough to not matter in the
big scheme of things, but we should not be having to make arguments
like that one.

            regards, tom lane

Re: Progress report of CREATE INDEX for nested partitioned tables

From

Tom Lane

Date:

13 March 2023, 00:15:28

I wrote:
> Justin Pryzby <pryzby@telsasoft.com> writes:
>> count_leaf_partitions() is also called for sub-partitions, in the case
>> that a matching "partitioned index" already exists, and the progress
>> report needs to be incremented by the number of leaves for which indexes
>> were ATTACHED.

> Can't you increment progress by one at the point where the actual attach
> happens?

Oh, never mind; now I realize that the point is that you didn't ever
iterate over those leaf indexes.

However, consider a thought experiment: assume for whatever reason that
all the actual index builds happen first, then all the cases where you
succeed in attaching a sub-partitioned index happen at the end of the
command.  In that case, the percentage-done indicator would go from
some-number to 100% more or less instantly.

What if we simply do nothing at sub-partitioned indexes?  Or if that's
slightly too radical, just increase the PARTITIONS_DONE counter by 1?
That would look indistinguishable from the case where all the attaches
happen at the end.

            regards, tom lane

Re: Progress report of CREATE INDEX for nested partitioned tables

From

Justin Pryzby

Date:

13 March 2023, 13:50:08

On Sun, Mar 12, 2023 at 06:25:13PM -0400, Tom Lane wrote:
> Justin Pryzby <pryzby@telsasoft.com> writes:
> > On Sun, Mar 12, 2023 at 04:14:06PM -0400, Tom Lane wrote:
> >> Hm.  Could we get rid of count_leaf_partitions by doing the work in
> >> ProcessUtilitySlow?  Or at least passing that OID list forward instead
> >> of recomputing it?
> 
> > count_leaf_partitions() is called in two places:
> 
> > Once to get PROGRESS_CREATEIDX_PARTITIONS_TOTAL.  It'd be easy enough to
> > pass an integer total via IndexStmt (but I think we wanted to avoid
> > adding anything there, since it's not a part of the statement).
> 
> I agree that adding such a field to IndexStmt would be a very bad idea.
> However, adding another parameter to DefineIndex doesn't seem like a
> problem.

It's a problem since this is a bug and it's desirable to backpatch a
fix, right ?

> Or could we just call pgstat_progress_update_param directly from
> ProcessUtilitySlow, after counting the partitions in the existing loop?

That'd be fine if it was only needed for TOTAL, but it doesn't handle
the 2nd call to count_leaf_partitions().

On Sun, Mar 12, 2023 at 08:15:28PM -0400, Tom Lane wrote:
> I wrote:
> > Justin Pryzby <pryzby@telsasoft.com> writes:
> >> count_leaf_partitions() is also called for sub-partitions, in the case
> >> that a matching "partitioned index" already exists, and the progress
> >> report needs to be incremented by the number of leaves for which indexes
> >> were ATTACHED.
> 
> > Can't you increment progress by one at the point where the actual attach
> > happens?
> 
> Oh, never mind; now I realize that the point is that you didn't ever
> iterate over those leaf indexes.
> 
> However, consider a thought experiment: assume for whatever reason that
> all the actual index builds happen first, then all the cases where you
> succeed in attaching a sub-partitioned index happen at the end of the
> command.  In that case, the percentage-done indicator would go from
> some-number to 100% more or less instantly.
> 
> What if we simply do nothing at sub-partitioned indexes?  Or if that's
> slightly too radical, just increase the PARTITIONS_DONE counter by 1?
> That would look indistinguishable from the case where all the attaches
> happen at the end.

Incrementing by 0 sounds terrible, since someone who has intermediate
partitioned tables is likely to always see 0% done.  (It's true that
intermediate partitioned tables don't seem to have been considered by
the original patch, and it's indisputable that progress reporting
currently misbehaves in that case).

And incrementing PARTITIONS_DONE by 1 could lead to bogus progress
reporting with "N_done > N_Total" if an intermediate partitioned table
had no leaf partitions at all.  That's one of the problems this thread
is trying to fix (the other being "total changing in the middle of the
command").

Maybe your idea is usable though, since indirect partitioned indexes
*can* be counted correctly during recursion.  What's hard to fix is the
case that an index is both *partitioned* and *attached*.  Maybe it's
okay to count that case as 0.  The consequence is that the command would
end before the progress report got to 100%.

The other option seems to be to define the progress report to count only
*direct* children.
https://www.postgresql.org/message-id/20221213044331.GJ27893%40telsasoft.com

> The reason I find this annoying is that the non-optional nature of the
> progress reporting mechanism was sold on the basis that it would add
> only negligible overhead.  Adding extra pass(es) over pg_inherits
> breaks that promise.  Maybe it's cheap enough to not matter in the
> big scheme of things, but we should not be having to make arguments
> like that one.

If someone is running a DDL command involving nested partitions, I'm not
so concerned about the cost of additional scans of pg_inherits.  They
either have enough data to justify partitioning partitions, or they're
doing something silly.

-- 
Justin

Re: Progress report of CREATE INDEX for nested partitioned tables

From

Tom Lane

Date:

13 March 2023, 14:42:59

Justin Pryzby <pryzby@telsasoft.com> writes:
> On Sun, Mar 12, 2023 at 06:25:13PM -0400, Tom Lane wrote:
>> I agree that adding such a field to IndexStmt would be a very bad idea.
>> However, adding another parameter to DefineIndex doesn't seem like a
>> problem.

> It's a problem since this is a bug and it's desirable to backpatch a
> fix, right ?

I do not think this is important enough to justify a back-patch.

> Incrementing by 0 sounds terrible, since someone who has intermediate
> partitioned tables is likely to always see 0% done.

How so?  The counter will increase after there's some actual work done,
ie building an index.  If there's no indexes to build then it hardly
matters, because the command will complete in very little time.

> And incrementing PARTITIONS_DONE by 1 could lead to bogus progress
> reporting with "N_done > N_Total" if an intermediate partitioned table
> had no leaf partitions at all.

Well, we could fix that if we made TOTAL be the total number of
descendants rather than just the leaves ;-).  But I think not
incrementing is probably better.

            regards, tom lane

Re: Progress report of CREATE INDEX for nested partitioned tables

From

Justin Pryzby

Date:

14 March 2023, 14:34:16

On Mon, Mar 13, 2023 at 10:42:59AM -0400, Tom Lane wrote:
> Justin Pryzby <pryzby@telsasoft.com> writes:
> > On Sun, Mar 12, 2023 at 06:25:13PM -0400, Tom Lane wrote:
> >> I agree that adding such a field to IndexStmt would be a very bad idea.
> >> However, adding another parameter to DefineIndex doesn't seem like a
> >> problem.
> 
> > It's a problem since this is a bug and it's desirable to backpatch a
> > fix, right ?
> 
> I do not think this is important enough to justify a back-patch.

That's fine with me, but it comes as a surprise, and it might invalidate
earlier discussions, which were working under the constraint of
maintaining a compatible ABI.

> > Incrementing by 0 sounds terrible, since someone who has intermediate
> > partitioned tables is likely to always see 0% done.
> 
> How so?  The counter will increase after there's some actual work done,
> ie building an index.  If there's no indexes to build then it hardly
> matters, because the command will complete in very little time.

I misunderstood your idea as suggesting to skip progress reporting for
*every* intermediate partitioned index, and its leaves.  But I guess
what you meant is to skip progress reporting when ATTACHing an
intermediate partitioned index. That seems okay, since 1) intermediate
partitioned tables are probably rare, and 2) ATTACH is fast, so the
effect is indistinguisable from querying the progress report a few
moments later.

The idea would be for:
1) TOTAL to show the number of direct and indirect leaf partitions;
2) update progress while building direct or indirect indexes;
3) ATTACHing intermediate partitioned tables to increment by 0;
4) ATTACHing a direct child should continue to increment by 1,
since that common case already works as expected and shouldn't be
changed.

The only change from the current patch is (3).  (1) still calls
count_leaf_partitions(), but only once.  I'd prefer that to rearranging
the progress reporting to set the TOTAL in ProcessUtilitySlow().

-- 
Justin

Re: Progress report of CREATE INDEX for nested partitioned tables

From

Tom Lane

Date:

14 March 2023, 14:46:00

Justin Pryzby <pryzby@telsasoft.com> writes:
> The idea would be for:
> 1) TOTAL to show the number of direct and indirect leaf partitions;
> 2) update progress while building direct or indirect indexes;
> 3) ATTACHing intermediate partitioned tables to increment by 0;
> 4) ATTACHing a direct child should continue to increment by 1,
> since that common case already works as expected and shouldn't be
> changed.

OK.

> The only change from the current patch is (3).  (1) still calls
> count_leaf_partitions(), but only once.  I'd prefer that to rearranging
> the progress reporting to set the TOTAL in ProcessUtilitySlow().

I don't agree with that.  find_all_inheritors is fairly expensive
and it seems completely silly to do it twice just to avoid adding
a parameter to DefineIndex.

            regards, tom lane

Re: Progress report of CREATE INDEX for nested partitioned tables

From

Ilya Gladyshev

Date:

14 March 2023, 14:58:14


> 14 марта 2023 г., в 18:34, Justin Pryzby <pryzby@telsasoft.com> написал(а):
>
> On Mon, Mar 13, 2023 at 10:42:59AM -0400, Tom Lane wrote:
>> Justin Pryzby <pryzby@telsasoft.com> writes:
>>> On Sun, Mar 12, 2023 at 06:25:13PM -0400, Tom Lane wrote:
>>>> I agree that adding such a field to IndexStmt would be a very bad idea.
>>>> However, adding another parameter to DefineIndex doesn't seem like a
>>>> problem.
>>
>>> It's a problem since this is a bug and it's desirable to backpatch a
>>> fix, right ?
>>
>> I do not think this is important enough to justify a back-patch.
>
> That's fine with me, but it comes as a surprise, and it might invalidate
> earlier discussions, which were working under the constraint of
> maintaining a compatible ABI.
>
>>> Incrementing by 0 sounds terrible, since someone who has intermediate
>>> partitioned tables is likely to always see 0% done.
>>
>> How so?  The counter will increase after there's some actual work done,
>> ie building an index.  If there's no indexes to build then it hardly
>> matters, because the command will complete in very little time.
>
> I misunderstood your idea as suggesting to skip progress reporting for
> *every* intermediate partitioned index, and its leaves.  But I guess
> what you meant is to skip progress reporting when ATTACHing an
> intermediate partitioned index. That seems okay, since 1) intermediate
> partitioned tables are probably rare, and 2) ATTACH is fast, so the
> effect is indistinguisable from querying the progress report a few
> moments later.

+1 that counting attached partitioned indexes as 0 is fine.

> The idea would be for:
> 1) TOTAL to show the number of direct and indirect leaf partitions;
> 2) update progress while building direct or indirect indexes;
> 3) ATTACHing intermediate partitioned tables to increment by 0;
> 4) ATTACHing a direct child should continue to increment by 1,
> since that common case already works as expected and shouldn't be
> changed.
>
> The only change from the current patch is (3).  (1) still calls
> count_leaf_partitions(), but only once.  I'd prefer that to rearranging
> the progress reporting to set the TOTAL in ProcessUtilitySlow().
>
> --
> Justin

As for reusing TOTAL calculated outside of DefineIndex, as I can see, ProcessUtilitySlow is not the only call site for
DefineIndex(although, I don’t know whether all of them need progress tracking), for instance, there is ALTER TABLE that
callsDefineIndex to create index for constraints. So I feel like rearranging progress reporting will result in
unnecessarycode duplication in those call sites, so passing in an optional parameter seems to be easier here, if we are
goingto optimize it, after all. Especially if back-patching is a non-issue.

Re: Progress report of CREATE INDEX for nested partitioned tables

From

Justin Pryzby

Date:

16 March 2023, 00:07:33

On Tue, Mar 14, 2023 at 06:58:14PM +0400, Ilya Gladyshev wrote:
> > The only change from the current patch is (3).  (1) still calls
> > count_leaf_partitions(), but only once.  I'd prefer that to rearranging
> > the progress reporting to set the TOTAL in ProcessUtilitySlow().
>
> As for reusing TOTAL calculated outside of DefineIndex, as I can see, ProcessUtilitySlow is not the only call site
forDefineIndex (although, I don’t know whether all of them need progress tracking), for instance, there is ALTER TABLE
thatcalls DefineIndex to create index for constraints. So I feel like rearranging progress reporting will result in
unnecessarycode duplication in those call sites, so passing in an optional parameter seems to be easier here, if we are
goingto optimize it, after all. Especially if back-patching is a non-issue.
 

Yeah.  See attached.  I don't like duplicating the loop.  Is this really
the right direction to go ?

I haven't verified if the child tables are locked in all the paths which
would call count_leaf_partitions().  But why is it important to lock
them for this?  If they weren't locked before, that'd be a pre-existing
problem...

Attachment

0001-fix-CREATE-INDEX-progress-report-with-nested-partiti.patch

Re: Progress report of CREATE INDEX for nested partitioned tables

From

Ilya Gladyshev

Date:

16 March 2023, 15:04:16


> 16 марта 2023 г., в 04:07, Justin Pryzby <pryzby@telsasoft.com> написал(а):
>
> On Tue, Mar 14, 2023 at 06:58:14PM +0400, Ilya Gladyshev wrote:
>>> The only change from the current patch is (3).  (1) still calls
>>> count_leaf_partitions(), but only once.  I'd prefer that to rearranging
>>> the progress reporting to set the TOTAL in ProcessUtilitySlow().
>>
>> As for reusing TOTAL calculated outside of DefineIndex, as I can see, ProcessUtilitySlow is not the only call site
forDefineIndex (although, I don’t know whether all of them need progress tracking), for instance, there is ALTER TABLE
thatcalls DefineIndex to create index for constraints. So I feel like rearranging progress reporting will result in
unnecessarycode duplication in those call sites, so passing in an optional parameter seems to be easier here, if we are
goingto optimize it, after all. Especially if back-patching is a non-issue. 
>
> Yeah.  See attached.  I don't like duplicating the loop.  Is this really
> the right direction to go ?
>
> I haven't verified if the child tables are locked in all the paths which
> would call count_leaf_partitions().  But why is it important to lock
> them for this?  If they weren't locked before, that'd be a pre-existing
> problem...
> <0001-fix-CREATE-INDEX-progress-report-with-nested-partiti.patch>

I’m not sure what the general policy on locking is, but I have checked ALTER TABLE ADD INDEX, and the all the
partitionsseem to be locked on the first entry to DefineIndex there. All other call sites pass in the parentIndexId,
whichmeans the progress tracking machinery will not be initialized, so I think, we don’t need to do locking in
count_leaf_partitions(). 

The approach in the patch looks good to me. Some nitpicks on the patch:
1. There’s an unnecessary second call to get_rel_relkind in ProcessUtilitySlow, we can just use what’s in the variable
relkind.
2. We can also combine else and if to have one less nested level like that:
+                else if (!RELKIND_HAS_PARTITIONS(child_relkind))

3. There was a part of the comment saying "If the index was built by calling DefineIndex() recursively, the called
functionis responsible for updating the progress report for built indexes.", I think it is still useful to have it
there.

Re: Progress report of CREATE INDEX for nested partitioned tables

From

Justin Pryzby

Date:

21 March 2023, 18:43:43

On Thu, Mar 16, 2023 at 07:04:16PM +0400, Ilya Gladyshev wrote:
> > 16 марта 2023 г., в 04:07, Justin Pryzby <pryzby@telsasoft.com> написал(а):
> > 
> > On Tue, Mar 14, 2023 at 06:58:14PM +0400, Ilya Gladyshev wrote:
> >>> The only change from the current patch is (3).  (1) still calls
> >>> count_leaf_partitions(), but only once.  I'd prefer that to rearranging
> >>> the progress reporting to set the TOTAL in ProcessUtilitySlow().
> >> 
> >> As for reusing TOTAL calculated outside of DefineIndex, as I can see, ProcessUtilitySlow is not the only call site
forDefineIndex (although, I don’t know whether all of them need progress tracking), for instance, there is ALTER TABLE
thatcalls DefineIndex to create index for constraints. So I feel like rearranging progress reporting will result in
unnecessarycode duplication in those call sites, so passing in an optional parameter seems to be easier here, if we are
goingto optimize it, after all. Especially if back-patching is a non-issue.
 
> > 
> > Yeah.  See attached.  I don't like duplicating the loop.  Is this really
> > the right direction to go ?
> > 
> > I haven't verified if the child tables are locked in all the paths which
> > would call count_leaf_partitions().  But why is it important to lock
> > them for this?  If they weren't locked before, that'd be a pre-existing
> > problem...
> > <0001-fix-CREATE-INDEX-progress-report-with-nested-partiti.patch>
> 
> I’m not sure what the general policy on locking is, but I have checked ALTER TABLE ADD INDEX, and the all the
partitionsseem to be locked on the first entry to DefineIndex there. All other call sites pass in the parentIndexId,
whichmeans the progress tracking machinery will not be initialized, so I think, we don’t need to do locking in
count_leaf_partitions().
 

> The approach in the patch looks good to me. Some nitpicks on the patch: 
> 1. There’s an unnecessary second call to get_rel_relkind in ProcessUtilitySlow, we can just use what’s in the
variablerelkind.
 
> 2. We can also combine else and if to have one less nested level like that:
> +                else if (!RELKIND_HAS_PARTITIONS(child_relkind))
> 
> 3. There was a part of the comment saying "If the index was built by calling DefineIndex() recursively, the called
functionis responsible for updating the progress report for built indexes.", I think it is still useful to have it
there.

Thanks, I addressed (1) and (3).  (2) is deliberate, to allow a place to
put the comment which is not specific to the !HAS_PARTITIONS case.

-- 
Justin

Attachment

Re: Progress report of CREATE INDEX for nested partitioned tables

From

Tom Lane

Date:

23 March 2023, 20:35:46

So I'm still pretty desperately unhappy with count_leaf_partitions.
I don't like expending significant cost purely for progress tracking
purposes, I don't like the undocumented assumption that NoLock is
safe, and what's more, if it is safe then we've already traversed
the inheritance tree to lock everything so in principle we could
have the count already.  However, it does seem like getting that
knowledge from point A to point B would be a mess in most places.

One thing we could do to reduce the cost (and improve the safety)
is to forget the idea of checking the relkinds and just set the
PARTITIONS_TOTAL count to list_length() of the find_all_inheritors
result.  We've already agreed that it's okay if the PARTITIONS_DONE
count never reaches PARTITIONS_TOTAL, so this would just be taking
that idea further.  (Or we could increment PARTITIONS_DONE for
non-leaf partitions when we visit them, thus making that TOTAL
more nearly correct.)  Furthermore, as things stand it's not hard
for PARTITIONS_TOTAL to be zero --- there's at least one such case
in the regression tests --- and that seems just weird to me.

By the by, this is awful code:

+        if (RELKIND_HAS_STORAGE(get_rel_relkind(partrelid)))

Consult the definition of RELKIND_HAS_STORAGE to see why.
But I want to get rid of that rather than fixing it.

            regards, tom lane

Re: Progress report of CREATE INDEX for nested partitioned tables

From

Justin Pryzby

Date:

25 March 2023, 02:53:19

On Thu, Mar 23, 2023 at 04:35:46PM -0400, Tom Lane wrote:
> So I'm still pretty desperately unhappy with count_leaf_partitions.
> I don't like expending significant cost purely for progress tracking
> purposes, I don't like the undocumented assumption that NoLock is
> safe, and what's more, if it is safe then we've already traversed
> the inheritance tree to lock everything so in principle we could
> have the count already.  However, it does seem like getting that
> knowledge from point A to point B would be a mess in most places.
> 
> One thing we could do to reduce the cost (and improve the safety)
> is to forget the idea of checking the relkinds and just set the
> PARTITIONS_TOTAL count to list_length() of the find_all_inheritors
> result.

Actually list_length() minus 1 ...

> We've already agreed that it's okay if the PARTITIONS_DONE
> count never reaches PARTITIONS_TOTAL, so this would just be taking
> that idea further.  (Or we could increment PARTITIONS_DONE for
> non-leaf partitions when we visit them, thus making that TOTAL
> more nearly correct.)

Yes, I think that's actually more correct.  If TOTAL is set without
regard to relkind, then DONE ought to be set the same way.

I updated the documentation to indicate that the counters include the
intermediate partitioned rels, but I wonder if it's better to say
nothing and leave that undefined.

> Furthermore, as things stand it's not hard
> for PARTITIONS_TOTAL to be zero --- there's at least one such case
> in the regression tests --- and that seems just weird to me.

I don't know why it'd seem weird.  postgres doesn't create partitions
automatically, so by default there are none.  If we create a table but
never load any data, it'll have no partitions.  Also, the TOTAL=0 case
won't go away just because we start counting intermediate partitions.

> By the by, this is awful code:
> 
> +        if (RELKIND_HAS_STORAGE(get_rel_relkind(partrelid)))
> 
> Consult the definition of RELKIND_HAS_STORAGE to see why.
> But I want to get rid of that rather than fixing it.

Good point, but I'd burden-shift the blame to RELKIND_HAS_STORAGE().

BTW, I promoted myself to a co-author of the patch.  My interest here is
to resolve this hoping to allow the CIC patch to progress.

-- 
Justin

Attachment

Re: Progress report of CREATE INDEX for nested partitioned tables

From

Tom Lane

Date:

25 March 2023, 15:55:13

Justin Pryzby <pryzby@telsasoft.com> writes:
> On Thu, Mar 23, 2023 at 04:35:46PM -0400, Tom Lane wrote:
>> Furthermore, as things stand it's not hard
>> for PARTITIONS_TOTAL to be zero --- there's at least one such case
>> in the regression tests --- and that seems just weird to me.

> I don't know why it'd seem weird.  postgres doesn't create partitions
> automatically, so by default there are none.  If we create a table but
> never load any data, it'll have no partitions.

My problem with it is that it's not clear how to tell "no partitioned
index creation in progress" from "partitioned index creation in progress,
but total = 0".  Maybe there's some out-of-band way to tell that in the
stats reporting system, but still it's a weird corner case.

> Also, the TOTAL=0 case
> won't go away just because we start counting intermediate partitions.

That's why I wanted list_length() not list_length() - 1.  We are
doing *something* at the top partitioned table, it just doesn't
involve a table scan, so I don't find this totally unreasonable.
If you agree we are doing work at intermediate partitioned tables,
how are we not doing work at the top one?

            regards, tom lane

Re: Progress report of CREATE INDEX for nested partitioned tables

From

Justin Pryzby

Date:

25 March 2023, 16:36:08

On Sat, Mar 25, 2023 at 11:55:13AM -0400, Tom Lane wrote:
> Justin Pryzby <pryzby@telsasoft.com> writes:
> > On Thu, Mar 23, 2023 at 04:35:46PM -0400, Tom Lane wrote:
> >> Furthermore, as things stand it's not hard
> >> for PARTITIONS_TOTAL to be zero --- there's at least one such case
> >> in the regression tests --- and that seems just weird to me.
> 
> > I don't know why it'd seem weird.  postgres doesn't create partitions
> > automatically, so by default there are none.  If we create a table but
> > never load any data, it'll have no partitions.
> 
> My problem with it is that it's not clear how to tell "no partitioned
> index creation in progress" from "partitioned index creation in progress,
> but total = 0".  Maybe there's some out-of-band way to tell that in the
> stats reporting system, but still it's a weird corner case.
> 
> > Also, the TOTAL=0 case
> > won't go away just because we start counting intermediate partitions.
> 
> That's why I wanted list_length() not list_length() - 1.  We are
> doing *something* at the top partitioned table, it just doesn't
> involve a table scan, so I don't find this totally unreasonable.
> If you agree we are doing work at intermediate partitioned tables,
> how are we not doing work at the top one?

What you're proposing would redefine the meaning of
PARTITIONS_DONE/TOTAL, even in the absence of intermediate partitioned
tables.  Which might be okay, but the scope of this thread/patch was to
fix the behavior involving intermediate partitioned tables.

It's somewhat weird to me that find_all_inheritors(rel) returns the rel
itself.  But it's an internal function, and evidently that's what's
needed/desirable to do, so that's fine.

However, "PARTITIONS_TOTAL" has a certain user-facing definition, and
"Number of partitions" is easier to explain than "Number of partitions
plus the rel itself", and IMO an easier definition is a better one.

Your complaint seems similar to something I've said a few times before:
it's weird to expose macroscopic progress reporting of partitioned
tables in the same view and in the same *row* as microscopic progress of
its partitions.  But changing that is a job for another patch.  I won't
be opposed to it if someone were to propose a patch to remove
partitions_{done,total}.  See also:
https://www.postgresql.org/message-id/flat/YCy5ZMt8xAyoOMmv%40paquier.xyz#b20d1be226a93dacd3fd40b402315105

-- 
Justin

Re: Progress report of CREATE INDEX for nested partitioned tables

From

Tom Lane

Date:

25 March 2023, 17:11:17

Justin Pryzby <pryzby@telsasoft.com> writes:
> On Sat, Mar 25, 2023 at 11:55:13AM -0400, Tom Lane wrote:
>> That's why I wanted list_length() not list_length() - 1.  We are
>> doing *something* at the top partitioned table, it just doesn't
>> involve a table scan, so I don't find this totally unreasonable.
>> If you agree we are doing work at intermediate partitioned tables,
>> how are we not doing work at the top one?

> What you're proposing would redefine the meaning of
> PARTITIONS_DONE/TOTAL, even in the absence of intermediate partitioned
> tables.  Which might be okay, but the scope of this thread/patch was to
> fix the behavior involving intermediate partitioned tables.

I'm a little skeptical of that argument, because this patch is already
redefining the meaning of PARTITIONS_TOTAL.  The fact that the existing
documentation is vague enough to be read either way doesn't make it not
a change.

Still, in the interests of getting something done I'll drop the issue.

            regards, tom lane

Re: Progress report of CREATE INDEX for nested partitioned tables

From

Tom Lane

Date:

25 March 2023, 19:43:32

I pushed 0001 with some cosmetic changes (for instance, trying to
make the style of the doc entries for partitions_total/partitions_done
match the rest of their table).

I'm not touching 0002 or 0003, because I think they're fundamentally
a bad idea.  Progress reporting is inherently inexact, because it's
so hard to predict the amount of work to be done in advance -- have
you ever seen a system anywhere whose progress bars reliably advance
at a uniform rate?  I think adding assertions that the estimates are
error-free is just going to cause headaches.  As an example, I added
a comment pointing out that the current fix won't crash and burn if
the caller failed to lock all the child tables in advance: the
find_all_inheritors call should be safe anyway, so the worst consequence
would be an imprecise partitions_total estimate.  But that argument
falls down if we're going to add assertions that partitions_total
isn't in error.

            regards, tom lane

Re: Progress report of CREATE INDEX for nested partitioned tables

From

Justin Pryzby

Date:

26 March 2023, 14:08:41

On Sat, Mar 25, 2023 at 03:43:32PM -0400, Tom Lane wrote:
> I pushed 0001 with some cosmetic changes (for instance, trying to
> make the style of the doc entries for partitions_total/partitions_done
> match the rest of their table).

Thanks.

> I'm not touching 0002 or 0003, because I think they're fundamentally
> a bad idea.  Progress reporting is inherently inexact, because it's

Nobody could disagree that it's inexact.  The assertions are for minimal
sanity tests and consistency.  Like if "total" is set multiple times (as
in this patch), or if a progress value goes backwards.  Anyway the
assertions exposed two other issues that would need to be fixed before
the assertions themselves could be proposed.

-- 
Justin