Re: speedup COPY TO for partitioned table. - Mailing list pgsql-hackers

From jian he
Subject Re: speedup COPY TO for partitioned table.
Date
Msg-id CACJufxF0DuUuOE=3ru1AzAVtt+dex6NdAFF50kjT7Paz0a4-Sg@mail.gmail.com
Whole thread Raw
In response to Re: speedup COPY TO for partitioned table.  (Chao Li <li.evan.chao@gmail.com>)
Responses Re: speedup COPY TO for partitioned table.
List pgsql-hackers
On Fri, Oct 10, 2025 at 9:02 AM Chao Li <li.evan.chao@gmail.com> wrote:
>
>
>
> On Oct 9, 2025, at 22:50, jian he <jian.universality@gmail.com> wrote:
>
> 3
> ```
> + if (RELKIND_HAS_PARTITIONS(relkind))
> + children = foreach_delete_current(children, childreloid);
> + }
> ```
>
> I wonder if there is any specially consideration of using RELKIND_HAS_PARTITIONS() here? Because according to the
functioncomment of find_all_inheritors(), it will only return OIDs of relations; while RELKIND_HAS_PARTITIONS checks
forboth relations and views. Logically using this macro works, but it may lead to some confusion to code readers. 
>
>
> find_all_inheritors comments says:
> *        Returns a list of relation OIDs including the given rel plus
> *        all relations that inherit from it, directly or indirectly.
>
> CREATE TABLE pp (id int,val int) PARTITION BY RANGE (id);
> CREATE TABLE pp_1 (val int, id int) PARTITION BY RANGE (id);
> ALTER TABLE pp ATTACH PARTITION pp_1 FOR VALUES FROM (1) TO (5);
>
> If we copy partitioned table "pp" data out, but partitioned table "pp_1"
> don't have storage, so we have to skip it, using RELKIND_HAS_PARTITIONS
> to skip it should be fine.
>
> My point is that RELKIND_HAS_PARTITIONS is defined as:
>
> #define RELKIND_HAS_PARTITIONS(relkind) \
> ((relkind) == RELKIND_PARTITIONED_TABLE || \
> (relkind) == RELKIND_PARTITIONED_INDEX)
>
> It just checks relkind to be table or index. The example in your explanation seems to not address my concern. Why do
weneed to check against index? 
>

the macro name RELKIND_HAS_PARTITIONS improves the readability, I think.
also we don't need to worry about partitioned index here, because
we are in the
``
else if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
{
}
``
loop.

sure we can change it ``if (relkind == RELKIND_PARTITIONED_TABLE)``.

> 5
> ```
> +static void
> +CopyRelTo(CopyToState cstate, Relation rel, Relation root_rel,
> +  uint64 *processed)
> ```
>
> Instead of using a pointer to pass out processed count, I think it’s better to return the process count. I understand
thecurrent implementation allows continuous increment while calling this function in a loop. However, it’s a bit
error-prone,a caller must make sure “processed” is well initialized. With returning a unit64, the caller’s code is
stillsimple: 
>
> ```
> processed += CopyRelTo(cstate, …);
> ```
>
> pgstat_progress_update_param was within CopyRelTo.
> so we have to pass (uint64 *processed) to CopyRelTo.
> Am I missing something?
>
>
> Make sense. I didn’t notice postage_progress_update_param. So, “processed” is both input and output. In that case, I
thinkthe comment for parameter “processed” should be enhanced, for example: 
>
if your function is:

static processed CopyRelationTo(CopyToState cstate, Relation rel,
Relation root_rel, uint64 *processed);

where function return value is also passed as function argument,
I think it will lead to more confusion.



pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: get rid of RM_HEAP2_ID
Next
From: Thomas Munro
Date:
Subject: Re: GNU/Hurd portability patches