Re: speedup COPY TO for partitioned table. - Mailing list pgsql-hackers
From | Chao Li |
---|---|
Subject | Re: speedup COPY TO for partitioned table. |
Date | |
Msg-id | 8316BFA1-D32A-4EDF-A190-A555B98D1EE4@gmail.com Whole thread Raw |
In response to | Re: speedup COPY TO for partitioned table. (jian he <jian.universality@gmail.com>) |
Responses |
Re: speedup COPY TO for partitioned table.
|
List | pgsql-hackers |
On Oct 9, 2025, at 22:50, jian he <jian.universality@gmail.com> wrote:3
```
+ if (RELKIND_HAS_PARTITIONS(relkind))
+ children = foreach_delete_current(children, childreloid);
+ }
```
I wonder if there is any specially consideration of using RELKIND_HAS_PARTITIONS() here? Because according to the function comment of find_all_inheritors(), it will only return OIDs of relations; while RELKIND_HAS_PARTITIONS checks for both relations and views. Logically using this macro works, but it may lead to some confusion to code readers.
find_all_inheritors comments says:
* Returns a list of relation OIDs including the given rel plus
* all relations that inherit from it, directly or indirectly.
CREATE TABLE pp (id int,val int) PARTITION BY RANGE (id);
CREATE TABLE pp_1 (val int, id int) PARTITION BY RANGE (id);
ALTER TABLE pp ATTACH PARTITION pp_1 FOR VALUES FROM (1) TO (5);
If we copy partitioned table "pp" data out, but partitioned table "pp_1"
don't have storage, so we have to skip it, using RELKIND_HAS_PARTITIONS
to skip it should be fine.
My point is that RELKIND_HAS_PARTITIONS is defined as:
#define RELKIND_HAS_PARTITIONS(relkind) \
((relkind) == RELKIND_PARTITIONED_TABLE || \
(relkind) == RELKIND_PARTITIONED_INDEX)
It just checks relkind to be table or index. The example in your explanation seems to not address my concern. Why do we need to check against index?
4I guess this is just a code convention. Such not necessary is quite common
```
@@ -722,6 +754,7 @@ BeginCopyTo(ParseState *pstate,
DestReceiver *dest;
cstate->rel = NULL;
+ cstate->partitions = NIL;
```
Both NULL assignment are not needed as cstate is allocated by palloc0().
within the codebase.
I don’t agree. cstate has a lot of more fields with pointer types, why don’t set NULL to them?
5pgstat_progress_update_param was within CopyRelTo.
```
+static void
+CopyRelTo(CopyToState cstate, Relation rel, Relation root_rel,
+ uint64 *processed)
```
Instead of using a pointer to pass out processed count, I think it’s better to return the process count. I understand the current implementation allows continuous increment while calling this function in a loop. However, it’s a bit error-prone, a caller must make sure “processed” is well initialized. With returning a unit64, the caller’s code is still simple:
```
processed += CopyRelTo(cstate, …);
```
so we have to pass (uint64 *processed) to CopyRelTo.
Am I missing something?
Make sense. I didn’t notice postage_progress_update_param. So, “processed” is both input and output. In that case, I think the comment for parameter “processed” should be enhanced, for example:
```
* processed: on entry, contains the current count of processed count;
* this function increments it by the number of rows copied
* from this relation and writes back the updated total.
```
Or a short version:
```
* processed: input/output; cumulative count of tuples processed, incremented here.
```
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
HighGo Software Co., Ltd.
https://www.highgo.com/
pgsql-hackers by date: