Re: speedup COPY TO for partitioned table. - Mailing list pgsql-hackers
From | jian he |
---|---|
Subject | Re: speedup COPY TO for partitioned table. |
Date | |
Msg-id | CACJufxHiCGL=Q5Y+fyqVyY2PbciHeS7jhimh=TnEAs-Pyc2VCw@mail.gmail.com Whole thread Raw |
In response to | Re: speedup COPY TO for partitioned table. (Chao Li <li.evan.chao@gmail.com>) |
Responses |
Re: speedup COPY TO for partitioned table.
|
List | pgsql-hackers |
On Thu, Oct 9, 2025 at 4:14 PM Chao Li <li.evan.chao@gmail.com> wrote: > > Hi Jian, > > Thanks for the patch. After reviewing it, I got a few small comments: > > On Oct 9, 2025, at 15:10, jian he <jian.universality@gmail.com> wrote: > > Please check the attached v16. > <v16-0001-support-COPY-partitioned_table-TO.patch> > 1 > ``` > + List *partitions; /* oid list of partition oid for copy to */ > ``` > > The comment doesn’t look very good. First, it repeats “oid”; second, as “List *partitions” implies multiple partitions,the comment should use plural OIDs. Maybe change the comment to “/* list of partition OIDs for COPY TO */" > yech, I need to improve these comments. > 2 > ``` > + /* > + * Collect a list of partitions containing data, so that later > + * DoCopyTo can copy the data from them. > + */ > + children = find_all_inheritors(RelationGetRelid(rel), > + AccessShareLock, > + NULL); > + > + foreach_oid(childreloid, children) > + { > + char relkind = get_rel_relkind(childreloid); > + > + if (relkind == RELKIND_FOREIGN_TABLE) > + { > + char *relation_name; > + > + relation_name = get_rel_name(childreloid); > + ereport(ERROR, > + errcode(ERRCODE_WRONG_OBJECT_TYPE), > + errmsg("cannot copy from foreign table \"%s\"", relation_name), > + errdetail("Partition \"%s\" is a foreign table in the partitioned table \"%s\"", > + relation_name, RelationGetRelationName(rel)), > + errhint("Try the COPY (SELECT ...) TO variant.")); > + } > + > + if (RELKIND_HAS_PARTITIONS(relkind)) > + children = foreach_delete_current(children, childreloid); > + } > ``` > > Is it better to move the RELKIND_HAS_PARTIONS() check to before FOREIGH_TABLE check and continue after foreach_delete_current()?Now every childreloid goes through the both checks, if we do the movement, then HAS_PARTIONS childwill go through 1 check. This is a tiny optimization. > I think the current handling is fine. > 3 > ``` > + if (RELKIND_HAS_PARTITIONS(relkind)) > + children = foreach_delete_current(children, childreloid); > + } > ``` > > I wonder if there is any specially consideration of using RELKIND_HAS_PARTITIONS() here? Because according to the functioncomment of find_all_inheritors(), it will only return OIDs of relations; while RELKIND_HAS_PARTITIONS checks forboth relations and views. Logically using this macro works, but it may lead to some confusion to code readers. > find_all_inheritors comments says: * Returns a list of relation OIDs including the given rel plus * all relations that inherit from it, directly or indirectly. CREATE TABLE pp (id int,val int) PARTITION BY RANGE (id); CREATE TABLE pp_1 (val int, id int) PARTITION BY RANGE (id); ALTER TABLE pp ATTACH PARTITION pp_1 FOR VALUES FROM (1) TO (5); If we copy partitioned table "pp" data out, but partitioned table "pp_1" don't have storage, so we have to skip it, using RELKIND_HAS_PARTITIONS to skip it should be fine. > 4 > ``` > @@ -722,6 +754,7 @@ BeginCopyTo(ParseState *pstate, > DestReceiver *dest; > > cstate->rel = NULL; > + cstate->partitions = NIL; > ``` > > Both NULL assignment are not needed as cstate is allocated by palloc0(). > I guess this is just a code convention. Such not necessary is quite common within the codebase. > 5 > ``` > +static void > +CopyRelTo(CopyToState cstate, Relation rel, Relation root_rel, > + uint64 *processed) > ``` > > Instead of using a pointer to pass out processed count, I think it’s better to return the process count. I understand thecurrent implementation allows continuous increment while calling this function in a loop. However, it’s a bit error-prone,a caller must make sure “processed” is well initialized. With returning a unit64, the caller’s code is stillsimple: > > ``` > processed += CopyRelTo(cstate, …); > ``` > pgstat_progress_update_param was within CopyRelTo. so we have to pass (uint64 *processed) to CopyRelTo. Am I missing something? > 6. In BeginCopyTo(), “children” list is created before “cstate” is created, it is not allocated under “cstate->copycontext”,so in EndCopyTo(), we should also free memory of “cstate->partitions”. > I think so.
pgsql-hackers by date: