Re: Parallel INSERT (INTO ... SELECT ...) - Mailing list pgsql-hackers
| From | Greg Nancarrow |
|---|---|
| Subject | Re: Parallel INSERT (INTO ... SELECT ...) |
| Date | |
| Msg-id | CAJcOf-ecOEtieLSSBZAKRuXc84kYTkG1kWPb6=xawC4MiWG8xA@mail.gmail.com Whole thread Raw |
| In response to | RE: Parallel INSERT (INTO ... SELECT ...) ("Hou, Zhijie" <houzj.fnst@cn.fujitsu.com>) |
| Responses |
RE: Parallel INSERT (INTO ... SELECT ...)
|
| List | pgsql-hackers |
On Mon, Feb 1, 2021 at 8:19 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
>
> Hi,
>
> When developing the reloption patch, I noticed some issues in the patch.
>
> 1).
> > - Reduce Insert parallel-safety checks required for some SQL, by noting
> > that the subquery must operate on a relation (check for RTE_RELATION in
> > subquery range-table)
>
> + foreach(lcSub, rte->subquery->rtable)
> + {
> + rteSub = lfirst_node(RangeTblEntry, lcSub);
> + if (rteSub->rtekind == RTE_RELATION)
> + {
> + hasSubQueryOnRelation = true;
> + break;
> + }
> + }
> It seems we can not only search RTE_RELATION in rtable,
> because RTE_RELATION may exist in other place like:
>
> ---
> --** explain insert into target select (select * from test);
> Subplan's subplan
>
> --** with cte as (select * from test) insert into target select * from cte;
> In query's ctelist.
> ---
>
> May be we should use a walker function [1] to
> search the subquery and ctelist.
>
>
>
> [1]
> static bool
> relation_walker(Node *node)
> {
> if (node == NULL)
> return false;
>
> else if (IsA(node, RangeTblEntry))
> {
> RangeTblEntry *rte = (RangeTblEntry *) node;
> if (rte->rtekind == RTE_RELATION)
> return true;
>
> return false;
> }
>
> else if (IsA(node, Query))
> {
> Query *query = (Query *) node;
>
> /* Recurse into subselects */
> return query_tree_walker(query, relation_walker,
> NULL, QTW_EXAMINE_RTES_BEFORE);
> }
>
> /* Recurse to check arguments */
> return expression_tree_walker(node,
> relation_walker,
> NULL);
> }
>
I've had a further look at this, and this walker function is doing a
lot of work recursing the parse tree, and I'm not sure that it
reliably retrieves the information that we;re looking for, for all
cases of different SQL queries. Unless it can be made much more
efficient and specific to our needs, I think we should not try to do
this optimization, because there's too much overhead. Also, keep in
mind that for the current parallel SELECT functionality in Postgres, I
don't see any similar optimization being attempted (and such
optimization should be attempted at the SELECT level). So I don't
think we should be attempting such optimization in this patch (but
could be attempted in a separate patch, just related to current
parallel SELECT functionality).
Regards,
Greg Nancarrow
Fujitsu Australia
pgsql-hackers by date: