Re: Push down more full joins in postgres_fdw - Mailing list pgsql-hackers
From | Etsuro Fujita |
---|---|
Subject | Re: Push down more full joins in postgres_fdw |
Date | |
Msg-id | 732be34b-ca0d-c98c-7852-36281a2f3ba9@lab.ntt.co.jp Whole thread Raw |
In response to | Re: Push down more full joins in postgres_fdw (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>) |
Responses |
Re: Push down more full joins in postgres_fdw
|
List | pgsql-hackers |
On 2016/10/22 17:19, Ashutosh Bapat wrote: > Review for postgres-fdw-more-full-join-pushdown-v6 patch. > > The patch applies cleanly and regression is clean (make check in > regress directory and that in postgres_fdw). Thanks for the review! > Here are some comments. > 1. Because of the following code change, for a joinrel we might end up using > attrs_used, which will be garbage for a join rel. The assumption that tlist can > not be NIL for a join relation is wrong. Even for a join relation, tlist can be > NULL. Consider query 'explain verbose select count(*) from ft1, ft2' Since > count(*) doesn't need any columns, the tlist is NIL. With the patch the server > crashes for this query. > - if (foreignrel->reloptkind == RELOPT_JOINREL) > + /* > + * Note: tlist for a base relation might be non-NIL. For example, if the > + * base relation is an operand of a foreign join performing a full outer > + * join and has non-NIL remote_conds, the base relation will be deparsed > + * as a subquery, so the tlist for the base relation could be non-NIL. > + */ > + if (tlist != NIL) > { > - /* For a join relation use the input tlist */ > > We can not decide whether to use deparseExplicitTargetList or > deparse it from attrs_used based on the tlist. SELECT lists, whether it's > topmost SELECT or a subquery (even on a base relation), for deparsing a > JOIN query need to use deparseExplicitTargetList. Will fix. > 2. The code in deparseRangeTblRef() dealing with subqueries, is very similar to > deparseSelectStmtForRel(). The only thing deparseRangeTblRef() does not require > is the appending ORDER BY, which is determined by existence of pathkeys > argument. Probably we should reuse deparseSelectStmtForRel(), instead of > duplicating the code. This will also make the current changes to > deparseSelectSql unnecessary. Will do. > 3. Why do we need following change? The elog() just few lines above says that > we expect only Var nodes. Why then we use deparseExpr() instead of > deparseVar()? > if (i > 0) > appendStringInfoString(buf, ", "); > - deparseVar(var, context); > + deparseExpr((Expr *) var, context); > > *retrieved_attrs = lappend_int(*retrieved_attrs, i + 1); > > And I get the answer for that question, somewhere down in the patch > + /* > + * If the given expression is an output column of a subquery-in-FROM, > + * deparse the alias to the expression instead. > + */ > + if (IsA(node, Var)) > + { > + int tabno; > + int colno; > + > + if (isSubqueryExpr(node, context->foreignrel, &tabno, &colno)) > + { > + appendStringInfo(context->buf, "%s%d.%s%d", > + SS_TAB_ALIAS_PREFIX, tabno, > + SS_COL_ALIAS_PREFIX, colno); > + return; > + } > + } > + > > Functionally, this code belongs to deparseVar() and not deparseExpr(). Like all > other functions which handle Expr nodes, deparseExpr() is expected to be a > dispatcher to the node specific function. OK, will change that way. > 4. This comment is good to have there, but is unrelated to this patch. May be a > separate patch? > + /* Don't generate bad syntax if no columns */ > > 5. Changed comment doesn't say anything different from the original comment. It > may have been good to have it this way to start with, but changing it now > doesn't bring anything new. You seem to have just merged prologue of the > function with a comment in that function. I think, this change is unnecessary. > - * The function constructs ... JOIN ... ON ... for join relation. For a base > - * relation it just returns schema-qualified tablename, with the appropriate > - * alias if so requested. > + * For a join relation the clause of the following form is appended to buf: > + * ((outer relation) <join type> (inner relation) ON (joinclauses)) > + * For a base relation the function just adds the schema-qualified tablename, > + * with the appropriate alias if so requested. > + /* Don't generate bad syntax if no columns */ > > 6. Change may be good but unrelated to this patch. May be material for a > separate patch. There are few such changes in this function. While these > changes may be good by themselves, they distract reviewer from the goal of the > patch. > - appendStringInfo(buf, "("); > + appendStringInfoChar(buf, '('); OK on #4, #5, and #6, Will remove all the changes. I will leave those for separate patches. > 7. I don't understand why you need to change this function so much. Take for > example the following change. > - RelOptInfo *rel_o = fpinfo->outerrel; > - RelOptInfo *rel_i = fpinfo->innerrel; > - StringInfoData join_sql_o; > - StringInfoData join_sql_i; > + /* Begin the FROM clause entry */ > + appendStringInfoChar(buf, '('); > > /* Deparse outer relation */ > - initStringInfo(&join_sql_o); > - deparseFromExprForRel(&join_sql_o, root, rel_o, true, params_list); > + deparseRangeTblRef(buf, root, > + fpinfo->outerrel, > + fpinfo->make_outerrel_subquery, > + params_list); > /* Deparse outer relation */ > - initStringInfo(&join_sql_o); > - deparseFromExprForRel(&join_sql_o, root, rel_o, true, params_list); > + deparseRangeTblRef(buf, root, > + fpinfo->outerrel, > + fpinfo->make_outerrel_subquery, > + params_list); > > It removes few variables, instead directly accesses the members from fpinfo. > Also, deparses the individual relations in the provided buffer directly, rather > than in separate buffers in the original code. Instead of this, all you had to > do was replace a call > - deparseFromExprForRel(&join_sql_o, root, rel_o, true, params_list); > with > + deparseRangeTblRef(&join_sql_o, root, rel_o, > fpinfo->make_outerrel_subquery, params_list) > Similarly for the inner relation. Again, the changes you have done might have > been good, if done in the original code, but doing those in this patch just > creates distractions and increases the size of the patch. I did that for efficiency because deparsing the relation directly into the given buffer would save cycles, but I agree that that is another issue. Will remove that change, and leave that for another patch. > 8. Why have you changed the name of variable here? > - if (use_alias) > + if (add_rel_alias) Sorry, I forgot to remove this change from the patch. Will fix. > 9. In deparseRangeTblRef(), the targetlist for the subquery is obtained by > calling build_tlist_to_deparse(), but appendSubselectAlias() constructs the > column aliases based on foreignrel->reltarget->exprs. Those two are usually > same, but there is no code which guarantees that. Instead, pass tlist to > appendSubselectAlias() or simply pass the number of entries in that list > (list_length(tlist), which is all it needs. OR use foreignrel->reltarget->exprs > directly instead of calling build_tlist_to_deparse(). Similar problem exists > with getSubselectAliasInfo(). Actually, the patch guarantees that since in that case (1) foreignrel->reltarget->exprs doesn't contain any PHVs (note that passing the following check in foreign_join_ok implies that foreignrel->reltarget->exprs of the input rels for a given foreign join doesn't contain any PHVs and (2) we have fpinfo->local_conds == NIL, so build_tlist_to_deparse() would produce a tlist equivalent to the foreignrel->reltarget->exprs. But as you proposed, to make the code easier to understand, I will change to use foreignrel->reltarget->exprs directly. /* * deparseExplicitTargetList() isn't smart enough to handle anything other * than a Var. In particular, if there's some PlaceHolderVar that would * need to be evaluated within this join tree (because there's an upper * reference to a quantity that may goto NULL as a result of an outer * join), then we can't try to push the join down because we'll fail when * we get to deparseExplicitTargetList(). However, a PlaceHolderVar that * needs to be evaluated *at the top* of this join tree is OK, because we * can do that locally after fetching the results from the remote side. */ relids = joinrel->relids; foreach(lc, root->placeholder_list) { PlaceHolderInfo *phinfo = lfirst(lc); if (bms_is_subset(phinfo->ph_eval_at, relids) && bms_nonempty_difference(relids, phinfo->ph_eval_at)) return false; } > 10. Deparsing a query in the form of a subquery makes it hard-to-read. The > original code aimed at avoiding a subquery. Also, this change has created many > expected output changes, which again seem unnecessary. In fact, having the > pushable join clauses of an inner join in ON clause, which is closer to JOIN > clause, is better than having them farther in the WHERE clause. > - /* > - * For an inner join, all restrictions can be treated alike. Treating the > - * pushed down conditions as join conditions allows a top level full outer > - * join to be deparsed without requiring subqueries. > - */ > - if (jointype == JOIN_INNER) > - { > - Assert(!fpinfo->joinclauses); > - fpinfo->joinclauses = fpinfo->remote_conds; > - fpinfo->remote_conds = NIL; > - } I added this change for another patch that I proposed for extending the DML pushdown in 9.6 so that it can perform an update/delete on a join remotely. To create a remote query for that, I think we need to pull up inner join conditions mentioning the target relation in the JOIN/ON clauses into the WHERE clause. But I have to admit that's unrelated to this patch, so I will leave that as-is. > 11. I have reworded following comment and restructured the code that follows it > in the attached patch. > + /* > + * Set the subquery information. If the relation performs a full outer > + * join and if the input relations have non-NIL remote_conds, the input > + * relations need to be deparsed as a subquery. > + */ > The code esp. the if .. else .. block followed by another one, made it a bit > unreadable. Hence I restructured it so that it's readable and doesn't require > variable "relids" from earlier code, which was being reused. Let me know if > this change looks good. That's a good idea. Thanks! > 12. The code below deletes the condition, which is understandable. > - if (fpinfo_i->remote_conds || fpinfo_o->remote_conds) > But why does it add a new unrelated condition here? What the comment claims, > looks like an existing bug, unrelated to the patch. Can you please give an > example? If that it's an existing bug, it should be fixed as a separate patch. > I don't understand the relation of this code with what the patch is doing. > + /* > + * We can't do anything here, and if there are any non-Vars in the > + * outerrel/innerrel's reltarget, give up pushing down this join, > + * because the deparsing logic can't support such a case currently. > + */ > + if (reltarget_has_non_vars(outerrel)) > + return false; > + if (reltarget_has_non_vars(innerrel)) > return false; I thought the current deparsing logic wouldn't work well if the target list of a given subquery contained eg, system columns other than ctid and oid, but I noticed that I was wrong; it can, so we don't need this restriction. Will remove. (I don't think there is any bug.) > 13. The comment below is missing the main purpose i.e. creating a a unique > alias, in case the relation gets converted into a subquery. Lowest or highest > relid will create a unique alias at given level of join and that would be more > future proof. If we decide to consider paths for every join order, following > comment will no more be true. > + /* > + * Set the relation index. This is defined as the position of this > + * joinrel in the join_rel_list list plus the length of the rtable list. > + * Note that since this joinrel is at the end of the list when we are > + * called, we can get the position by list_length. > + */ > + fpinfo->relation_index = > + list_length(root->parse->rtable) + list_length(root->join_rel_list); I don't agree on that point. As I said before, the approach using the lowest/highest relid would make a remote query having nested subqueries difficult to read. And even if we decided to consider paths for every join order, we could define the relation_index safely, by modifying postgresGetForeignJoinPaths so that it (1) sets the relation_index the proposed way at the first time it is called and (2) avoids setting the relation_index after the first call, by checking to see fpinfo->relation_index > 0. > 14. The function name talks about non-vars but the Assert few lines below > asserts that every node there is a Var. Need better naming. > +reltarget_has_non_vars(RelOptInfo *foreignrel) > +{ > + ListCell *lc; > + > + foreach(lc, foreignrel->reltarget->exprs) > + { > + Var *var = (Var *) lfirst(lc); > + > + Assert(IsA(var, Var)); > And also an explanation for the claim > + * Note: currently deparseExplicitTargetList can't properly handle such Vars. Will remove this function for the reason described in #12. > 15. While deparsing every Var, we are descending the RelOptInfo hierarchy. Right. I think that not only avoids creating redundant data to find the alias of a given Var node but also would be able to find it in optimal cost. > Instead of that, we should probably gather all the alias information once and > store it in context as an array indexed by relid. While deparsing a Var we can > get the targetlist and alias for a given relation by using var->varno as index. > And then search for given Var node in the targetlist to deparse the column name > by its position in the targetlist. For the relations that are not converted > into subqueries, this array will not hold any information and the Var will be > deparsed as it's being done today. Sorry, I don't understand this part. How does that work when creating a remote query having nested subqueries? Is that able to search for a given Var node efficiently? > Testing > ------- > 1. The code is changing deparser but doesn't have testcases > testing impact of the code. For example, we should have testcases with remote > query deparsed as nested subqueries or nested subqueries within subqueries and > so on May be testcases where a relation deeper in the RelOptInfo hierarchy has > conditions but it's immediate upper relation does not have those. Will do. > 2. The only testcase added by this patch, copies an existing case and adds a > whole row reference to one of the relations being joined. Instead we could add > that whole-row reference to the existing testcase itself. Will do. Best regards, Etsuro Fujita
pgsql-hackers by date: