Re: Push down more full joins in postgres_fdw - Mailing list pgsql-hackers
From | Ashutosh Bapat |
---|---|
Subject | Re: Push down more full joins in postgres_fdw |
Date | |
Msg-id | CAFjFpRfwoSsJr9418b2jA7g0nwagjZSWhPQPUmK9M6z5XSOAqQ@mail.gmail.com Whole thread Raw |
In response to | Re: Push down more full joins in postgres_fdw (Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>) |
Responses |
Re: Push down more full joins in postgres_fdw
Re: Push down more full joins in postgres_fdw |
List | pgsql-hackers |
Review for postgres-fdw-more-full-join-pushdown-v6 patch. The patch applies cleanly and regression is clean (make check in regress directory and that in postgres_fdw). Here are some comments. 1. Because of the following code change, for a joinrel we might end up using attrs_used, which will be garbage for a join rel. The assumption that tlist can not be NIL for a join relation is wrong. Even for a join relation, tlist can be NULL. Consider query 'explain verbose select count(*) from ft1, ft2' Since count(*) doesn't need any columns, the tlist is NIL. With the patch the server crashes for this query. - if (foreignrel->reloptkind == RELOPT_JOINREL) + /* + * Note: tlist for a base relation might be non-NIL. For example, if the + * base relation is an operand of a foreign join performing a full outer + * join and has non-NIL remote_conds, the base relation will be deparsed + * as a subquery, so the tlist for the base relation could be non-NIL. + */ + if (tlist != NIL) { - /* For a join relation use the input tlist */ We can not decide whether to use deparseExplicitTargetList or deparse it from attrs_used based on the tlist. SELECT lists, whether it's topmost SELECT or a subquery (even on a base relation), for deparsing a JOIN query need to use deparseExplicitTargetList. 2. The code in deparseRangeTblRef() dealing with subqueries, is very similar to deparseSelectStmtForRel(). The only thing deparseRangeTblRef() does not require is the appending ORDER BY, which is determined by existence of pathkeys argument. Probably we should reuse deparseSelectStmtForRel(), instead of duplicating the code. This will also make the current changes to deparseSelectSql unnecessary. 3. Why do we need following change? The elog() just few lines above says that we expect only Var nodes. Why then we use deparseExpr() instead of deparseVar()? if (i > 0) appendStringInfoString(buf, ", "); - deparseVar(var, context); + deparseExpr((Expr *) var, context); *retrieved_attrs = lappend_int(*retrieved_attrs, i + 1); And I get the answer for that question, somewhere down in the patch + /* + * If the given expression is an output column of a subquery-in-FROM, + * deparse the alias to the expression instead. + */ + if (IsA(node, Var)) + { + int tabno; + int colno; + + if (isSubqueryExpr(node, context->foreignrel, &tabno, &colno)) + { + appendStringInfo(context->buf, "%s%d.%s%d", + SS_TAB_ALIAS_PREFIX, tabno, + SS_COL_ALIAS_PREFIX, colno); + return; + } + } + Functionally, this code belongs to deparseVar() and not deparseExpr(). Like all other functions which handle Expr nodes, deparseExpr() is expected to be a dispatcher to the node specific function. 4. This comment is good to have there, but is unrelated to this patch. May be a separate patch? + /* Don't generate bad syntax if no columns */ 5. Changed comment doesn't say anything different from the original comment. It may have been good to have it this way to start with, but changing it now doesn't bring anything new. You seem to have just merged prologue of the function with a comment in that function. I think, this change is unnecessary. - * The function constructs ... JOIN ... ON ... for join relation. For a base - * relation it just returns schema-qualified tablename, with the appropriate - * alias if so requested. + * For a join relation the clause of the following form is appended to buf: + * ((outer relation) <join type> (inner relation) ON (joinclauses)) + * For a base relation the function just adds the schema-qualified tablename, + * with the appropriate alias if so requested. + /* Don't generate bad syntax if no columns */ 6. Change may be good but unrelated to this patch. May be material for a separate patch. There are few such changes in this function. While these changes may be good by themselves, they distract reviewer from the goal of the patch. - appendStringInfo(buf, "("); + appendStringInfoChar(buf, '('); 7. I don't understand why you need to change this function so much. Take for example the following change. - RelOptInfo *rel_o = fpinfo->outerrel; - RelOptInfo *rel_i = fpinfo->innerrel; - StringInfoData join_sql_o; - StringInfoData join_sql_i; + /* Begin the FROM clause entry */ + appendStringInfoChar(buf, '('); /* Deparse outer relation */ - initStringInfo(&join_sql_o); - deparseFromExprForRel(&join_sql_o, root, rel_o, true, params_list); + deparseRangeTblRef(buf, root, + fpinfo->outerrel, + fpinfo->make_outerrel_subquery, + params_list); /* Deparse outer relation */ - initStringInfo(&join_sql_o); - deparseFromExprForRel(&join_sql_o, root, rel_o, true, params_list); + deparseRangeTblRef(buf, root, + fpinfo->outerrel, + fpinfo->make_outerrel_subquery, + params_list); It removes few variables, instead directly accesses the members from fpinfo. Also, deparses the individual relations in the provided buffer directly, rather than in separate buffers in the original code. Instead of this, all you had to do was replace a call - deparseFromExprForRel(&join_sql_o, root, rel_o, true, params_list); with + deparseRangeTblRef(&join_sql_o, root, rel_o, fpinfo->make_outerrel_subquery, params_list) Similarly for the inner relation. Again, the changes you have done might have been good, if done in the original code, but doing those in this patch just creates distractions and increases the size of the patch. Similarly changes - /* Append join clause; (TRUE) if no join clause */ + /* Append join conditions */ + { + /* No join conditions; add "(TRUE)" */ appendStringInfoString(buf, "(TRUE)"); + } + appendStringInfoString(buf, " ON "); - /* End the FROM clause entry. */ - appendStringInfo(buf, ")"); + /* End the FROM clause entry */ + appendStringInfoChar(buf, ')'); 8. Why have you changed the name of variable here? - if (use_alias) + if (add_rel_alias) 9. In deparseRangeTblRef(), the targetlist for the subquery is obtained by calling build_tlist_to_deparse(), but appendSubselectAlias() constructs the column aliases based on foreignrel->reltarget->exprs. Those two are usually same, but there is no code which guarantees that. Instead, pass tlist to appendSubselectAlias() or simply pass the number of entries in that list (list_length(tlist), which is all it needs. OR use foreignrel->reltarget->exprs directly instead of calling build_tlist_to_deparse(). Similar problem exists with getSubselectAliasInfo(). 10. Deparsing a query in the form of a subquery makes it hard-to-read. The original code aimed at avoiding a subquery. Also, this change has created many expected output changes, which again seem unnecessary. In fact, having the pushable join clauses of an inner join in ON clause, which is closer to JOIN clause, is better than having them farther in the WHERE clause. - /* - * For an inner join, all restrictions can be treated alike. Treating the - * pushed down conditions as join conditions allows a top level full outer - * join to be deparsed without requiring subqueries. - */ - if (jointype == JOIN_INNER) - { - Assert(!fpinfo->joinclauses); - fpinfo->joinclauses = fpinfo->remote_conds; - fpinfo->remote_conds = NIL; - } - 11. I have reworded following comment and restructured the code that follows it in the attached patch. + /* + * Set the subquery information. If the relation performs a full outer + * join and if the input relations have non-NIL remote_conds, the input + * relations need to be deparsed as a subquery. + */ The code esp. the if .. else .. block followed by another one, made it a bit unreadable. Hence I restructured it so that it's readable and doesn't require variable "relids" from earlier code, which was being reused. Let me know if this change looks good. 12. The code below deletes the condition, which is understandable. - if (fpinfo_i->remote_conds || fpinfo_o->remote_conds) But why does it add a new unrelated condition here? What the comment claims, looks like an existing bug, unrelated to the patch. Can you please give an example? If that it's an existing bug, it should be fixed as a separate patch. I don't understand the relation of this code with what the patch is doing. + /* + * We can't do anything here, and if there are any non-Vars in the + * outerrel/innerrel's reltarget, give up pushing down this join, + * because the deparsing logic can't support such a case currently. + */ + if (reltarget_has_non_vars(outerrel)) + return false; + if (reltarget_has_non_vars(innerrel)) return false; 13. The comment below is missing the main purpose i.e. creating a a unique alias, in case the relation gets converted into a subquery. Lowest or highest relid will create a unique alias at given level of join and that would be more future proof. If we decide to consider paths for every join order, following comment will no more be true. + /* + * Set the relation index. This is defined as the position of this + * joinrel in the join_rel_list list plus the length of the rtable list. + * Note that since this joinrel is at the end of the list when we are + * called, we can get the position by list_length. + */ + fpinfo->relation_index = + list_length(root->parse->rtable) + list_length(root->join_rel_list); 14. The function name talks about non-vars but the Assert few lines below asserts that every node there is a Var. Need better naming. +reltarget_has_non_vars(RelOptInfo *foreignrel) +{ + ListCell *lc; + + foreach(lc, foreignrel->reltarget->exprs) + { + Var *var = (Var *) lfirst(lc); + + Assert(IsA(var, Var)); And also an explanation for the claim + * Note: currently deparseExplicitTargetList can't properly handle such Vars. 15. While deparsing every Var, we are descending the RelOptInfo hierarchy. Instead of that, we should probably gather all the alias information once and store it in context as an array indexed by relid. While deparsing a Var we can get the targetlist and alias for a given relation by using var->varno as index. And then search for given Var node in the targetlist to deparse the column name by its position in the targetlist. For the relations that are not converted into subqueries, this array will not hold any information and the Var will be deparsed as it's being done today. Testing ------- 1. The code is changing deparser but doesn't have testcases testing impact of the code. For example, we should have testcases with remote query deparsed as nested subqueries or nested subqueries within subqueries and so on May be testcases where a relation deeper in the RelOptInfo hierarchy has conditions but it's immediate upper relation does not have those. 2. The only testcase added by this patch, copies an existing case and adds a whole row reference to one of the relations being joined. Instead we could add that whole-row reference to the existing testcase itself. On Thu, Oct 13, 2016 at 4:05 PM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: > On 2016/09/29 21:12, Etsuro Fujita wrote: >> >> Please find attached an updated version of the patch. > > > Here is a new version (V6) of the patch, which is basically the same as the > previous patch, but I slightly modified that patch. Another patch I am > attaching is an updated patch for pushing down PHVs to the remote, which is > created on top of the V6 patch. > > Best regards, > Etsuro Fujita -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
pgsql-hackers by date: