Re: BUG #18652: Planner can not find pathkey item to sort for query with expression and expression index - Mailing list pgsql-bugs
From | Tom Lane |
---|---|
Subject | Re: BUG #18652: Planner can not find pathkey item to sort for query with expression and expression index |
Date | |
Msg-id | 3001952.1728680364@sss.pgh.pa.us Whole thread Raw |
In response to | BUG #18652: Planner can not find pathkey item to sort for query with expression and expression index (PG Bug reporting form <noreply@postgresql.org>) |
Responses |
Re: BUG #18652: Planner can not find pathkey item to sort for query with expression and expression index
|
List | pgsql-bugs |
Andrei Lepikhov <lepihov@gmail.com> writes: > Why not look if some entry of the TargetList contains the var? Something > like attached? I finally got my head wrapped around what is happening here. Using a variant of Richard's example case, SELECT b + 0 AS c FROM (SELECT i AS b FROM t t1 UNION ALL SELECT i + 1 FROM t t2) ss ORDER BY c; We initially have an EquivalenceClass containing "ss.b + 0" to represent the required ordering pathkey. When we flatten the UNION ALL subselect into an appendrel, we add child EC members containing transformed versions of that, namely "t1.i + 0" and "(t2.i + 1) + 0". Similarly, the relation targetlist for ss is "ss.b" so we derive the targetlists for the appendrel members t1 and t2 as "t1.i" and "t2.i + 1". Then the problem is that find_computable_ec_member is trying to verify that "(t2.i + 1) + 0" can be computed from a subquery that currently emits only "t2.i + 1". Its method of pulling out just t2.i and looking for that in the subquery tlist obviously fails. Now, the way that the commentary for find_computable_ec_member is written would lead you to think that what we need is to identify that the subexpression (t2.i + 1) is available from the tlist, with the expectation that a higher-level plan node would then compute "subexpression + 0" from that output of a lower plan node. That's possible but it would require expensive search to identify things. But that's not what actually happens in the sole use of this code by prepare_sort_from_pathkeys. What will actually happen is that we will add the child EC member expression as a new member of the same tlist, so that the plan node's tlist will look like t2.i + 1, (t2.i + 1) + 0 This means that it will work so long as all of the Vars needed by the EC member expression are available from the plan node's input, which they surely are if they are referenced in the existing tlist. That is, even if we wanted to compute "t2.i + 2" it'd be fine. (This would fall down perhaps if there are volatile functions in the sort expression, but I believe we already reject using volatile expressions for merge append, so it's not a problem.) So I conclude that Andrei's patch will fix it, although I don't like the way that that requires (potentially) multiple re-executions of pull_var_clause. I think we should refactor the code to avoid that, which actually ends up being less code, as in the attached draft. I wonder whether we're doing more work here than we really need to. If the underlying table t1 has columns i and j, and we have an EC member that references t1.j while the tlist only mentions t1.i, wouldn't it still work to add t1.j to the tlist? So maybe groveling through the tlist members is unnecessary and we only need to be performing some kind of relation-level check on whether all the required relations are included in the input. But I'm hesitant to make that kind of leap of faith in a patch that needs to be back-patched, especially if the problem only arises in such narrow edge cases that we've failed to detect it for 14 years. regards, tom lane diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c index 8f6f005ecb..fae137dd82 100644 --- a/src/backend/optimizer/path/equivclass.c +++ b/src/backend/optimizer/path/equivclass.c @@ -38,7 +38,6 @@ static EquivalenceMember *add_eq_member(EquivalenceClass *ec, JoinDomain *jdomain, EquivalenceMember *parent, Oid datatype); -static bool is_exprlist_member(Expr *node, List *exprs); static void generate_base_implied_equalities_const(PlannerInfo *root, EquivalenceClass *ec); static void generate_base_implied_equalities_no_const(PlannerInfo *root, @@ -810,9 +809,18 @@ find_ec_member_matching_expr(EquivalenceClass *ec, * expressions appearing in "exprs"; return NULL if no match. * * "exprs" can be either a list of bare expression trees, or a list of - * TargetEntry nodes. Either way, it should contain Vars and possibly - * Aggrefs and WindowFuncs, which are matched to the corresponding elements - * of the EquivalenceClass's expressions. + * TargetEntry nodes. Typically it will contain Vars and possibly Aggrefs + * and WindowFuncs; however, when considering an appendrel member the list + * could contain arbitrary expressions. We consider an EC member to be + * computable if all the Vars, PlaceHolderVars, Aggrefs, and WindowFuncs + * it needs are present in "exprs". + * + * There is some subtlety in that definition: for example, if an EC member is + * Var_A + 1 while what is in "exprs" is Var_A + 2, it's still computable. + * This works because in the final plan tree, the EC member's expression will + * be computed as part of the same plan node targetlist that is currently + * represented by "exprs". So if we have Var_A available for the existing + * tlist member, it must be OK to use it in the EC expression too. * * Unlike find_ec_member_matching_expr, there's no special provision here * for binary-compatible relabeling. This is intentional: if we have to @@ -832,12 +840,24 @@ find_computable_ec_member(PlannerInfo *root, Relids relids, bool require_parallel_safe) { + List *exprvars; ListCell *lc; + /* + * Pull out the Vars and quasi-Vars present in "exprs". In the typical + * non-appendrel case, this is just another representation of the same + * list. However, it does remove the distinction between the case of a + * list of plain expressions and a list of TargetEntrys. + */ + exprvars = pull_var_clause((Node *) exprs, + PVC_INCLUDE_AGGREGATES | + PVC_INCLUDE_WINDOWFUNCS | + PVC_INCLUDE_PLACEHOLDERS); + foreach(lc, ec->ec_members) { EquivalenceMember *em = (EquivalenceMember *) lfirst(lc); - List *exprvars; + List *emvars; ListCell *lc2; /* @@ -855,18 +875,18 @@ find_computable_ec_member(PlannerInfo *root, continue; /* - * Match if all Vars and quasi-Vars are available in "exprs". + * Match if all Vars and quasi-Vars are present in "exprs". */ - exprvars = pull_var_clause((Node *) em->em_expr, - PVC_INCLUDE_AGGREGATES | - PVC_INCLUDE_WINDOWFUNCS | - PVC_INCLUDE_PLACEHOLDERS); - foreach(lc2, exprvars) + emvars = pull_var_clause((Node *) em->em_expr, + PVC_INCLUDE_AGGREGATES | + PVC_INCLUDE_WINDOWFUNCS | + PVC_INCLUDE_PLACEHOLDERS); + foreach(lc2, emvars) { - if (!is_exprlist_member(lfirst(lc2), exprs)) + if (!list_member(exprvars, lfirst(lc2))) break; } - list_free(exprvars); + list_free(emvars); if (lc2) continue; /* we hit a non-available Var */ @@ -884,31 +904,6 @@ find_computable_ec_member(PlannerInfo *root, return NULL; } -/* - * is_exprlist_member - * Subroutine for find_computable_ec_member: is "node" in "exprs"? - * - * Per the requirements of that function, "exprs" might or might not have - * TargetEntry superstructure. - */ -static bool -is_exprlist_member(Expr *node, List *exprs) -{ - ListCell *lc; - - foreach(lc, exprs) - { - Expr *expr = (Expr *) lfirst(lc); - - if (expr && IsA(expr, TargetEntry)) - expr = ((TargetEntry *) expr)->expr; - - if (equal(node, expr)) - return true; - } - return false; -} - /* * relation_can_be_sorted_early * Can this relation be sorted on this EC before the final output step?
pgsql-bugs by date: