Re: BUG #18652: Planner can not find pathkey item to sort for query with expression and expression index - Mailing list pgsql-bugs

From Tom Lane
Subject Re: BUG #18652: Planner can not find pathkey item to sort for query with expression and expression index
Date
Msg-id 3001952.1728680364@sss.pgh.pa.us
Whole thread Raw
In response to BUG #18652: Planner can not find pathkey item to sort for query with expression and expression index  (PG Bug reporting form <noreply@postgresql.org>)
Responses Re: BUG #18652: Planner can not find pathkey item to sort for query with expression and expression index
List pgsql-bugs
Andrei Lepikhov <lepihov@gmail.com> writes:
> Why not look if some entry of the TargetList contains the var? Something 
> like attached?

I finally got my head wrapped around what is happening here.
Using a variant of Richard's example case,

SELECT b + 0 AS c FROM
  (SELECT i AS b FROM t t1 UNION ALL SELECT i + 1 FROM t t2) ss
ORDER BY c;

We initially have an EquivalenceClass containing "ss.b + 0"
to represent the required ordering pathkey.  When we flatten
the UNION ALL subselect into an appendrel, we add child EC
members containing transformed versions of that, namely
"t1.i + 0" and "(t2.i + 1) + 0".  Similarly, the relation
targetlist for ss is "ss.b" so we derive the targetlists for
the appendrel members t1 and t2 as "t1.i" and "t2.i + 1".

Then the problem is that find_computable_ec_member is trying
to verify that "(t2.i + 1) + 0" can be computed from a
subquery that currently emits only "t2.i + 1".  Its method
of pulling out just t2.i and looking for that in the subquery
tlist obviously fails.

Now, the way that the commentary for find_computable_ec_member
is written would lead you to think that what we need is to identify
that the subexpression (t2.i + 1) is available from the tlist, with
the expectation that a higher-level plan node would then compute
"subexpression + 0" from that output of a lower plan node.  That's
possible but it would require expensive search to identify things.
But that's not what actually happens in the sole use of this code by
prepare_sort_from_pathkeys.  What will actually happen is that we
will add the child EC member expression as a new member of the
same tlist, so that the plan node's tlist will look like

    t2.i + 1, (t2.i + 1) + 0

This means that it will work so long as all of the Vars needed by the
EC member expression are available from the plan node's input, which
they surely are if they are referenced in the existing tlist.  That
is, even if we wanted to compute "t2.i + 2" it'd be fine.

(This would fall down perhaps if there are volatile functions in
the sort expression, but I believe we already reject using volatile
expressions for merge append, so it's not a problem.)

So I conclude that Andrei's patch will fix it, although I don't like
the way that that requires (potentially) multiple re-executions of
pull_var_clause.  I think we should refactor the code to avoid that,
which actually ends up being less code, as in the attached draft.

I wonder whether we're doing more work here than we really need to.
If the underlying table t1 has columns i and j, and we have an EC
member that references t1.j while the tlist only mentions t1.i,
wouldn't it still work to add t1.j to the tlist?  So maybe groveling
through the tlist members is unnecessary and we only need to be
performing some kind of relation-level check on whether all the
required relations are included in the input.  But I'm hesitant
to make that kind of leap of faith in a patch that needs to be
back-patched, especially if the problem only arises in such narrow
edge cases that we've failed to detect it for 14 years.

            regards, tom lane

diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 8f6f005ecb..fae137dd82 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -38,7 +38,6 @@ static EquivalenceMember *add_eq_member(EquivalenceClass *ec,
                                         JoinDomain *jdomain,
                                         EquivalenceMember *parent,
                                         Oid datatype);
-static bool is_exprlist_member(Expr *node, List *exprs);
 static void generate_base_implied_equalities_const(PlannerInfo *root,
                                                    EquivalenceClass *ec);
 static void generate_base_implied_equalities_no_const(PlannerInfo *root,
@@ -810,9 +809,18 @@ find_ec_member_matching_expr(EquivalenceClass *ec,
  *        expressions appearing in "exprs"; return NULL if no match.
  *
  * "exprs" can be either a list of bare expression trees, or a list of
- * TargetEntry nodes.  Either way, it should contain Vars and possibly
- * Aggrefs and WindowFuncs, which are matched to the corresponding elements
- * of the EquivalenceClass's expressions.
+ * TargetEntry nodes.  Typically it will contain Vars and possibly Aggrefs
+ * and WindowFuncs; however, when considering an appendrel member the list
+ * could contain arbitrary expressions.  We consider an EC member to be
+ * computable if all the Vars, PlaceHolderVars, Aggrefs, and WindowFuncs
+ * it needs are present in "exprs".
+ *
+ * There is some subtlety in that definition: for example, if an EC member is
+ * Var_A + 1 while what is in "exprs" is Var_A + 2, it's still computable.
+ * This works because in the final plan tree, the EC member's expression will
+ * be computed as part of the same plan node targetlist that is currently
+ * represented by "exprs".  So if we have Var_A available for the existing
+ * tlist member, it must be OK to use it in the EC expression too.
  *
  * Unlike find_ec_member_matching_expr, there's no special provision here
  * for binary-compatible relabeling.  This is intentional: if we have to
@@ -832,12 +840,24 @@ find_computable_ec_member(PlannerInfo *root,
                           Relids relids,
                           bool require_parallel_safe)
 {
+    List       *exprvars;
     ListCell   *lc;

+    /*
+     * Pull out the Vars and quasi-Vars present in "exprs".  In the typical
+     * non-appendrel case, this is just another representation of the same
+     * list.  However, it does remove the distinction between the case of a
+     * list of plain expressions and a list of TargetEntrys.
+     */
+    exprvars = pull_var_clause((Node *) exprs,
+                               PVC_INCLUDE_AGGREGATES |
+                               PVC_INCLUDE_WINDOWFUNCS |
+                               PVC_INCLUDE_PLACEHOLDERS);
+
     foreach(lc, ec->ec_members)
     {
         EquivalenceMember *em = (EquivalenceMember *) lfirst(lc);
-        List       *exprvars;
+        List       *emvars;
         ListCell   *lc2;

         /*
@@ -855,18 +875,18 @@ find_computable_ec_member(PlannerInfo *root,
             continue;

         /*
-         * Match if all Vars and quasi-Vars are available in "exprs".
+         * Match if all Vars and quasi-Vars are present in "exprs".
          */
-        exprvars = pull_var_clause((Node *) em->em_expr,
-                                   PVC_INCLUDE_AGGREGATES |
-                                   PVC_INCLUDE_WINDOWFUNCS |
-                                   PVC_INCLUDE_PLACEHOLDERS);
-        foreach(lc2, exprvars)
+        emvars = pull_var_clause((Node *) em->em_expr,
+                                 PVC_INCLUDE_AGGREGATES |
+                                 PVC_INCLUDE_WINDOWFUNCS |
+                                 PVC_INCLUDE_PLACEHOLDERS);
+        foreach(lc2, emvars)
         {
-            if (!is_exprlist_member(lfirst(lc2), exprs))
+            if (!list_member(exprvars, lfirst(lc2)))
                 break;
         }
-        list_free(exprvars);
+        list_free(emvars);
         if (lc2)
             continue;            /* we hit a non-available Var */

@@ -884,31 +904,6 @@ find_computable_ec_member(PlannerInfo *root,
     return NULL;
 }

-/*
- * is_exprlist_member
- *      Subroutine for find_computable_ec_member: is "node" in "exprs"?
- *
- * Per the requirements of that function, "exprs" might or might not have
- * TargetEntry superstructure.
- */
-static bool
-is_exprlist_member(Expr *node, List *exprs)
-{
-    ListCell   *lc;
-
-    foreach(lc, exprs)
-    {
-        Expr       *expr = (Expr *) lfirst(lc);
-
-        if (expr && IsA(expr, TargetEntry))
-            expr = ((TargetEntry *) expr)->expr;
-
-        if (equal(node, expr))
-            return true;
-    }
-    return false;
-}
-
 /*
  * relation_can_be_sorted_early
  *        Can this relation be sorted on this EC before the final output step?

pgsql-bugs by date:

Previous
From: Alena Rybakina
Date:
Subject: Re: BUG #18652: Planner can not find pathkey item to sort for query with expression and expression index
Next
From: Andrei Lepikhov
Date:
Subject: Re: Question of Parallel Hash Join on TPC-H Benchmark