Thread: Partition pruning on parameters grouped into an array does not prune properly

Hi,

Working on improving partition pruning [1] I found a case that I may 
treat like a bug. I mean the case with expression on a partitioning 
column like:

id = ARRAY[$1,$2]

Basically, pruning on ARRAY[$1,$2] works in the case of single column 
(see correct-pruning-example.sql in attachment):

PREPARE test (int, int) AS
   SELECT * FROM array_prune WHERE id = ANY(ARRAY[$1,$2]);
EXPLAIN (COSTS OFF) EXECUTE test(1,2);

  Append
    Subplans Removed: 1
    ->  Seq Scan on array_prune_t0 array_prune_1
          Filter: (id = ANY (ARRAY[$1, $2]))

But if we partition on HASH(x,y) it is not working (see 
incorrect-pruning-example.sql):

PREPARE test2 (int,int) AS
  SELECT 1 FROM array_prune
  WHERE id1 = ANY(ARRAY[$1]) AND id2 = ANY(ARRAY[$2]);
EXPLAIN (COSTS OFF) EXECUTE test2(1,-1);

  Append
    ->  Seq Scan on array_prune_t0 array_prune_1
          Filter: ((id1 = ANY (ARRAY[$1])) AND (id2 = ANY (ARRAY[$2])))
    ->  Seq Scan on array_prune_t1 array_prune_2
          Filter: ((id1 = ANY (ARRAY[$1])) AND (id2 = ANY (ARRAY[$2])))

Although its analogue works nice:

PREPARE test3 (int,int) AS
  SELECT 1 FROM array_prune
  WHERE id1 = $1 AND id2 = $2;
EXPLAIN (COSTS OFF) EXECUTE test3(1,-1);

  Append
    Subplans Removed: 1
    ->  Seq Scan on array_prune_t0 array_prune_1
          Filter: ((id1 = $1) AND (id2 = $2))

So, before diving into the partitioning depths, someone may quickly say 
it is not a bug, but I am missing something. Some hidden semantics?


[1] Prune partitions by ScalarArrayOpExpr with an array parameter 
(partkey = ANY($1))
https://www.postgresql.org/message-id/b8cdd20f-b34b-42b9-8c7c-dae864b7b3b2@gmail.com

-- 
regards, Andrei Lepikhov

Attachment
On Thu, 27 Mar 2025 at 04:19, Andrei Lepikhov <lepihov@gmail.com> wrote:
> But if we partition on HASH(x,y) it is not working (see
> incorrect-pruning-example.sql):
>
> PREPARE test2 (int,int) AS
>   SELECT 1 FROM array_prune
>   WHERE id1 = ANY(ARRAY[$1]) AND id2 = ANY(ARRAY[$2]);
> EXPLAIN (COSTS OFF) EXECUTE test2(1,-1);
>
>   Append
>     ->  Seq Scan on array_prune_t0 array_prune_1
>           Filter: ((id1 = ANY (ARRAY[$1])) AND (id2 = ANY (ARRAY[$2])))
>     ->  Seq Scan on array_prune_t1 array_prune_2
>           Filter: ((id1 = ANY (ARRAY[$1])) AND (id2 = ANY (ARRAY[$2])))

It is a bug.  This is down to how match_clause_to_partition_key()
handles ScalarArrayOpExpr.  To save some complexity in the handling of
ScalarArrayOpExpr, these get transformed into OpExprs, one for each
item in the ScalarArrayOpExpr.  Look for the call to make_opclause()
in match_clause_to_partition_key(). Just a few lines down, you see
that we recursively call gen_partprune_steps_internal() to pass down
the OpExprs that we just generated.  The problem is that the recursive
call only contains the OpExprs generated for one of the
ScalarArrayOpExpr, gen_prune_steps_from_opexps() requires equality
quals (or at least an key IS NULL qual) for all partitioned keys for
hash partitioning, otherwise it'll bail out on the following:

if (part_scheme->strategy == PARTITION_STRATEGY_HASH &&
clauselist == NIL && !bms_is_member(i, nullkeys))
    return NIL;

I wonder if we need to redesign this to not do that recursive
processing and instead have it so match_clause_to_partition_key() can
generate multiple PartClauseInfos. If we've matched to the
ScalarArrayOpExpr then I think each generated PartClauseInfo should
have the same PartClauseMatchStatus. That would also get rid of the
(kinda silly) overhead we have of having to match the
ScalarArrayOpExpr to the partition key, then generating OpExprs and
having to match those again, even though we know they will match.

I suspect the fix for this might be a bit invasive to backpatch. Maybe
it's something we can give a bit more clear thought to after the
freeze is over.

David



On 3/27/25 01:58, David Rowley wrote:
> I wonder if we need to redesign this to not do that recursive
> processing and instead have it so match_clause_to_partition_key() can
> generate multiple PartClauseInfos. If we've matched to the
> ScalarArrayOpExpr then I think each generated PartClauseInfo should
> have the same PartClauseMatchStatus. That would also get rid of the
> (kinda silly) overhead we have of having to match the
> ScalarArrayOpExpr to the partition key, then generating OpExprs and
> having to match those again, even though we know they will match.
> 
> I suspect the fix for this might be a bit invasive to backpatch. Maybe
> it's something we can give a bit more clear thought to after the
> freeze is over.
Thank you for the explanation!

Why does the pruning machinery only include the OpExpr pruning 
operation? Often, when preparing for pruning steps, we don’t know the 
exact number of values we will encounter during the initial or execution 
pruning stages. I believe it would be beneficial to have an iterator - 
something similar to the predicate_implied_by function - that can 
iteratively extract values from an array. This would allow pruning in 
practical scenarios, such as the following:

CREATE OR REPLACE FUNCTION some_business_logic(val integer)
RETURNS integer[] AS $$
BEGIN
  IF txid_current() % 2 = 0 THEN
    RETURN ARRAY[val];
  ELSE
    RETURN ARRAY[val + 1];
  END IF;
END;
$$ LANGUAGE plpgsql STRICT STABLE;

PREPARE test (int) AS
   SELECT * FROM array_prune
   WHERE id = ANY (some_business_logic($1));
EXPLAIN (ANALYZE, COSTS OFF) EXECUTE test(1);

Also in that case we wouldn't need to decompose a ScalarArrayOpExpr to 
the list of OpExpr clauses to prune partitions.
-- 
regards, Andrei Lepikhov



On 3/27/25 01:58, David Rowley wrote:
> I suspect the fix for this might be a bit invasive to backpatch. Maybe
> it's something we can give a bit more clear thought to after the
> freeze is over.
One more issue I think may be addressed (or just considered) here is the 
following:

CREATE TABLE parted (a int, b int) PARTITION BY RANGE (a, b);

CREATE TABLE part1 PARTITION OF parted
   FOR VALUES FROM (1, 1) TO (1, 10);
CREATE TABLE part2 PARTITION OF parted
   FOR VALUES FROM (2, 1) TO (2, 10);

INSERT INTO parted (VALUES (1, 2));
INSERT INTO parted VALUES (2, 2);

EXPLAIN (COSTS OFF)
SELECT * FROM parted WHERE a > 1 AND b < 1;
EXPLAIN (COSTS OFF)
SELECT * FROM parted WHERE a > 1 AND b > 10;

/*
  Seq Scan on part2 parted
    Filter: ((a > 1) AND (b < 1))
  Seq Scan on part2 parted
    Filter: ((a > 1) AND (b > 10))
*/

I think partition part2 could be pruned like in the following example:

EXPLAIN (COSTS OFF)
SELECT * FROM parted WHERE a > 2 AND b > 10;

/*
  Result
    One-Time Filter: false
*/

-- 
regards, Andrei Lepikhov



On Thu, Mar 27, 2025 at 9:59 AM David Rowley <dgrowleyml@gmail.com> wrote:
> On Thu, 27 Mar 2025 at 04:19, Andrei Lepikhov <lepihov@gmail.com> wrote:
> > But if we partition on HASH(x,y) it is not working (see
> > incorrect-pruning-example.sql):
> >
> > PREPARE test2 (int,int) AS
> >   SELECT 1 FROM array_prune
> >   WHERE id1 = ANY(ARRAY[$1]) AND id2 = ANY(ARRAY[$2]);
> > EXPLAIN (COSTS OFF) EXECUTE test2(1,-1);
> >
> >   Append
> >     ->  Seq Scan on array_prune_t0 array_prune_1
> >           Filter: ((id1 = ANY (ARRAY[$1])) AND (id2 = ANY (ARRAY[$2])))
> >     ->  Seq Scan on array_prune_t1 array_prune_2
> >           Filter: ((id1 = ANY (ARRAY[$1])) AND (id2 = ANY (ARRAY[$2])))
>
> It is a bug.  This is down to how match_clause_to_partition_key()
> handles ScalarArrayOpExpr.  To save some complexity in the handling of
> ScalarArrayOpExpr, these get transformed into OpExprs, one for each
> item in the ScalarArrayOpExpr.  Look for the call to make_opclause()
> in match_clause_to_partition_key(). Just a few lines down, you see
> that we recursively call gen_partprune_steps_internal() to pass down
> the OpExprs that we just generated.  The problem is that the recursive
> call only contains the OpExprs generated for one of the
> ScalarArrayOpExpr, gen_prune_steps_from_opexps() requires equality
> quals (or at least an key IS NULL qual) for all partitioned keys for
> hash partitioning, otherwise it'll bail out on the following:
>
> if (part_scheme->strategy == PARTITION_STRATEGY_HASH &&
> clauselist == NIL && !bms_is_member(i, nullkeys))
>     return NIL;
>
> I wonder if we need to redesign this to not do that recursive
> processing and instead have it so match_clause_to_partition_key() can
> generate multiple PartClauseInfos. If we've matched to the
> ScalarArrayOpExpr then I think each generated PartClauseInfo should
> have the same PartClauseMatchStatus. That would also get rid of the
> (kinda silly) overhead we have of having to match the
> ScalarArrayOpExpr to the partition key, then generating OpExprs and
> having to match those again, even though we know they will match.

Hmm, how would the multiple PartClauseInfos that would be generated in
this case -- presumably one per element in the ScalarArrayOpExpr -- be
interpreted by gen_partprune_steps_internal()? As things stand, that
code assumes multiple clauses for a given key are to be combined using
logical AND, which doesn’t line up with how we handle
ScalarArrayOpExpr with ANY. In that case (useOr = true), we expand the
clause into individual OpExprs for each element, wrap them in an
OR_EXPR, and pass that down recursively to
gen_partprune_steps_internal(), which we know works correctly for
single partition key case.  So unless we also rethink how pruning step
generation in the caller (gen_partprune_steps_internal()) interprets
multiple PartClauseInfos for the same key, just returning more of them
from match_clause_to_partition_key() wouldn’t be enough. We’d need to
restructure the caller to combine them in a way that reflects the
intended cross-product of values across partition keys.

For example, consider:

key1 = ANY(ARRAY[1, 2]) AND key2 = ANY(ARRAY[3, 4])

which expands to:

(key1 = 1 OR key1 = 2) AND (key2 = 3 OR key2 = 4)

and then to DNF:

(key1 = 1 AND key2 = 3)
OR (key1 = 1 AND key2 = 4)
OR (key1 = 2 AND key2 = 3)
OR (key1 = 2 AND key2 = 4)

To support pruning in this case, we’d need to extend
gen_partprune_steps_internal() to construct value combinations across
independently matched keys. The way that could work is for each (key1,
key2) pair in the cross-product, we’d generate a PartitionPruneStepOp,
and combine their results using an OR-mode PartitionPruneStepCombine
step.

As for whether this qualifies as a bug, I’d lean toward “no” -- at
least in the sense that it doesn’t produce incorrect results. We’ve
historically treated unsupported pruning cases like this as missed
optimization opportunities rather than planner correctness issues.
That would argue against backpatching a fix, even if we do decide to
improve the pruning logic in this area and for other cases going
forward.

--
Thanks, Amit Langote