Home > mailing lists

Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables - Mailing list pgsql-hackers

From	Amit Langote
Subject	Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables
Date	April 20, 2017 13:05:14
Msg-id	974e0c9c-c545-9acb-707c-66de9cfc904e@lab.ntt.co.jp Whole thread Raw
In response to	Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>)
Responses	Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables
List	pgsql-hackers

Tree view

On 2017/04/20 15:45, Ashutosh Bapat wrote:
> On Thu, Apr 20, 2017 at 10:42 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> I don't understand why you think that partition-wise join needs any
>> new logic here; if this were a non-partitionwise join, we'd similarly
>> need to use the correct operator, but the existing code handles that
>> just fine.  If the join is performed partition-wise, it should use the
>> same operators that would have been used by a non-partitionwise join
>> between the same tables.
>>
>> I think the choice of operator depends only on the column types, and
>> that the "width" of those types has nothing to do with it.  For
>> example, if the user writes .WHERE A.x = B.x AND B.x = C.x, the
>> operator for an A/B join or a B/C join will be the one that appears in
>> the query; parse analysis will have identified which specific operator
>> is meant based on the types of the columns.  If the optimizer
>> subsequently decides to reorder the joins and perform the A/C join
>> first, it will go hunt down the operator with the same strategy number
>> in the same operator family that takes the type of A.x on one side and
>> the type of C.x on the other side.  No problem.  A partition-wise join
>> between A and C will use that same operator; again, no problem.
>>
>> Your example involves joining the output of a join between i4 and i8
>> against i2, so it seems there is some ambiguity about what the input
>> type should be.  But, again, the planner already copes with this
>> problem.  In fact, the join is performed either using i4.x or i8.x --
>> I don't know what happens, or whether it depends on other details of
>> the query or the plan -- and the operator which can accept that value
>> on one side and i2.x on the other side is the one that gets used.
> 
> I think you are confusing join condition application and partition
> bounds of a join relation. What you have described above is how
> operators are chosen to apply join conditions - it picks up the
> correct operator from the operator family based on the column types
> being used in join condition. That it can do because the columns being
> joined are both present the relations being joined, irrespective of
> which pair of relations is being joined. In your example, A.x, B.x and
> C.x are all present on one of the sides of join irrespective of
> whether the join is executed as (AB)C, A(BC) or (AC)B.
> 
> But the problem we are trying to solve here about partition bounds of
> the join relation: what should be the partition bounds of AB, BC or
> AC? When we compare partition bounds of and intermediate join with
> other intermediate join (e.g. AB with those of C) what operator should
> be used? You seem to be suggesting that we keep as many sets of
> partition bounds as there are base relations participating in the join
> and then use appropriate partition bounds based on the columns in the
> join conditions, so that we can use the same operator as used in the
> join condition. That doesn't seem to be a good option since the
> partition bounds will all have same values, only differing in their
> binary representation because of differences in data types. I am of
> the opinion that we save a single set of partition bounds. We have to
> then associate a data type with bounds to know binary representation
> of partition bound datums. That datatype would be one of the partition
> key types of joining relations. I may be wrong in using term "wider"
> since its associated with the length of binary reprentation. But we
> need some logic to coalesce the two data types based on the type of
> join and key type on the outer side.

FWIW, I think that using any one of the partition bounds of the baserels
being partitionwise-joined should suffice as the partition bound of any
combination of joins involving two or more of those baserels, as long as
the partitioning operator of each of the baserels is in the same operator
family (I guess that *is* checked somewhere in the partitionwise join
consideration flow).  IOW, partopfamily[] of all of the baserels should
match and then the join clause operators involved should belong to the
same respective operator families.

ISTM, the question here is about how to derive the partitioning properties
of joinrels from those of the baserels involved.  Even if the join
conditions refer to columns of different types on two sides, as long as
the partitioning and joining is known to occur using operators of
compatible semantics, I don't understand what more needs to be considered
or done.  Although, I haven't studied things in enough detail to say
anything confidently about whether join being INNER or OUTER has any
bearing on the semantics of the partitioning of the joinrels in question.
IIUC, using partitioning properties to apply partitionwise join technique
at successive join levels will be affected by the OUTER considerations
similar to how they affect what levels a give EquivalenceClass clause
could be applied without causing any semantics violations.  As already
mentioned upthread, it would be a good idea to have some integration of
the partitioning considerations with the equivalence class mechanism (how
ForeignKeyOptInfo contains links to ECs comes to mind).

By the way, looking at match_expr_to_partition_keys() in your latest
patch, I wonder why not use an approach similar to calling
is_indexable_operator() that is used in match_clause_to_indexcol()?  Note
that is_indexable_operator() simply checks if clause->opno is in the index
key's operator family, as returned by op_in_opfamily().  Instead I see the
following:
       /*        * The clause allows partition-wise join if only it uses the same        * operator family as that
specifiedby the partition key.        */       if (!list_member_oid(rinfo->mergeopfamilies,
part_scheme->partopfamily[ipk1]))          continue;
 

But maybe I am missing something.

Thanks,
Amit

pgsql-hackers by date:

From: Rafia Sabih
Date: 20 April 2017, 12:05:36
Subject: Re: [HACKERS] Range Merge Join v1

From: Ashutosh Bapat
Date: 20 April 2017, 13:16:50
Subject: Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables

Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables - Mailing list pgsql-hackers

Previous

Next