Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables - Mailing list pgsql-hackers
From | Amit Langote |
---|---|
Subject | Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables |
Date | |
Msg-id | 974e0c9c-c545-9acb-707c-66de9cfc904e@lab.ntt.co.jp Whole thread Raw |
In response to | Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>) |
Responses |
Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables
|
List | pgsql-hackers |
On 2017/04/20 15:45, Ashutosh Bapat wrote: > On Thu, Apr 20, 2017 at 10:42 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> I don't understand why you think that partition-wise join needs any >> new logic here; if this were a non-partitionwise join, we'd similarly >> need to use the correct operator, but the existing code handles that >> just fine. If the join is performed partition-wise, it should use the >> same operators that would have been used by a non-partitionwise join >> between the same tables. >> >> I think the choice of operator depends only on the column types, and >> that the "width" of those types has nothing to do with it. For >> example, if the user writes .WHERE A.x = B.x AND B.x = C.x, the >> operator for an A/B join or a B/C join will be the one that appears in >> the query; parse analysis will have identified which specific operator >> is meant based on the types of the columns. If the optimizer >> subsequently decides to reorder the joins and perform the A/C join >> first, it will go hunt down the operator with the same strategy number >> in the same operator family that takes the type of A.x on one side and >> the type of C.x on the other side. No problem. A partition-wise join >> between A and C will use that same operator; again, no problem. >> >> Your example involves joining the output of a join between i4 and i8 >> against i2, so it seems there is some ambiguity about what the input >> type should be. But, again, the planner already copes with this >> problem. In fact, the join is performed either using i4.x or i8.x -- >> I don't know what happens, or whether it depends on other details of >> the query or the plan -- and the operator which can accept that value >> on one side and i2.x on the other side is the one that gets used. > > I think you are confusing join condition application and partition > bounds of a join relation. What you have described above is how > operators are chosen to apply join conditions - it picks up the > correct operator from the operator family based on the column types > being used in join condition. That it can do because the columns being > joined are both present the relations being joined, irrespective of > which pair of relations is being joined. In your example, A.x, B.x and > C.x are all present on one of the sides of join irrespective of > whether the join is executed as (AB)C, A(BC) or (AC)B. > > But the problem we are trying to solve here about partition bounds of > the join relation: what should be the partition bounds of AB, BC or > AC? When we compare partition bounds of and intermediate join with > other intermediate join (e.g. AB with those of C) what operator should > be used? You seem to be suggesting that we keep as many sets of > partition bounds as there are base relations participating in the join > and then use appropriate partition bounds based on the columns in the > join conditions, so that we can use the same operator as used in the > join condition. That doesn't seem to be a good option since the > partition bounds will all have same values, only differing in their > binary representation because of differences in data types. I am of > the opinion that we save a single set of partition bounds. We have to > then associate a data type with bounds to know binary representation > of partition bound datums. That datatype would be one of the partition > key types of joining relations. I may be wrong in using term "wider" > since its associated with the length of binary reprentation. But we > need some logic to coalesce the two data types based on the type of > join and key type on the outer side. FWIW, I think that using any one of the partition bounds of the baserels being partitionwise-joined should suffice as the partition bound of any combination of joins involving two or more of those baserels, as long as the partitioning operator of each of the baserels is in the same operator family (I guess that *is* checked somewhere in the partitionwise join consideration flow). IOW, partopfamily[] of all of the baserels should match and then the join clause operators involved should belong to the same respective operator families. ISTM, the question here is about how to derive the partitioning properties of joinrels from those of the baserels involved. Even if the join conditions refer to columns of different types on two sides, as long as the partitioning and joining is known to occur using operators of compatible semantics, I don't understand what more needs to be considered or done. Although, I haven't studied things in enough detail to say anything confidently about whether join being INNER or OUTER has any bearing on the semantics of the partitioning of the joinrels in question. IIUC, using partitioning properties to apply partitionwise join technique at successive join levels will be affected by the OUTER considerations similar to how they affect what levels a give EquivalenceClass clause could be applied without causing any semantics violations. As already mentioned upthread, it would be a good idea to have some integration of the partitioning considerations with the equivalence class mechanism (how ForeignKeyOptInfo contains links to ECs comes to mind). By the way, looking at match_expr_to_partition_keys() in your latest patch, I wonder why not use an approach similar to calling is_indexable_operator() that is used in match_clause_to_indexcol()? Note that is_indexable_operator() simply checks if clause->opno is in the index key's operator family, as returned by op_in_opfamily(). Instead I see the following: /* * The clause allows partition-wise join if only it uses the same * operator family as that specifiedby the partition key. */ if (!list_member_oid(rinfo->mergeopfamilies, part_scheme->partopfamily[ipk1])) continue; But maybe I am missing something. Thanks, Amit
pgsql-hackers by date: