Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables - Mailing list pgsql-hackers
From | Ashutosh Bapat |
---|---|
Subject | Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables |
Date | |
Msg-id | CAFjFpRcsVW8HcsXXkz+_ZFwEYRb7XwrXV63T4cuyyTgxO8-xEQ@mail.gmail.com |
In response to | Partition-wise join for join between (declaratively) partitioned tables (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>) |
Responses | Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables |
List | pgsql-hackers |
On Wed, Apr 5, 2017 at 8:39 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Apr 4, 2017 at 10:22 AM, Ashutosh Bapat
> <ashutosh.bapat@enterprisedb.com> wrote:
>> Yes, I agree. For an inner join, the partition key types need to "shrink"
>> and for outer join they need to be "widened". I don't know if there is a way
>> to know "wider" or "shorter" of two given types. We might have to implement
>> a method to merge partition keys to produce partition key of the join, which
>> may be different from either of the partition keys. So, after-all we may
>> have to abandon the idea of canonical partition scheme. I haven't included
>> this change in the attached set of patches.
>
> I think this is why you need to regard the partitioning scheme as
> something more like an equivalence class - possibly the partitioning
> scheme should actually contain (or be?) an equivalence class. Suppose
> this is the query:
>
> SELECT * FROM i4 INNER JOIN i8 ON i4.x = i8.x;
>
> ...where i4 (x) is an int4 partitioning key and i8 (x) is an int8
> partitioning key. It's meaningless to ask whether the result of the
> join is partitioned by int4 or int8. It's partitioned by the
> equivalence class that contains both i4.x and i8.x. If the result of
> this join were joined to another table on either of those two
> columns, a second partition-wise join would be theoretically possible.
> If you insist on knowing the type of the partitioning scheme, rather
> than just the opfamily, you've boxed yourself into a corner from which
> there's no good escape.

When we merge partition bounds from two relations with different partition key types, the merged partition bounds need to carry some information about what those constants look like, e.g. their length, structure, etc. That's the reason we need to store the partition key types of the merged partitioning scheme.

Consider a three-way join (i4 JOIN i8 ON i4.x = i8.x) JOIN i2 ON (i2.x = i.x).
When we compare the partition bounds of i4 and i8, we use the operators for int4 and int8. The join i4 JOIN i8 gets its partition bounds by merging those of i4 and i8. When we come to the join with i2, we need to know which operators to use for comparing the partition bounds of the join with those of i2.

So, if the partition key types of the joining relations differ (but they have matching partitioning schemes per strategy, natts and operator family), the partition bounds of the join are converted to the wider type among the partition key types of the join tree. Actually, as I explained earlier, we could choose the wider type for an outer join and the shorter type for an inner join. This type is used as the partition key type of the join. In the above case, the join between i4 and i8 has its partition bounds converted to i8 (or i4), and then when it is joined with i2 the partition bounds of the join are converted to i8 (or i2).

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
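
The type choice described in the message above can be illustrated with a small Python sketch. This is not code from the patch set: `PartBounds`, `merge_bounds` and the `TYPE_WIDTH` table are hypothetical names, and the sketch assumes both sides of a join already have identical bound values, so that merging reduces to picking the type the bounds are stored as (wider for an outer join, shorter for an inner join).

```python
from dataclasses import dataclass

# Hypothetical type widths in bytes; a stand-in for real catalog lookups.
TYPE_WIDTH = {"int2": 2, "int4": 4, "int8": 8}


@dataclass(frozen=True)
class PartBounds:
    key_type: str        # type the bound constants are stored as
    upper_bounds: tuple  # sorted range upper bounds, one per partition


def merge_bounds(left, right, outer=False):
    """Merge the bounds of two joining relations and pick the join's key type.

    Assumption for this sketch: both sides have identical bound values,
    so the only decision left is which type to convert the bounds to.
    """
    if left.upper_bounds != right.upper_bounds:
        raise ValueError("bounds do not match; no partition-wise join")
    # Wider type for an outer join, shorter type for an inner join.
    pick = max if outer else min
    key_type = pick(left.key_type, right.key_type, key=TYPE_WIDTH.__getitem__)
    return PartBounds(key_type, left.upper_bounds)


i2 = PartBounds("int2", (100, 200, 300))
i4 = PartBounds("int4", (100, 200, 300))
i8 = PartBounds("int8", (100, 200, 300))

# (i4 JOIN i8) JOIN i2, first as inner joins, then as outer joins.
inner = merge_bounds(merge_bounds(i4, i8), i2)
outer = merge_bounds(merge_bounds(i4, i8, outer=True), i2, outer=True)
print(inner.key_type, outer.key_type)
```

The point of keeping `key_type` on the merged result is the one made in the message: when the join of i4 and i8 is later compared against i2, the stored type tells us which comparison operators apply to the merged bounds.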