Re: Partition-wise join for join between (declaratively) partitioned tables - Mailing list pgsql-hackers
From | Ashutosh Bapat |
---|---|
Subject | Re: Partition-wise join for join between (declaratively) partitioned tables |
Date | |
Msg-id | CAFjFpRcFpbuXhABcMH8GVeFaWCYQ2_64ENjcGEF7SZjeXgPgow@mail.gmail.com Whole thread Raw |
In response to | Re: Partition-wise join for join between (declaratively) partitioned tables (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>) |
List | pgsql-hackers |
> > So, I am thinking about your approach of creating PartitionJoinPaths > without actually creating child paths and then at a later stage > actually plan the child joins. Here's rough sketch of how that may be > done. > > At the time of creating regular paths, we identify the join orders > which can use partition-wise join and save those in the RelOptInfo of > the parent table. If no such join order exists, we do not create > PartitionJoinPaths for that relation. Otherwise, once we have > considered all the join orders i.e. in > generate_partition_wise_join_paths(), we create one PartitionJoinPath > for every path that has survived in the parent or at least for every > path that has distinct properties like pathkeys or parameterisation, > with those properties. > > At the time of creating plans, if PartitionJoinPath is chosen, we > actually create paths for every partition of that relation > recursively. The path creation logic is carried out in a different > memory context. Amongst the paths that survive, we choose the best > path that has the same properties as PartitionJoinPath. We would > expect all parameterized paths to be retained and any unparameterized > path can be sorted to match the pathkeys of reference > PartitionJoinPath. We then create the plan out of this path and copy > it into the outer memory context and release the memory context used > for path creation. This is similar to how prepared statements save > their plans. Once we have the plan, the memory consumed by paths won't > be referenced, and hence can not create problems. At the end we create > an Append/MergeAppend plan with all the child plans and return it. > > Costing PartitionJoinPath needs more thought so that we don't end up > with bad overall plans. Here's an idea. Partition-wise joins are > better compared to the unpartitioned ones, because of the smaller > sizes of partitions. If we think of join as O(MN) operation where M > and N are sizes of unpartitioned tables being joined, partition-wise > join computes P joins each with average O(M/P * N/P) order where P is > the number of partitions, which is still O(MN) with constant factor > reduced by P times. I think, we need to apply similar logic to > costing. Let's say cost of a join is J(M, N) = S (M, N) + R (M, N) > where S and R are setup cost and joining cost (for M, N rows) resp. > Cost of partition-wise join would be P * J(M/P, N/P) = P * S(M/P, N/P) > + P * R(M/P, N/P). Each of the join methods will have different S and > R functions and may not be linear on the number of rows. So, > PartitionJoinPath costs are obtained from corresponding regular path > costs subjected to above transformation. This way, we will be > protected from choosing a PartitionJoinPath when it's not optimal. > Take example of a join where the joining relations are very small in > size, thus hash join on full relation is optimal compared to hash join > of each partition because of setup cost. In such a case, the function > which calculates the cost of hash table setup, would result in almost > same cost for full table as well as each of the partitions, thus > increasing P * S(M/P, N/P) as compared to S(M, N). > > Let me know your comments. I tried to measure the impact of having a memory context reset 1000 times (once for each partition) with the attached patch. Without this patch make check in regress/ takes about 24 seconds on my laptop and with this patch it takes 26 seconds. This is almost 10% increase in time. I hope that's fine. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Attachment
pgsql-hackers by date: