Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables |
Date | |
Msg-id | CA+TgmoZDi9fy6eZzAcMGeECCUXVxKwVFaT-dmsz9kZF+HL2p6g@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>) |
Responses |
Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables
|
List | pgsql-hackers |
On Mon, Jan 2, 2017 at 7:32 AM, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote: > PFA the patch (pg_dp_join_v6.patch) with some bugs fixed and rebased > on the latest code. Maybe not surprisingly given how fast things are moving around here these days, this needs a rebase. Apart from that, my overall comment on this patch is that it's huge: 37 files changed, 7993 insertions(+), 287 deletions(-) Now, more than half of that is regression test cases and their output, which you will certainly be asked to pare down in any version of this intended for commit. But even excluding those, it's still a fairly patch: 30 files changed, 2783 insertions(+), 272 deletions(-) I think the reason this is so large is because there's a fair amount of refactoring work that has been done as a precondition of the actual meat of the patch, and no attempt has been made to separate the refactoring work from the main body of the patch. I think that's something that needs to be done. If you look at the way Amit Langote submitted the partitioning patches and the follow-up bug fixes, he had a series of patches 0001-blah, 0002-quux, etc. generated using format-patch. Each patch had its own commit message written by him explaining the purpose of that patch, links to relevant discussion, etc. If you can separate this into more digestible chunks it will be easier to get committed. Other questions/comments: Why does find_partition_scheme need to copy the partition bound information instead of just pointing to it? Amit went to some trouble to make sure that this can't change under us while we hold a lock on the relation, and we'd better hold a lock on the relation if we're planning a query against it. I think the PartitionScheme stuff should live in the optimizer rather that src/backend/catalog/partition.c. Maybe plancat.c? Perhaps we eventually need a new file in the optimizer just for partitioning stuff, but I'm not sure about that yet. The fact that set_append_rel_size needs to reopen the relation to extract a few more bits of information is not desirable. You need to fish this information through in some other way; for example, you could have get_relation_info() stash the needed bits in the RelOptInfo. + * For two partitioned tables with the same partitioning scheme, it is + * assumed that the Oids of matching partitions from both the tables + * are placed at the same position in the array of partition oids in Rather than saying that we assume this, you should say why it has to be true. (If it doesn't have to be true, we shouldn't assume it.) + * join relations. Partition tables should have same layout as the + * parent table and hence should not need any translation. But rest of The same attributes have to be present with the same types, but they can be rearranged. This comment seems to imply the contrary. FRACTION_PARTS_TO_PLAN seems like it should be a GUC. + /* + * Add this relation to the list of samples ordered by the increasing + * number of rows at appropriate place. + */ + foreach (lc, ordered_child_nos) + { + int child_no = lfirst_int(lc); + RelOptInfo *other_childrel = rel->part_rels[child_no]; + + /* + * Keep track of child with lowest number of rows but higher than the + * that of the child being inserted. Insert the child before a + * child with highest number of rows lesser than it. + */ + if (child_rel->rows <= other_childrel->rows) + insert_after = lc; + else + break; + } Can we use quicksort instead of a hand-coded insertion sort? + if (bms_num_members(outer_relids) > 1) Seems like bms_get_singleton_member could be used. + * Partitioning scheme in join relation indicates a possibilty that the Spelling. There seems to be no reason for create_partition_plan to be separated from create_plan_recurse. You can just add another case for the new path type. Why does create_partition_join_path need to be separate from create_partition_join_path_with_pathkeys? Couldn't that be combined into a single function with a pathkeys argument that might sometimes be NIL? I assume most of the logic is common. From a sort of theoretical standpoint, the biggest danger of this patch seems to be that by deferring path creation until a later stage than normal, we could miss some important processing. subquery_planner() does a lot of stuff after expand_inherited_tables(); if any of those things, especially the ones that happen AFTER path generation, have an effect on the paths, then this code needs to compensate for those changes somehow. It seems like having the planning of unsampled children get deferred until create_plan() time is awfully surprising; here we are creating the plan and suddenly what used to be a straightforward path->plan translation is running around doing major planning work. I can't entirely justify it, but I somehow have a feeling that work ought to be moved earlier. Not sure exactly where. This is not really a full review, mostly because I can't easily figure out the motivation for all of the changes the patch makes. It makes a lot of changes in a lot of places, and it's not really very easy to understand why those changes are necessary. My comments above about splitting the patch into a series of patches that can potentially be reviewed and applied independently, with the main patch being the last in the series, are a suggestion as to how to tackle that. There might be some work that needs to or could be done on the comments, too. For example, the patch splits out add_paths_to_append_rel from set_append_rel_pathlist, but the comments don't say anything helpful like "we need to do X after Y, because Z". They just say that we do it. To some extent I think the comments in the optimizer have that problem generally, so it's not entirely the fault of this patch; still, the lack of those explanations makes the code reorganization harder to follow, and might confuse future patch authors, too. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: