Re: Expression errors with "FOR UPDATE" and postgres_fdw with partitionwise join enabled. - Mailing list pgsql-hackers
From | Etsuro Fujita |
---|---|
Subject | Re: Expression errors with "FOR UPDATE" and postgres_fdw with partitionwise join enabled. |
Date | |
Msg-id | 5B3CB833.4040005@lab.ntt.co.jp Whole thread Raw |
In response to | Re: Expression errors with "FOR UPDATE" and postgres_fdw withpartition wise join enabled. (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>) |
Responses |
Re: Expression errors with "FOR UPDATE" and postgres_fdw withpartition wise join enabled.
|
List | pgsql-hackers |
(2018/07/04 19:04), Ashutosh Bapat wrote: > On Fri, Jun 29, 2018 at 6:21 PM, Etsuro Fujita > <fujita.etsuro@lab.ntt.co.jp> wrote: >> (2018/06/22 22:54), Ashutosh Bapat wrote: >>> + if (enable_partitionwise_join&& rel->top_parent_is_partitioned) >>> + { >>> + build_childrel_tlist(root, rel, childrel, 1,&appinfo); >>> + } >>> >>> Why do we need rel->top_parent_is_partitioned? If a relation is >>> partitioned (if (rel->part_scheme), it's either the top parent or is >>> partition of some other partitioned table. In either case this >>> condition will be true. >> >> >> This would be needed to avoid unnecessarily applying build_childrel_tlist to >> child rels of a partitioned table for which we don't consider partitionwise >> join. Consider: >> >> postgres=# create table lpt (c1 int, c2 text) partition by list (c1); >> CREATE TABLE >> postgres=# create table lpt_p1 partition of lpt for values in (1); >> CREATE TABLE >> postgres=# create table lpt_p2 (c1 int check (c1 in (2)), c2 text); >> CREATE TABLE >> postgres=# create table test (c1 int, c2 text); >> CREATE TABLE >> postgres=# explain verbose select * from (select * from lpt union all select >> * from lpt_p2) ss(c1, c2) inner join test on (ss.c1 = test.c1); > > --- plan clipped > >> >> In this example, the top parent is not a partitioned table and we don't need >> to consider partitionwise join for that, so we don't need to apply that to >> the child rel lpt_p1 of the partitioned table lpt (and the table lpt_p2). > > FWIW, the test is not sufficient. In the above example, a simple > select * from lpt inner join test where lpt.c1 = test.c1 would not use > partition-wise join technique. Why not to avoid build_childrel_tlist() > in that case as well? I might misunderstand your words, but in the above example the patch doesn't apply build_childrel_tlist to lpt_p1 and lpt_p2. The reason for that is because we can avoid adjusting the tlists for the corresponding subplans at plan creation time so that whole-row Vars in the tlists are transformed into CREs. I think the overhead of the adjustment is not that big, but not zero, so it would be worth avoiding applying build_childrel_tlist to partitions if the top parent won't participate in a partitionwise-join at all. > Worst we can change the criteria to use > partition-wise join in future e.g. above case would use partition-wise > join by 1. merging lpt_p1 into corresponding partition of lpt so that > ss is partitioned and 2. repartitioning test or joining test with each > partition of lpt separately. In those cases the changes you are doing > here need to be reverted and put somewhere else. There's already a > patch to reverse the order of join and grouping out there. How would > this work with that. Interesting, but that seems like a matter of PG12+. > In general I think set_append_rel_size() or build_simple_rel() is not > the place where we should decide whether the given relation will > participate in a PWJ. Also we should not selectively add a > ConvertRowtypeExpr on top of a whole-row Var of some a child relation. > We should do it for all the child rels or shouldn't do it at all. One thing I thought was to apply build_childrel_tlist for all the child rels whether its top parent is a partitioned table or not, but as I mentioned above, that would incur needless overhead for adjusting the tlists for the child rels that don't involve in a partitionwise join such as lpt_p1 and lpt_p2, which I think is not good. > An > in-between state will produce a hell lot of confusion for any further > optimization. Whenever we change the code around partition-wise > operations in future, we will have to check whether or not a given > child rel has its whole-row Var embedded in ConvertRowtypeExpr. As I > have mentioned earlier, I am also not comfortable with the targetlist > of child relations being type inconsistent with that of the parent, > which is a fundamental rule in inheritance. Worst keep them > inconsistent during path creation and make them consistent at the time > of creating plans. A child's whole-row Var has datatype of the child > where as that of parent has datatype of parent. I don't see any critical issue here. Could you elaborate a bit more on that point? > A ConvertRowtypeExpr > is used to fix this inconsistency. That's why I chose to use > pull_var_clause() as a place to fix the problem and not fix > ConvertRowtypeExpr in targetlist itself. I think the biggest issue in the original patch you proposed is that we spend extra cycles where partitioning is not involved, which is the biggest reason why I think the original patch isn't the right way to go. > I am also not comfortable in just avoiding ConvertRowtypeExprs in > targetlist and leave them as is in the conditions and ECs etc. This > means searching a ConvertRowtypeExpr e.g. for creating pathkeys in > targetlist will fail at the time of path creation but will succeed at > the time of plan creation. > > This is turning more invasive that my approach of fixing it through PVC. Sorry, I don't understand this. Could you show me places where the problem happens? >>> + /* No work if the child plan is an Append or MergeAppend */ >>> + if (IsA(subplan, Append) || IsA(subplan, MergeAppend)) >>> + return; >>> >>> Why? Probably it's related to the answer to the first question, But I >>> don't see the connection. Given that partition-wise join will be >>> applicable level by level, we need to recurse in >>> adjust_subplan_tlist(). >> >> >> I don't think so; I think if the child plan is an Append or MergeAppend, the >> tlist for each subplan of the Append or MergeAppend would have already been >> adjusted while create_plan_recurse before we are called here. > > Ok. Thanks for the clarification. I think we should add a comment. Will do. >>> + /* The child plan should be able to do projection */ >>> + Assert(is_projection_capable_plan(subplan)); >>> + >>> Again why? A MergeAppend's subplan could be a Sort plan, which will >>> not be projection capable. >> >> >> IIUC, since we are called here right after create_plan_recurse, the child >> plan would be a scan or join unless it's neither Append nor MergeAppend. So >> it should be projection-capable. Maybe I'm missing something, though. > > That's not true. add_paths_to_append_rel() adds sort paths on scan if > necessary and creates merge append path. Really? I think create_merge_append_path called from generate_mergeappend_paths called from add_paths_to_append_rel uses a dummy sort path just to compute the cost, but I don't think create_merge_append_path (or generate_mergeappend_paths or add_paths_to_append_rel) insert a sort path to a scan (or join) path. Thanks for the comments! Best regards, Etsuro Fujita
pgsql-hackers by date: