Home > mailing lists

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partitionwise join enabled. - Mailing list pgsql-hackers

From	Etsuro Fujita
Subject	Re: Expression errors with "FOR UPDATE" and postgres_fdw with partitionwise join enabled.
Date	July 4, 2018 15:06:11
Msg-id	5B3CB833.4040005@lab.ntt.co.jp Whole thread Raw
In response to	Re: Expression errors with "FOR UPDATE" and postgres_fdw withpartition wise join enabled. (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>)
Responses	Re: Expression errors with "FOR UPDATE" and postgres_fdw withpartition wise join enabled.
List	pgsql-hackers

Tree view

(2018/07/04 19:04), Ashutosh Bapat wrote:
> On Fri, Jun 29, 2018 at 6:21 PM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp>  wrote:
>> (2018/06/22 22:54), Ashutosh Bapat wrote:
>>> +       if (enable_partitionwise_join&&   rel->top_parent_is_partitioned)
>>> +       {
>>> +           build_childrel_tlist(root, rel, childrel, 1,&appinfo);
>>> +       }
>>>
>>> Why do we need rel->top_parent_is_partitioned? If a relation is
>>> partitioned (if (rel->part_scheme), it's either the top parent or is
>>> partition of some other partitioned table. In either case this
>>> condition will be true.
>>
>>
>> This would be needed to avoid unnecessarily applying build_childrel_tlist to
>> child rels of a partitioned table for which we don't consider partitionwise
>> join.  Consider:
>>
>> postgres=# create table lpt (c1 int, c2 text) partition by list (c1);
>> CREATE TABLE
>> postgres=# create table lpt_p1 partition of lpt for values in (1);
>> CREATE TABLE
>> postgres=# create table lpt_p2 (c1 int check (c1 in (2)), c2 text);
>> CREATE TABLE
>> postgres=# create table test (c1 int, c2 text);
>> CREATE TABLE
>> postgres=# explain verbose select * from (select * from lpt union all select
>> * from lpt_p2) ss(c1, c2) inner join test on (ss.c1 = test.c1);
>
> --- plan clipped
>
>>
>> In this example, the top parent is not a partitioned table and we don't need
>> to consider partitionwise join for that, so we don't need to apply that to
>> the child rel lpt_p1 of the partitioned table lpt (and the table lpt_p2).
>
> FWIW, the test is not sufficient. In the above example, a simple
> select * from lpt inner join test where lpt.c1 = test.c1 would not use
> partition-wise join technique. Why not to avoid build_childrel_tlist()
> in that case as well?

I might misunderstand your words, but in the above example the patch 
doesn't apply build_childrel_tlist to lpt_p1 and lpt_p2.  The reason for 
that is because we can avoid adjusting the tlists for the corresponding 
subplans at plan creation time so that whole-row Vars in the tlists are 
transformed into CREs.  I think the overhead of the adjustment is not 
that big, but not zero, so it would be worth avoiding applying 
build_childrel_tlist to partitions if the top parent won't participate 
in a partitionwise-join at all.

> Worst we can change the criteria to use
> partition-wise join in future e.g. above case would use partition-wise
> join by 1. merging lpt_p1 into corresponding partition of lpt so that
> ss is partitioned and 2. repartitioning test or joining test with each
> partition of lpt separately. In those cases the changes you are doing
> here need to be reverted and put somewhere else. There's already a
> patch to reverse the order of join and grouping out there. How would
> this work with that.

Interesting, but that seems like a matter of PG12+.

> In general I think set_append_rel_size() or build_simple_rel() is not
> the place where we should decide whether the given relation will
> participate in a PWJ. Also we should not selectively add a
> ConvertRowtypeExpr on top of a whole-row Var of some a child relation.
> We should do it for all the child rels or shouldn't do it at all.

One thing I thought was to apply build_childrel_tlist for all the child 
rels whether its top parent is a partitioned table or not, but as I 
mentioned above, that would incur needless overhead for adjusting the 
tlists for the child rels that don't involve in a partitionwise join 
such as lpt_p1 and lpt_p2, which I think is not good.

> An
> in-between state will produce a hell lot of confusion for any further
> optimization. Whenever we change the code around partition-wise
> operations in future, we will have to check whether or not a given
> child rel has its whole-row Var embedded in ConvertRowtypeExpr. As I
> have mentioned earlier, I am also not comfortable with the targetlist
> of child relations being type inconsistent with that of the parent,
> which is a fundamental rule in inheritance. Worst keep them
> inconsistent during path creation and make them consistent at the time
> of creating plans. A child's whole-row Var has datatype of the child
> where as that of parent has datatype of parent.

I don't see any critical issue here.  Could you elaborate a bit more on 
that point?

> A ConvertRowtypeExpr
> is used to fix this inconsistency. That's why I chose to use
> pull_var_clause() as a place to fix the problem and not fix
> ConvertRowtypeExpr in targetlist itself.

I think the biggest issue in the original patch you proposed is that we 
spend extra cycles where partitioning is not involved, which is the 
biggest reason why I think the original patch isn't the right way to go.

> I am also not comfortable in just avoiding ConvertRowtypeExprs in
> targetlist and leave them as is in the conditions and ECs etc. This
> means searching a ConvertRowtypeExpr e.g. for creating pathkeys in
> targetlist will fail at the time of path creation but will succeed at
> the time of plan creation.
>
> This is turning more invasive that my approach of fixing it through PVC.

Sorry, I don't understand this.  Could you show me places where the 
problem happens?

>>> +   /* No work if the child plan is an Append or MergeAppend */
>>> +   if (IsA(subplan, Append) || IsA(subplan, MergeAppend))
>>> +       return;
>>>
>>> Why? Probably it's related to the answer to the first question, But I
>>> don't see the connection. Given that partition-wise join will be
>>> applicable level by level, we need to recurse in
>>> adjust_subplan_tlist().
>>
>>
>> I don't think so; I think if the child plan is an Append or MergeAppend, the
>> tlist for each subplan of the Append or MergeAppend would have already been
>> adjusted while create_plan_recurse before we are called here.
>
> Ok. Thanks for the clarification. I think we should add a comment.

Will do.

>>> +   /* The child plan should be able to do projection */
>>> +   Assert(is_projection_capable_plan(subplan));
>>> +
>>> Again why? A MergeAppend's subplan could be a Sort plan, which will
>>> not be projection capable.
>>
>>
>> IIUC, since we are called here right after create_plan_recurse, the child
>> plan would be a scan or join unless it's neither Append nor MergeAppend.  So
>> it should be projection-capable.  Maybe I'm missing something, though.
>
> That's not true. add_paths_to_append_rel() adds sort paths on scan if
> necessary and creates merge append path.

Really?  I think create_merge_append_path called from 
generate_mergeappend_paths called from add_paths_to_append_rel uses a 
dummy sort path just to compute the cost, but I don't think 
create_merge_append_path (or generate_mergeappend_paths or 
add_paths_to_append_rel) insert a sort path to a scan (or join) path.

Thanks for the comments!

Best regards,
Etsuro Fujita

pgsql-hackers by date:

From: Andrew Dunstan
Date: 04 July 2018, 14:55:53
Subject: Re: [HACKERS] WAL logging problem in 9.4.3?

From: Pavan Deolasee
Date: 04 July 2018, 15:08:13
Subject: Locking considerations of REINDEX

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partitionwise join enabled. - Mailing list pgsql-hackers

Previous

Next