Re: Push down more full joins in postgres_fdw - Mailing list pgsql-hackers
From | Etsuro Fujita |
---|---|
Subject | Re: Push down more full joins in postgres_fdw |
Date | |
Msg-id | 5aa9ed42-a00f-be85-1c8d-4b10a0b12c7e@lab.ntt.co.jp Whole thread Raw |
In response to | Re: Push down more full joins in postgres_fdw (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>) |
Responses |
Re: Push down more full joins in postgres_fdw
|
List | pgsql-hackers |
On 2016/10/25 18:58, Ashutosh Bapat wrote: You wrote: >>> 13. The comment below is missing the main purpose i.e. creating a a unique >>> alias, in case the relation gets converted into a subquery. Lowest or >>> highest >>> relid will create a unique alias at given level of join and that would be >>> more >>> future proof. If we decide to consider paths for every join order, >>> following >>> comment will no more be true. >>> + /* >>> + * Set the relation index. This is defined as the position of this >>> + * joinrel in the join_rel_list list plus the length of the rtable >>> list. >>> + * Note that since this joinrel is at the end of the list when we are >>> + * called, we can get the position by list_length. >>> + */ >>> + fpinfo->relation_index = >>> + list_length(root->parse->rtable) + >>> list_length(root->join_rel_list); I wrote: >> I don't agree on that point. As I said before, the approach using the >> lowest/highest relid would make a remote query having nested subqueries >> difficult to read. And even if we decided to consider paths for every join >> order, we could define the relation_index safely, by modifying >> postgresGetForeignJoinPaths so that it (1) sets the relation_index the >> proposed way at the first time it is called and (2) avoids setting the >> relation_index after the first call, by checking to see >> fpinfo->relation_index > 0. > OK. Although, the alias which contains a number, which apparently > doesn't show any relation with the relation (no pun intended :)) being > deparsed, won't make much sense in the deparsed query. But I am fine > leaving this to the committer's judgement. Fine with me. > May be you want to add an > assert here making sure that relation_index is set only once. > Something like Assert(fpinfo->relation_index == 0) just before setting > it. Will do. >>> 15. While deparsing every Var, we are descending the RelOptInfo hierarchy. >> Right. I think that not only avoids creating redundant data to find the >> alias of a given Var node but also would be able to find it in optimal cost. >>> Instead of that, we should probably gather all the alias information once >>> and >>> store it in context as an array indexed by relid. While deparsing a Var we >>> can >>> get the targetlist and alias for a given relation by using var->varno as >>> index. >>> And then search for given Var node in the targetlist to deparse the column >>> name >>> by its position in the targetlist. For the relations that are not >>> converted >>> into subqueries, this array will not hold any information and the Var will >>> be >>> deparsed as it's being done today. >> Sorry, I don't understand this part. How does that work when creating a >> remote query having nested subqueries? Is that able to search for a given >> Var node efficiently? > For every relation that is deparsed as a subquery, we will create a > separate context. Each such context will have an array described > above. This array will contain the targetlist and aliases for all the > relations, covered by that relation, that are required to be deparsed > as subqueries. These arrays bubble up to topmost relation that is not > required to be deparsed as subquery. When a relation is required to be > deparsed as a subquery, its immediate upper relation sets the alias > and targetlist chosen for that relation at all the indexes > corresponding to all the base relations covered by that relation. Thanks for the explanation! > For > example, let's assume that a relation (1, 2, 3) is required to be > deparsed as subquery for an immediate upper relation (1, 2, 3, 4, 5) > (thus the other joining relation being (4,5)). While deparsing for > relation (1,2,3,4,5), the context will contain a 5 element array, with > positions 1, 2, 3 filled by same targetlist and alias whereas > positions 4 and 5 will not be filled as those relations are not > deparsed as subqueries. Sorry, I don't understand this. In that case, the immediate upper relation (1, 2, 3, 4, 5) would need to fill the targetlist and aliases for *the join relation (1, 2, 3) somewhere*, not the targetlist and aliases for each of the component relations 1, 2, and 3, because the join relation is deparsed as a subquery. Maybe I'm missing something, though. > Let's assume in relation (1, 2, 3), (1, 3) in > turn requires subquery but (2) does not. Thus the context created > while deparsing (1, 2, 3) will have a 3 element array with positions 1 > and 3 containing the same targetlist and alias, where as position 2 > will be empty. > When deparsing a Var node with varno = N and varattno = > m, if the nth position in the array in the context is empty, that Var > node will be deparsed as rN.<column name>. What happens when deparsing eg, a Var with varno = 2 at the topmost relation (1, 2, 3, 4, 5)? The second position of the array is empty, but the join relation (1, 2, 3) is deparsed as a subquery, so the Var should be deparsed as an alias of an output column of the subquery at the topmost relation, I think. > But if that position is has > alias sZ, then we search for that Var node in the targetlist and if > it's found at kth position in the targetlist, we will deparse it as > sZ.ck. The search in the targetlist can be performed using > tlist_member, and then fetching the position by TargetEntry::resno. > This does not require any recursion and thus saves stack space and > some CPU cycles required for recursion. Is that true? > some CPU cycles required for recursion. I guess, the arrays need to be > computed only once for any relation when the query for that relation > is deparsed the first time. Does this algorithm extend to the case where we consider paths for every join order? > Thanks for considering other suggestions. Your suggestions are really helpful to improve the patch. Thanks! Best regards, Etsuro Fujita
pgsql-hackers by date: