Re: Foreign join pushdown vs EvalPlanQual - Mailing list pgsql-hackers
From | Kouhei Kaigai |
---|---|
Subject | Re: Foreign join pushdown vs EvalPlanQual |
Date | |
Msg-id | 9A28C8860F777E439AA12E8AEA7694F801111349@BPXM15GP.gisp.nec.co.jp Whole thread Raw |
In response to | Re: Foreign join pushdown vs EvalPlanQual (Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>) |
Responses |
Re: Foreign join pushdown vs EvalPlanQual
|
List | pgsql-hackers |
> On 2015/07/02 23:13, Kouhei Kaigai wrote: > >> To be honest, ISTM that it's difficult to do that simply and efficiently > >> for the foreign/custom-join-pushdown API that we have for 9.5. It's a > >> little late, but what I started thinking is to redesign that API so that > >> that API is called at standard_join_search, as discussed in [2]; (1) to > >> place that API call *after* the set_cheapest call and (2) to place > >> another set_cheapest call after that API call for each joinrel. By the > >> first set_cheapest call, I think we could probably save an alternative > >> path that we need in "cheapest_builtin_path". By the second > >> set_cheapest call following that API call, we could consider > >> foreign/custom-join-pushdown paths also. What do you think about this idea? > > > Disadvantage is larger than advantage, sorry. > > The reason why we put foreign/custom-join hook on add_paths_to_joinrel() > > is that the source relations (inner/outer) were not obvious, thus, > > we cannot reproduce which relations are the source of this join. > > So, I had to throw a spoon when I tried this approach before. > > Maybe I'm missing something, but my image about this approach is that if > base relations for a given joinrel are all foreign tables and belong to > the same foreign server, then by calling that API there, we consider the > remote join over all the foreign tables, and that if not, we give up to > consider the remote join. > Your understanding is correct, but missing a point. Once foreign tables to be joined are informed as a bitmap (joinrel->relids), it is not obvious for extensions which relations are joined with INNER JOIN, and which ones are joined with OUTER JOIN. I tried to implement a common utility function under the v9.5 cycle, however, it was suspicious whether we can make a reliable logic. Also, I don't want to stick on the assumption that relations involved in remote join are all managed by same foreign-server no longer. The following two ideas introduce possible enhancement of remote join feature that involved local relations; replicated table or transformed to VALUES() clause. http://www.postgresql.org/message-id/CA+Tgmoai_VUF5h6qVLNLU+FKp0aeBCbnnMT3SCvL-HvOpBR=Xw@mail.gmail.com http://www.postgresql.org/message-id/9A28C8860F777E439AA12E8AEA7694F8010F20AD@BPXM15GP.gisp.nec.co.jp Once we have to pay attention to the case of local/foreign relations mixed, we have to care about the path of underlying local or foreign relations managed by other foreign server. I think add_paths_to_joinrel() is the best location for foreign-join, not only custom-join. Relocation to standard_join_search() has larger disadvantage than its advantage. > > My idea is that we save the cheapest_total_path of RelOptInfo onto the > > new cheapest_builtin_path just before the GetForeignJoinPaths() hook. > > Why? It should be a built-in join logic, never be a foreign/custom-join, > > because of the hook location; only built-in logic shall be added here. > > My concern about your idea is that since that (a) add_paths_to_joinrel > is called multiple times per joinrel and that (b) repetitive add_path > calls through GetForeignJoinPaths in add_paths_to_joinrel might remove > old paths that are builtin, it's possible to save a path that is not > builtin onto the cheapest_total_path and thus to save that path wrongly > onto the cheapest_builtin_path. There might be a good way to cope with > that, though. > For the concern (a), FDW driver can reference RelOptInfo->fdw_private that shall be initialized to NULL, then FDW driver will set valid data if it preliminary adds something. IIRC, postgres_fdw also skips to add same path multiple times. For the concern (b), yep, we may enhance add_path() to retain built-in path, instead of the add_paths_to_joinrel(). Let's adjust the logic a bit. The add_path() can know whether the given path is usual or exceptional (ForeignPath/CustomPath towards none base relation) one. If path is exceptional, the cheapest_builtin_path shall be retained unconditionally. Elsewhere, the cheapest one replace here, then the cheapest built-in path will survive. Is it still problematic? > > Regarding to the development timeline, I prefer to put something > > workaround not to kick Assert() on ExecScanFetch(). > > We may add a warning in the documentation not to replace built-in > > join if either/both of sub-trees are target of UPDATE/DELETE or > > FOR SHARE/UPDATE. > > I'm not sure that that is a good idea, but anyway, I think we need to > hurry fixing this issue. > My approach is not fix, but avoid. :-) It may be an idea to implement the above fixup even though it may be too large/late to apply v9.5 features, but we can understand how many changes are needed to fixup this problem. Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com>
pgsql-hackers by date: