Re: Custom Plan node - Mailing list pgsql-hackers
From | Kohei KaiGai |
---|---|
Subject | Re: Custom Plan node |
Date | |
Msg-id | CADyhKSW2FjuYqAQUK4rn_zvLg13FGEb0WJsMG5VdTxJ2rV3fyg@mail.gmail.com Whole thread Raw |
In response to | Re: Custom Plan node (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: Custom Plan node
|
List | pgsql-hackers |
2013/9/13 Robert Haas <robertmhaas@gmail.com>: > On Tue, Sep 10, 2013 at 11:45 PM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote: >>> Fair enough, I think. So the action item for KaiGai is to think of >>> how the planner integration might work. >>> >> Do you think the idea I mentioned at the upthread is worth to investigate >> for more detailed consideration? Or, does it seems to you short-sighted >> thinking to fit this infrastructure with planner? >> >> It categorizes plan node into three: join, scan and other stuff. >> Cost based estimation is almost applied on join and scan, so abstracted >> scan and join may make sense to inform core planner what does this >> custom plan node try to do. >> On the other hand, other stuff, like Agg, is a stuff that must be added >> according to the provided query, even if its cost estimation was not small, >> to perform as the provided query described. >> So, I thought important one is integration of join and scan, but manipulation >> of plan tree for other stuff is sufficient. >> >> How about your opinion? > > Well, I don't know that I'm smart enough to predict every sort of > thing that someone might want to do here, unfortunately. This is a > difficult area: there are many possible things someone might want to > do, and as Tom points out, there's a lot of special handling of > particular node types that can make things difficult. And I can't > claim to be an expert in this area. > Sorry for my late response. I've tried to investigate the planner code to find out the way to integrate this custom api, and it is still in progress. One special handling I found was that create_join_plan() adjust root->curOuterRels prior to recursion of inner tree if NestLoop. Probably, we need some flags to control these special handling in the core. It is a hard job to list up all the stuff, so it seems to me we need to check-up them during code construction... > That having been said, I think the idea of a CustomScan node is > probably worth investigating. I don't know if that would work out > well or poorly, but I think it would be worth some experimentation. > Perhaps you could have a hook that gets called for each baserel, and > can decide whether or not it wishes to inject any additional paths; > and then a CustomScan node that could be used to introduce such paths. > I've been thinking that we really ought to have the ability to > optimize CTID range queries, like SELECT * FROM foo WHERE ctid > 'some > constant'. We have a Tid Scan node, but it only handles equalities, > not inequalities. I suppose this functionality should really be in > core, but since it's not it might make an interesting test for the > infrastructure you're proposing. You may be able to think of > something else you like better; it's just a thought. > This above framework was exactly what I considered. Probably, we have to put a hook on functions invoked by set_base_rel_pathlist() to add another possible way to scan the provided baserel, then set_cheapest() will choose the most reasonable one. The attached patch, it's just a works-in-progress, shows which hook I try to put around the code. Please grep it with "add_custom_scan_paths". Regarding to the "guest module" of this framework, another idea that I have is, built-in query cache module that returns previous scan result being cached if table contents was not updated from the previous run. Probably, it makes sense in case when most of rows are filtered out in this scan. Anyway, I'd like to consider something useful to demonstrate this API. > I am a little less sanguine about the chances of a CustomJoin node > working out well. I agree that we need something to handle join > pushdown, but it seems to me that might be done by providing a Foreign > Scan path into the joinrel rather than by adding a concept of foreign > joins per se. > Indeed, if we have a hook on add_paths_to_joinrel(), it also makes sense for foreign tables; probably, planner will choose foreign-path instead of existing join node including foreign-scans. > There are other possible new join types, like the Range > Join that Jeff Davis has mentioned in the past, which might be > interesting. But overall, I can't see us adding very many more join > types, so I'm not totally sure how much extensibility would help us > here. We've added a few scan types over the years (index only scans > in 9.2, and bitmap index scans in 8.1, I think) but all of our > existing join types go back to time immemorial. > It seems to me a significant point. Even if the custom node being added by extension instead of joins looks like a scan for core-PostgreSQL, extension will be able to run its custom join equivalent to join. I think, the above built-in query cache idea make sense to demonstrate this pseudo join node; that will be able to hold a materialized result being already joined. At least, it seems to me sufficient for my target; table join accelerated with GPU device. > And I think that lumping everything else together under "not a scan or > join" has the least promise of all. Is replacing Append really the > same as replacing Sort? I think we'll need to think harder here about > what we're trying to accomplish and how to get there. > As long as extension modifies PlannedStmt on the planner_hook, I don't think it is not difficult so much, as I demonstrate on the previous patch. Unlike scan or join, existing code is not designed to compare multiple possible paths, so it seems to me a feature to adjust a plan-tree already construct is sufficient for most usage because extension can decide which one can offer more cheap path than built-in ones. Thanks, -- KaiGai Kohei <kaigai@kaigai.gr.jp>
Attachment
pgsql-hackers by date: