Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables - Mailing list pgsql-hackers
From | Ashutosh Bapat |
---|---|
Subject | Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables |
Date | |
Msg-id | CAFjFpRcbY2QN3cfeMTzVEoyF5Lfku-ijyNR=PbXj1e=9a=qMoQ@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables
|
List | pgsql-hackers |
On Mon, Mar 20, 2017 at 8:21 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Fri, Mar 17, 2017 at 8:10 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> While I was studying what you did with reparameterize_path_by_child(), >> I started to wonder whether reparameterize_path() doesn't need to >> start handling join paths. I think it only handles scan paths right >> now because that's the only thing that can appear under an appendrel >> created by inheritance expansion, but you're changing that. Maybe >> it's not critical -- I think the worst consequences of missing some >> handling there is that we won't consider a parameterized path in some >> case where it would be advantageous to do so. Still, you might want >> to investigate a bit. > > I spent a fair amount of time this weekend musing over > reparameterize_path_by_child(). I think a key question for this patch > - as you already pointed out - is whether we're happy with that > approach. When we discover that we want to perform a partitionwise > parameterized nestloop, and therefore that we need the paths for each > inner appendrel to get their input values from the corresponding outer > appendrel members rather than from the outer parent, we've got two > choices. The first is to do what the patch actually does, which is to > build a new path tree for the nestloop inner path parameterized by the > appropriate childrel. The second is to use the existing paths, which > are parameterized by the parent rel, and then somehow allow make that > work. For example, you can imagine that create_plan_recurse() could > pass down a list of parameterized nestloops above the current point in > the path tree, and a parent-child mapping for each, and then we could > try to substitute everything while actually generating the plan > instead of creating paths sooner. Which is better? > > It would be nice to hear opinions from anyone else who cares, but > after some thought I think the approach you've picked is probably > better, because it's more like what we do already. We have existing > precedent for reparameterizing a path, but none for allowing a Var for > one relation (the parent) to in effect refer to another relation (the > child). Right. If we could use parent Vars to indicate parent Var or child Var depending upon the context, a lot of memory issues would be solved; we wouldn't need to translate a single expression. But I think that's not straight forward. I have been thinking about some kind of polymorphic Var node, but it seems a lot more invasive change. Although, if we could get something like that, we would save a huge memory. :) > > That having been said, having try_nestloop_path() perform the > reparameterization at the very top of the function seems quite > undesirable. You're creating a new path there before you know whether > it's going to be rejected by the invalid-parameterization test and > also before you know whether initial_cost_nestloop is going to reject > it. It would be much better if you could find a way to postpone the > reparameterization until after those steps, and only do it if you're > going to try add_path(). Hmm. I think we can do that by refactoring calc_nestloop_required_outer(), allow_star_schema_join() and have_dangerous_phv() to use relids instead of paths. If the checks pass for a join between parents, those should pass for joins between children. Done in the attached set of patches. try_nestloop_path has few new variables. Among those innerrelids and outerrelids indicate the relids to be used by the parameterization checks (see patch for details). They are not relids of inner and outer relations resp. but kind of effective relids to be used. But I couldn't come up with better names which convey proper meaning and still are short enough. effective_innerrelids is mouthful. I am wondering whether we need to change calc_non_nestloop_required_outer() similar to calc_nestloop_required_outer() just to keep their signatures in sync. Should I work on completing reparamterized_path_by_child() to support all kinds of paths? -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
pgsql-hackers by date: