Re: parallelizing subplan execution (was: explain and PARAM_EXEC) - Mailing list pgsql-hackers
From: Robert Haas
Subject: Re: parallelizing subplan execution (was: explain and PARAM_EXEC)
Date:
Msg-id: AANLkTinpb6x8Wn6LOMbCXZRSSHAdpT1wiUK2rrzcAlQp@mail.gmail.com
In response to: Re: parallelizing subplan execution (was: explain and PARAM_EXEC) (Mark Wong <markwkm@gmail.com>)
Responses: Re: parallelizing subplan execution (was: explain and PARAM_EXEC)
           Re: parallelizing subplan execution (was: explain and PARAM_EXEC)
List: pgsql-hackers
On Fri, Jun 25, 2010 at 10:47 PM, Mark Wong <markwkm@gmail.com> wrote:
> http://pages.cs.wisc.edu/~dewitt/includes/publications.html
>
> Some of these papers aren't the type of parallelism we're talking
> about here, but the ones that I think are relevant talk mostly about
> parallelizing hash based joins. I think we might be lacking an
> operator or two though in order to do some of these things.

This part (from the first paper linked on that page) is not terribly
encouraging:

"Current database query optimizers do not consider all possible plans
when optimizing a relational query. While cost models for relational
queries running on a single processor are now well-understood [SELI79],
they still depend on cost estimators that are a guess at best. Some
dynamically select from among several plans at run time depending on,
for example, the amount of physical memory actually available and the
cardinalities of the intermediate results [GRAE89]. To date, no query
optimizers consider all the parallel algorithms for each operator and
all the query tree organizations. More work is needed in this area."

The section (from that same paper) on parallelizing hash joins and
merge-join-over-sort is interesting, and I can definitely imagine those
techniques being a win for us. But I'm not too sure how we'd know when
to apply them - that is, what algorithm would the query optimizer use?
I'm sure we could come up with something, but I'd get a warmer, fuzzier
feeling if we could implement the fruits of someone else's research
rather than rolling our own.

>> I'm also ignoring the difficulties of getting hold of a second backend
>> in the right state - same database, same snapshot, etc. It seems to me
>> unlikely that this will work very well for a substantial number of
>> real-world applications if we have to actually start a new backend
>> every time we want to parallelize a query. IOW, we're going to need,
>> well, a connection pool in core. *ducks, runs for cover*
>
> Do we think it's worth proving that we can execute a plan in parallel?
> Something simple, if not the best case, say a nested loop join between
> two tables? Just as a starting point before worrying too much about
> what is the best thing to parallelize, or how the degree of parallelism
> will be controlled?

Well, we can certainly DO it, I guess. It's just a question of whether
we can make it fairly automatic and capable of delivering good results
in the real world.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
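[Editor's note: to make the hash-partitioned join idea from the quoted paper concrete, here is a minimal, self-contained sketch. It is not PostgreSQL executor code; the struct names, partition count, and integer "tuples" are all invented for illustration. Both inputs are partitioned by a hash of the join key, and each partition pair is then joined independently by its own worker, which is the shape of parallelism the DeWitt papers describe.]

```c
/*
 * Toy sketch of a hash-partitioned parallel join: partition both inputs
 * by join key, then join each partition pair in its own worker thread.
 * Illustration only -- not PostgreSQL code.  Build with: cc -pthread
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define NPART  4                 /* partitions == workers */
#define HASHSZ 1024              /* buckets per per-partition hash table */

typedef struct { int key; int payload; } Tuple;

typedef struct {
    Tuple *build;  int nbuild;   /* this partition's slice of the build rel */
    Tuple *probe;  int nprobe;   /* this partition's slice of the probe rel */
    long   matches;              /* result: number of joined pairs */
} Partition;

static int part_of(int key) { return (int) ((unsigned) key % NPART); }

/* Each worker joins one partition: build a hash table, then probe it. */
static void *join_partition(void *arg)
{
    Partition *p = arg;
    int *bucket = malloc(HASHSZ * sizeof(int));
    int *next   = malloc((p->nbuild > 0 ? p->nbuild : 1) * sizeof(int));

    for (int b = 0; b < HASHSZ; b++)
        bucket[b] = -1;
    for (int i = 0; i < p->nbuild; i++) {        /* chained hash table */
        int b = (int) ((unsigned) p->build[i].key % HASHSZ);
        next[i] = bucket[b];
        bucket[b] = i;
    }
    for (int i = 0; i < p->nprobe; i++) {        /* probe phase */
        int b = (int) ((unsigned) p->probe[i].key % HASHSZ);
        for (int j = bucket[b]; j != -1; j = next[j])
            if (p->build[j].key == p->probe[i].key)
                p->matches++;
    }
    free(bucket);
    free(next);
    return NULL;
}

int main(void)
{
    enum { NR = 10000, NS = 20000 };             /* sizes of the two inputs */
    Partition part[NPART] = {{0}};

    /* Fake input relations R (build side) and S (probe side). */
    Tuple *R = malloc(NR * sizeof(Tuple));
    Tuple *S = malloc(NS * sizeof(Tuple));
    for (int i = 0; i < NR; i++) R[i] = (Tuple){ rand() % 5000, i };
    for (int i = 0; i < NS; i++) S[i] = (Tuple){ rand() % 5000, i };

    /* Phase 1: hash-partition both inputs so matching keys land together. */
    for (int p = 0; p < NPART; p++) {
        part[p].build = malloc(NR * sizeof(Tuple));
        part[p].probe = malloc(NS * sizeof(Tuple));
    }
    for (int i = 0; i < NR; i++) {
        Partition *dst = &part[part_of(R[i].key)];
        dst->build[dst->nbuild++] = R[i];
    }
    for (int i = 0; i < NS; i++) {
        Partition *dst = &part[part_of(S[i].key)];
        dst->probe[dst->nprobe++] = S[i];
    }

    /* Phase 2: join each partition independently, one worker per partition. */
    pthread_t tid[NPART];
    for (int p = 0; p < NPART; p++)
        pthread_create(&tid[p], NULL, join_partition, &part[p]);

    long total = 0;
    for (int p = 0; p < NPART; p++) {
        pthread_join(tid[p], NULL);
        total += part[p].matches;
    }
    printf("join produced %ld matching pairs\n", total);
    return 0;
}
```

Because the partitioning function sends every occurrence of a given key to the same partition, the workers share nothing after phase 1, which is what makes this style of hash join easy to parallelize.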
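[Editor's note: a similarly hypothetical sketch of the proof of concept Mark suggests: a nested loop join in which the outer relation is carved into slices, one per worker, and every worker scans the entire inner relation. The threads, names, and data here are invented for illustration; a real implementation would still have to solve the second-backend/snapshot problems discussed above.]

```c
/*
 * Toy sketch of a parallel nested loop join: split the outer relation
 * across workers; each worker scans the whole inner relation.
 * Illustration only -- not PostgreSQL code.  Build with: cc -pthread
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define NWORKERS 4

typedef struct { int key; } Row;

typedef struct {
    const Row *outer; int nouter;   /* this worker's slice of the outer rel */
    const Row *inner; int ninner;   /* every worker scans the full inner rel */
    long matches;
} Task;

static void *nestloop_worker(void *arg)
{
    Task *t = arg;
    for (int i = 0; i < t->nouter; i++)
        for (int j = 0; j < t->ninner; j++)
            if (t->outer[i].key == t->inner[j].key)
                t->matches++;
    return NULL;
}

int main(void)
{
    enum { NOUT = 4000, NIN = 3000 };
    Row *outer = malloc(NOUT * sizeof(Row));
    Row *inner = malloc(NIN * sizeof(Row));
    for (int i = 0; i < NOUT; i++) outer[i].key = rand() % 1000;
    for (int i = 0; i < NIN; i++)  inner[i].key = rand() % 1000;

    /* Carve the outer relation into NWORKERS contiguous slices. */
    Task      task[NWORKERS];
    pthread_t tid[NWORKERS];
    int chunk = (NOUT + NWORKERS - 1) / NWORKERS;
    for (int w = 0; w < NWORKERS; w++) {
        int start = w * chunk;
        int len = start < NOUT ? (NOUT - start < chunk ? NOUT - start : chunk) : 0;
        task[w] = (Task){ outer + start, len, inner, NIN, 0 };
        pthread_create(&tid[w], NULL, nestloop_worker, &task[w]);
    }

    long total = 0;
    for (int w = 0; w < NWORKERS; w++) {
        pthread_join(tid[w], NULL);
        total += task[w].matches;
    }
    printf("parallel nested loop produced %ld matches\n", total);
    return 0;
}
```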