Re: Parallel Seq Scan - Mailing list pgsql-hackers
From | Robert Haas
---|---
Subject | Re: Parallel Seq Scan
Date |
Msg-id | CA+Tgmoadfbe3ca4dDOpzpvX59MgGZr0+6OddKTwxhQXXUJB3aw@mail.gmail.com
In response to | Re: Parallel Seq Scan (Andres Freund <andres@2ndquadrant.com>)
Responses | Re: Parallel Seq Scan
List | pgsql-hackers
On Tue, Feb 10, 2015 at 9:08 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> If you make the chunks small enough, and then coordinate only the chunk
> distribution, not really.

True, but why do you want to do that in the executor instead of in heapam.c?

>> For this case, what I would imagine is that there is one parallel heap
>> scan, and each PartialSeqScan attaches to it. The executor says "give
>> me a tuple" and heapam.c provides one. Details like the chunk size
>> are managed down inside heapam.c, and the executor does not know about
>> them. It just knows that it can establish a parallel scan and then
>> pull tuples from it.
>
> I think that's a horrible approach that'll end up with far more
> entangled pieces than what you're trying to avoid. Unless the tuple flow
> is organized to only happen in the necessary cases, the performance will
> be horrible.

I can't understand this at all. A parallel heap scan, as I've coded it up, involves no tuple flow at all. All that's happening at the heapam.c layer is that we're coordinating which blocks to scan. Not to be disrespectful, but have you actually looked at the patch?

> And good chunk sizes et al depend on higher layers,
> selectivity estimates and such. And that's planner/executor work, not
> the physical layer (which heapam.c pretty much is).

If it's true that a good chunk size depends on the higher layers, then that would be a good argument for doing this differently, or at least for exposing an API by which the higher layers can tell heapam.c what chunk size they want. I hadn't considered that possibility - can you elaborate on why you think we might want to vary the chunk size?

> An individual heap scan's state lives in process-private memory. And if
> the results inside the separate workers should be used directly in
> these workers, without shipping them over the network, it'd be horrible
> to have the logic in the heap scan. How would you otherwise model an
> executor tree that does the seqscan and aggregation combined in
> multiple processes at the same time?

Again, the heap scan is not shipping anything anywhere, ever, in any design of any patch proposed or written. The results *are* used directly inside each individual worker.

>> I think we're in violent agreement here, except for some
>> terminological confusion. Are there N PartialSeqScan nodes, one
>> running in each worker, or is there one ParallelSeqScan node, which is
>> copied and run jointly across N workers? You can talk about it either
>> way and have it make sense, but we haven't had enough conversations
>> about this on this list to have settled on a consistent set of
>> vocabulary yet.
>
> I pretty strongly believe that it has to be independent scan nodes. Both
> from an implementation and a conversational POV. They might have some
> very light cooperation between them (e.g. coordinating block ranges or
> such), but everything else should be separate. From an implementation
> POV it seems pretty awful to have an executor node that's accessed by
> multiple separate backends - that'd mean it has to be concurrency-safe,
> have state in shared memory, and everything.

I don't agree with that, but again I think it's a terminological dispute. I think what will happen is that you will have a single node that gets copied into multiple backends, and in some cases a small portion of its state will live in shared memory. That's more or less what you're thinking of too, I think.
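To make the shape of that concrete, here's a rough sketch of the sort of shared state and block hand-off I'm describing. Every name in it is invented for illustration - it is not code from the actual patch - and it assumes the atomics API in port/atomics.h:

```c
#include "postgres.h"

#include "port/atomics.h"
#include "storage/block.h"

/*
 * Illustrative only: the shared portion of a parallel heap scan.
 * The workers coordinate nothing except which block each of them
 * scans next; no tuples ever cross a process boundary.
 */
typedef struct ParallelScanSharedSketch
{
	BlockNumber		nblocks;	/* total blocks in the relation */
	pg_atomic_uint32	next_block;	/* next block up for grabs */
} ParallelScanSharedSketch;

/*
 * Called from each worker's process-private scan state to claim the
 * next block.  Returns InvalidBlockNumber once the relation is
 * exhausted.
 */
static BlockNumber
parallel_scan_claim_block(ParallelScanSharedSketch *shared)
{
	uint32		blkno;

	blkno = pg_atomic_fetch_add_u32(&shared->next_block, 1);
	if (blkno >= shared->nblocks)
		return InvalidBlockNumber;
	return (BlockNumber) blkno;
}
```

Handing out multi-block chunks instead of single blocks would just change what the counter advances by; the executor above it still only ever says "give me a tuple".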
But what I don't want is for EXPLAIN to show N copies of all of that when we've got a parallel scan-and-aggregate happening in N workers - not only because it's display clutter, but also because a plan to do that thing with 3 workers is fundamentally the same as a plan to do it with 30 workers. Those plans shouldn't look different, except perhaps for a line somewhere that says "Number of Workers: N".

> Now, there'll be a node that needs to do some parallel magic - but in
> the above example that should be the AggCombinerNode, which would not
> only ask for tuples from one of the children at a time, but ask multiple
> ones in parallel. But even then it doesn't have to deal with concurrency
> around its own state.

Sure, we clearly want to minimize the amount of coordination between nodes.
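For the record, the EXPLAIN output I'm imagining looks roughly like this - purely hypothetical, with a placeholder table name, and not output from any actual patch:

```
 Aggregate
   ->  Parallel Seq Scan on lineitem
         Number of Workers: 3
```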
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company