Re: Parallel Seq Scan - Mailing list pgsql-hackers
From | Robert Haas
---|---
Subject | Re: Parallel Seq Scan
Date |
Msg-id | CA+Tgmoadfbe3ca4dDOpzpvX59MgGZr0+6OddKTwxhQXXUJB3aw@mail.gmail.com
In response to | Re: Parallel Seq Scan (Andres Freund <andres@2ndquadrant.com>)
Responses | Re: Parallel Seq Scan
List | pgsql-hackers
On Tue, Feb 10, 2015 at 9:08 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> If you make the chunks small enough, and then coordinate only the chunk
> distribution, not really.

True, but why do you want to do that in the executor instead of in heapam.c?

>> For this case, what I would imagine is that there is one parallel heap
>> scan, and each PartialSeqScan attaches to it. The executor says "give
>> me a tuple" and heapam.c provides one. Details like the chunk size
>> are managed down inside heapam.c, and the executor does not know about
>> them. It just knows that it can establish a parallel scan and then
>> pull tuples from it.
>
> I think that's a horrible approach that'll end up with far more
> entangled pieces than what you're trying to avoid. Unless the tuple flow
> is organized to only happen in the necessary cases, the performance will
> be horrible.

I can't understand this at all. A parallel heap scan, as I've coded it up, involves no tuple flow at all. All that's happening at the heapam.c layer is that we're coordinating which blocks to scan. Not to be disrespectful, but have you actually looked at the patch?

> And good chunk sizes et al depend on higher layers,
> selectivity estimates and such. And that's planner/executor work, not
> the physical layer (which heapam.c pretty much is).

If it's true that a good chunk size depends on the higher layers, then that would be a good argument for doing this differently, or at least for exposing an API by which the higher layers can tell heapam.c what chunk size they want. I hadn't considered that possibility - can you elaborate on why you think we might want to vary the chunk size?

> An individual heap scan's state lives in process-private memory. And if
> the results inside the separate workers should be used directly in
> these workers, without shipping them over the network, it'd be horrible
> to have the logic in the heap scan. How would you otherwise model an
> executor tree that does the seqscan and aggregation combined in
> multiple processes at the same time?

Again, the heap scan is not shipping anything anywhere, ever, in any design of any patch proposed or written. The results *are* used directly inside each individual worker.

>> I think we're in violent agreement here, except for some
>> terminological confusion. Are there N PartialSeqScan nodes, one
>> running in each worker, or is there one ParallelSeqScan node, which is
>> copied and run jointly across N workers? You can talk about it either
>> way and have it make sense, but we haven't had enough conversations
>> about this on this list to have settled on a consistent set of
>> vocabulary yet.
>
> I pretty strongly believe that it has to be independent scan nodes. Both
> from an implementation and a conversational POV. They might have some
> very light cooperation between them (e.g. coordinating block ranges or
> such), but everything else should be separate. From an implementation
> POV it seems pretty awful to have an executor node that's accessed by
> multiple separate backends - that'd mean it has to be concurrency-safe,
> have state in shared memory, and everything.

I don't agree with that, but again I think it's a terminological dispute. I think what will happen is that you will have a single node that gets copied into multiple backends, and in some cases a small portion of its state will live in shared memory. That's more or less what you're thinking of too, I think.
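To make the shape of that concrete, here's a rough sketch of the sort of shared state and block hand-off I'm describing. Every name in it is invented for illustration - it is not code from the actual patch - and it assumes the atomics API in port/atomics.h:

```c
#include "postgres.h"

#include "port/atomics.h"
#include "storage/block.h"

/*
 * Illustrative only: the shared portion of a parallel heap scan.
 * The workers coordinate nothing except which block each of them
 * scans next; no tuples ever cross a process boundary.
 */
typedef struct ParallelScanSharedSketch
{
	BlockNumber		nblocks;	/* total blocks in the relation */
	pg_atomic_uint32	next_block;	/* next block up for grabs */
} ParallelScanSharedSketch;

/*
 * Called from each worker's process-private scan state to claim the
 * next block.  Returns InvalidBlockNumber once the relation is
 * exhausted.
 */
static BlockNumber
parallel_scan_claim_block(ParallelScanSharedSketch *shared)
{
	uint32		blkno;

	blkno = pg_atomic_fetch_add_u32(&shared->next_block, 1);
	if (blkno >= shared->nblocks)
		return InvalidBlockNumber;
	return (BlockNumber) blkno;
}
```

Handing out multi-block chunks instead of single blocks would just change what the counter advances by; the executor above it still only ever says "give me a tuple".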
But what I don't want is for EXPLAIN to show N copies of all of that when we've got a parallel scan-and-aggregate happening in N workers - not only because it's display clutter, but also because a plan to do that thing with 3 workers is fundamentally the same as a plan to do it with 30 workers. Those plans shouldn't look different, except perhaps for a line somewhere that says "Number of Workers: N".

> Now, there'll be a node that needs to do some parallel magic - but in
> the above example that should be the AggCombinerNode, which would not
> only ask for tuples from one of the children at a time, but ask multiple
> ones in parallel. But even then it doesn't have to deal with concurrency
> around its own state.

Sure, we clearly want to minimize the amount of coordination between nodes.
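For the record, the EXPLAIN output I'm imagining looks roughly like this - purely hypothetical, with a placeholder table name, and not output from any actual patch:

```
 Aggregate
   ->  Parallel Seq Scan on lineitem
         Number of Workers: 3
```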
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company