Re: [DESIGN] ParallelAppend - Mailing list pgsql-hackers
From | Amit Langote |
---|---|
Subject | Re: [DESIGN] ParallelAppend |
Msg-id | 55B8301B.80407@lab.ntt.co.jp |
In response to | Re: [DESIGN] ParallelAppend (Kouhei Kaigai <kaigai@ak.jp.nec.com>) |
List | pgsql-hackers |
KaiGai-san,

On 2015-07-28 PM 09:58, Kouhei Kaigai wrote:
>> From my understanding of the parallel seqscan patch, each worker's
>> PartialSeqScan asks for a block to scan using a shared parallel heap
>> scan descriptor that effectively keeps track of the division of work
>> among PartialSeqScans in terms of blocks. What if we invent a
>> PartialAppend which each worker would run in case of a parallelized
>> Append? It would use some kind of shared descriptor to pick a relation
>> (Append member) to scan. The shared structure could be the list of
>> subplans including the mutex for concurrency. It doesn't sound as
>> effective as the proposed ParallelHeapScanDescData does for
>> PartialSeqScan, but anything more granular might be complicated. For
>> example, consider a (current_relation, current_block) pair. If there
>> are more workers than subplans/partitions, then multiple workers might
>> start working on the same relation after a round-robin assignment of
>> relations (but of course, a later worker would start scanning from a
>> later block in the same relation). I imagine that might help with
>> parallelism across volumes if that's the case.
>>
> I initially thought ParallelAppend kicks a fixed number of background
> workers towards sub-plans, according to the estimated cost at the
> planning stage. However, I'm now inclined that a background worker
> picks up an uncompleted PlannedStmt first. (For more details, please
> see the reply to Amit Kapila.) It looks like a less fine-grained
> distribution of the workers' jobs. Once the number of workers gets
> larger than the number of volumes / partitions, more than one worker
> begins to be assigned the same PartialSeqScan, and thus it takes
> fine-grained job distribution using the shared parallel heap scan.
>

I like your idea of round-robin assignment of partial/non-partial
sub-plans to workers (a toy sketch of the shared-descriptor approach I
mentioned is at the end of this mail). Do you think there are two
considerations of cost here? The sub-plans themselves could have
parallel paths to consider, and (I think) your proposal introduces a new
consideration: a plain old synchronous Append path vs. a parallel
asynchronous Append with a Funnel (below/above?) it. I guess the
asynchronous version would always be cheaper. So, even if we end up with
non-parallel sub-plans, do we still add a Funnel to make the Append
asynchronous? Am I missing something?

>> MergeAppend parallelization might involve a bit more complication but
>> may be feasible with a PartialMergeAppend with a slightly different
>> kind of coordination among workers. What do you think of such an
>> approach?
>>
> Do we need to have something special in ParallelMergeAppend?
> If individual child nodes are designed to return sorted results,
> what we have to do seems to me the same.
>

Sorry, I was needlessly worried; I did not realize that MergeAppend uses
a binaryheap to store tuples before returning them.

Thanks,
Amit
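P.S. To make the shared-structure idea above a little more concrete,
here is a toy, standalone C sketch. It is not taken from any posted
patch; all names in it (SharedAppendState, assign_subplan, claim_block)
are made up for illustration, and a pthread mutex stands in for whatever
shared-memory locking the real implementation would use. Workers are
handed an Append member round-robin and then claim that member's blocks
one at a time, so two workers landing on the same member still split its
blocks between them.

/*
 * Toy illustration (not PostgreSQL code) of a shared descriptor for a
 * hypothetical PartialAppend: a round-robin cursor over the Append
 * members plus a per-member next-block counter.
 *
 * Build with: cc -pthread partial_append_toy.c
 */
#include <pthread.h>
#include <stdio.h>

#define NUM_SUBPLANS 3
#define NUM_WORKERS  5

typedef struct SharedAppendState
{
    pthread_mutex_t lock;               /* protects the fields below */
    int     next_subplan;               /* round-robin cursor over members */
    int     nblocks[NUM_SUBPLANS];      /* size of each member relation */
    int     next_block[NUM_SUBPLANS];   /* next unclaimed block per member */
} SharedAppendState;

static SharedAppendState shared = {
    .next_subplan = 0,
    .nblocks = {8, 4, 6},
    .next_block = {0, 0, 0}
};

/* Each worker is first handed an Append member, round-robin ... */
static int
assign_subplan(void)
{
    pthread_mutex_lock(&shared.lock);
    int sp = shared.next_subplan;
    shared.next_subplan = (sp + 1) % NUM_SUBPLANS;
    pthread_mutex_unlock(&shared.lock);
    return sp;
}

/* ... and then claims that member's blocks one at a time (-1 = done). */
static int
claim_block(int sp)
{
    pthread_mutex_lock(&shared.lock);
    int blk = (shared.next_block[sp] < shared.nblocks[sp])
        ? shared.next_block[sp]++ : -1;
    pthread_mutex_unlock(&shared.lock);
    return blk;
}

static void *
worker(void *arg)
{
    long id = (long) arg;
    int  sp = assign_subplan();
    int  blk;

    while ((blk = claim_block(sp)) >= 0)
        printf("worker %ld: subplan %d, block %d\n", id, sp, blk);
    return NULL;
}

int
main(void)
{
    pthread_t tid[NUM_WORKERS];

    pthread_mutex_init(&shared.lock, NULL);
    for (long i = 0; i < NUM_WORKERS; i++)
        pthread_create(&tid[i], NULL, worker, (void *) i);
    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_join(tid[i], NULL);
    return 0;
}

A real PartialAppend would presumably have a worker move on to the next
uncompleted member once its current one runs out of blocks, rather than
stop as this toy does; that part is left out here for brevity.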