Re: [DESIGN] ParallelAppend - Mailing list pgsql-hackers
From | Amit Langote |
---|---|
Subject | Re: [DESIGN] ParallelAppend |
Msg-id | 55B8301B.80407@lab.ntt.co.jp |
In response to | Re: [DESIGN] ParallelAppend (Kouhei Kaigai <kaigai@ak.jp.nec.com>) |
List | pgsql-hackers |
KaiGai-san,

On 2015-07-28 PM 09:58, Kouhei Kaigai wrote:
>> From my understanding of the parallel seqscan patch, each worker's
>> PartialSeqScan asks for a block to scan using a shared parallel heap
>> scan descriptor that effectively keeps track of the division of work
>> among PartialSeqScans in terms of blocks. What if we invent a
>> PartialAppend which each worker would run in case of a parallelized
>> Append? It would use some kind of shared descriptor to pick a relation
>> (Append member) to scan. The shared structure could be the list of
>> subplans including the mutex for concurrency. It doesn't sound as
>> effective as the proposed ParallelHeapScanDescData does for
>> PartialSeqScan, but anything more granular might be complicated. For
>> example, consider a (current_relation, current_block) pair. If there
>> are more workers than subplans/partitions, then multiple workers might
>> start working on the same relation after a round-robin assignment of
>> relations (but of course, a later worker would start scanning from a
>> later block in the same relation). I imagine that might help with
>> parallelism across volumes if that's the case.
>>
> I initially thought ParallelAppend kicks a fixed number of background
> workers towards sub-plans, according to the estimated cost at the
> planning stage. However, I'm now inclined that a background worker
> picks up an uncompleted PlannedStmt first. (For more details, please
> see the reply to Amit Kapila.) It looks like a less fine-grained
> distribution of the workers' jobs. Once the number of workers gets
> larger than the number of volumes / partitions, more than one worker
> begins to be assigned the same PartialSeqScan, and thus it takes
> fine-grained job distribution using the shared parallel heap scan.
>

I like your idea of round-robin assignment of partial/non-partial
sub-plans to workers (a toy sketch of the shared-descriptor approach I
mentioned is at the end of this mail). Do you think there are two
considerations of cost here? The sub-plans themselves could have
parallel paths to consider, and (I think) your proposal introduces a new
consideration: a plain old synchronous Append path vs. a parallel
asynchronous Append with a Funnel (below/above?) it. I guess the
asynchronous version would always be cheaper. So, even if we end up with
non-parallel sub-plans, do we still add a Funnel to make the Append
asynchronous? Am I missing something?

>> MergeAppend parallelization might involve a bit more complication but
>> may be feasible with a PartialMergeAppend with a slightly different
>> kind of coordination among workers. What do you think of such an
>> approach?
>>
> Do we need to have something special in ParallelMergeAppend?
> If individual child nodes are designed to return sorted results,
> what we have to do seems to me the same.
>

Sorry, I was needlessly worried; I did not realize that MergeAppend uses
a binaryheap to store tuples before returning them.

Thanks,
Amit
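P.S. To make the shared-structure idea above a little more concrete,
here is a toy, standalone C sketch. It is not taken from any posted
patch; all names in it (SharedAppendState, assign_subplan, claim_block)
are made up for illustration, and a pthread mutex stands in for whatever
shared-memory locking the real implementation would use. Workers are
handed an Append member round-robin and then claim that member's blocks
one at a time, so two workers landing on the same member still split its
blocks between them.

/*
 * Toy illustration (not PostgreSQL code) of a shared descriptor for a
 * hypothetical PartialAppend: a round-robin cursor over the Append
 * members plus a per-member next-block counter.
 *
 * Build with: cc -pthread partial_append_toy.c
 */
#include <pthread.h>
#include <stdio.h>

#define NUM_SUBPLANS 3
#define NUM_WORKERS  5

typedef struct SharedAppendState
{
    pthread_mutex_t lock;               /* protects the fields below */
    int     next_subplan;               /* round-robin cursor over members */
    int     nblocks[NUM_SUBPLANS];      /* size of each member relation */
    int     next_block[NUM_SUBPLANS];   /* next unclaimed block per member */
} SharedAppendState;

static SharedAppendState shared = {
    .next_subplan = 0,
    .nblocks = {8, 4, 6},
    .next_block = {0, 0, 0}
};

/* Each worker is first handed an Append member, round-robin ... */
static int
assign_subplan(void)
{
    pthread_mutex_lock(&shared.lock);
    int sp = shared.next_subplan;
    shared.next_subplan = (sp + 1) % NUM_SUBPLANS;
    pthread_mutex_unlock(&shared.lock);
    return sp;
}

/* ... and then claims that member's blocks one at a time (-1 = done). */
static int
claim_block(int sp)
{
    pthread_mutex_lock(&shared.lock);
    int blk = (shared.next_block[sp] < shared.nblocks[sp])
        ? shared.next_block[sp]++ : -1;
    pthread_mutex_unlock(&shared.lock);
    return blk;
}

static void *
worker(void *arg)
{
    long id = (long) arg;
    int  sp = assign_subplan();
    int  blk;

    while ((blk = claim_block(sp)) >= 0)
        printf("worker %ld: subplan %d, block %d\n", id, sp, blk);
    return NULL;
}

int
main(void)
{
    pthread_t tid[NUM_WORKERS];

    pthread_mutex_init(&shared.lock, NULL);
    for (long i = 0; i < NUM_WORKERS; i++)
        pthread_create(&tid[i], NULL, worker, (void *) i);
    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_join(tid[i], NULL);
    return 0;
}

A real PartialAppend would presumably have a worker move on to the next
uncompleted member once its current one runs out of blocks, rather than
stop as this toy does; that part is left out here for brevity.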