Re: Parallel Seq Scan - Mailing list pgsql-hackers
From | Thom Brown |
---|---|
Subject | Re: Parallel Seq Scan |
Date | |
Msg-id | CAA-aLv7KVJ=5zNEu+zX-mDqFvWf-=XARKyDW6gDu_A1-UwsiEA@mail.gmail.com |
In response to | Re: Parallel Seq Scan (Amit Kapila <amit.kapila16@gmail.com>) |
Responses | Re: Parallel Seq Scan |
List | pgsql-hackers |
On 25 March 2015 at 15:49, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Mar 25, 2015 at 5:16 PM, Thom Brown <thom@linux.com> wrote:
>
> On 25 March 2015 at 10:27, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>> Fixed the reported issue on assess-parallel-safety thread and another
>> bug caught while testing joins and integrated with latest version of
>> parallel-mode patch (parallel-mode-v9 patch).
>>
>> Apart from that I have moved the initialization of dsm segment from
>> InitNode phase to ExecFunnel() (on first execution) as per suggestion
>> from Robert. The main idea is that as it creates large shared memory
>> segment, so do the work when it is really required.
>>
>>
>> HEAD Commit-Id: 11226e38
>> parallel-mode-v9.patch [2]
>> assess-parallel-safety-v4.patch [1]
>> parallel-heap-scan.patch [3]
>> parallel_seqscan_v12.patch (Attached with this mail)
>>
>> [1] - http://www.postgresql.org/message-id/CA+TgmobJSuefiPOk6+i9WERUgeAB3ggJv7JxLX+r6S5SYydBRQ@mail.gmail.com
>> [2] - http://www.postgresql.org/message-id/CA+TgmoZfSXZhS6qy4Z0786D7iU_AbhBVPQFwLthpSvGieczqHg@mail.gmail.com
>> [3] - http://www.postgresql.org/message-id/CA+TgmoYJETgeAXUsZROnA7BdtWzPtqExPJNTV1GKcaVMgSdhug@mail.gmail.com
>
>
> Okay, with my pgbench_accounts partitioned into 300, I ran:
>
> SELECT DISTINCT bid FROM pgbench_accounts;
>
> The query never returns,

You seem to be hitting the issue I have pointed out in a nearby thread [1], and I have mentioned the same while replying on the assess-parallel-safety thread. Can you check after applying the patch in mail [1]?

Ah, okay, here are the patches I've now applied:
parallel-mode-v9.patch
assess-parallel-safety-v4.patch
parallel-heap-scan.patch
parallel_seqscan_v12.patch
release_lock_dsm_v1.patch
(with perl patch for pg_proc.h)
The query now returns successfully.
> and I also get this:
>
> grep -r 'starting background worker process "parallel worker for PID 12165"' postgresql-2015-03-25_112522.log | wc -l
> 2496
>
> 2,496 workers? This is with parallel_seqscan_degree set to 8. If I set it to 2, this number goes down to 626, and with 16, goes up to 4320.
>
> ..
>
> Still not sure why 8 workers are needed for each partial scan. I would expect 8 workers to be used for 8 separate scans. Perhaps this is just my misunderstanding of how this feature works.
>

The reason is that for each table scan, it tries to use workers equal to parallel_seqscan_degree if they are available, and in this case, as the scans for the inheritance hierarchy (tables in the hierarchy) happen one after another, it uses 8 workers for each scan. I think as of now the strategy to decide the number of workers to be used in a scan is kept simple, and in future we can try to come up with some better mechanism to decide the number of workers.
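As a rough sanity check of that explanation, the worker-start counts quoted above can be divided by the degree to see how many scans would have launched a full set of workers. This is only arithmetic on the numbers reported in this thread; the function name is made up for illustration, and the assumption that every scan receives exactly parallel_seqscan_degree workers is a simplification, not something the patch guarantees.

```python
# Worker-start counts reported in this thread, keyed by parallel_seqscan_degree.
reported_starts = {2: 626, 8: 2496, 16: 4320}


def implied_scans(starts_by_degree):
    """Estimate how many scans launched a full set of workers.

    Assumes (hypothetically) that each scan starts exactly `degree` workers.
    """
    return {degree: starts / degree for degree, starts in starts_by_degree.items()}


if __name__ == "__main__":
    for degree, scans in implied_scans(reported_starts).items():
        print(f"degree={degree}: about {scans:g} full sets of workers")
```

For degrees 2 and 8 the estimate comes out close to the 300 partitions, which fits the description of one set of workers per table scan. The lower figure at degree 16 might reflect the "if they are available" caveat, though nothing in the thread confirms that.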
Yes, I was expecting the parallel aspect to apply across partitions (a worker per partition, up to parallel_seqscan_degree, with each worker reallocated to another scan once it has finished its current job) rather than within individual ones, so for the workers to be above the funnel, not below it. So this is parallelising, just not in a way that will be a win in this case. :( For the query I posted (SELECT DISTINCT bid FROM pgbench_accounts), the parallelised version takes 8 times longer to complete. However, I'm perhaps premature in what I expect from the feature at this stage.
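To make the contrast concrete, here is a toy model of the two strategies being discussed. It is not PostgreSQL code and does not reflect how the patch is implemented: scan_partition, distinct_across_partitions and workers_started are hypothetical names, and a thread pool merely stands in for background workers that pick up the next partition when they finish the current one.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_partition(partition_id):
    # Stand-in for a sequential scan of one child table (hypothetical):
    # pretend each partition contributes a single bid value.
    return partition_id % 10


def distinct_across_partitions(num_partitions, degree):
    """One worker per partition, at most `degree` at a time, workers reused."""
    with ThreadPoolExecutor(max_workers=degree) as pool:
        results = pool.map(scan_partition, range(num_partitions))
        return sorted(set(results))


def workers_started(num_partitions, degree, per_scan=True):
    """Count worker launches under each strategy in this simplified model."""
    if per_scan:
        # Strategy described in the reply: a fresh set of workers for every scan.
        return num_partitions * degree
    # Strategy expected here: a fixed pool shared across all partitions.
    return min(num_partitions, degree)


if __name__ == "__main__":
    print(distinct_across_partitions(300, 8))
    print(workers_started(300, 8, per_scan=True))
    print(workers_started(300, 8, per_scan=False))
```

In this model the per-scan strategy launches 300 × 8 worker processes for 300 partitions, while the shared pool launches only 8, which is the kind of difference that could matter if worker startup cost dominates on small partitions.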
--
Thom