Re: [HACKERS] parallelize queries containing initplans - Mailing list pgsql-hackers
From | Kuntal Ghosh |
---|---|
Subject | Re: [HACKERS] parallelize queries containing initplans |
Date | |
Msg-id | CAGz5QC+uHOq78GCika3fbgRyN5zgiDR9Dd1Th5kENF+UpnPomQ@mail.gmail.com |
In response to | Re: [HACKERS] parallelize queries containing initplans (Amit Kapila <amit.kapila16@gmail.com>) |
Responses | Re: parallelize queries containing initplans |
List | pgsql-hackers |
On Tue, Mar 14, 2017 at 3:20 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> Based on that idea, I have modified the patch such that it will
> compute the set of initplans Params that are required below gather
> node and store them as bitmap of initplan params at gather node.
> During set_plan_references, we can find the intersection of external
> parameters that are required at Gather or nodes below it with the
> initplans that are passed from same or above query level. Once the set
> of initplan params are established, we evaluate those (if they are not
> already evaluated) before execution of gather node and then pass the
> computed value to each of the workers. To identify whether a
> particular param is parallel safe or not, we check if the paramid of
> the param exists in initplans at same or above query level. We don't
> allow to generate gather path if there are initplans at some query
> level below the current query level as those plans could be
> parallel-unsafe or undirect correlated plans.

I would like to mention the different test scenarios with InitPlans that we considered while developing and testing the patch.

An InitPlan is a subselect that doesn't reference its immediate outer query level and that returns a param value. For example, consider the following query plan:

          QUERY PLAN
------------------------------
 Seq Scan on t1
   Filter: (k = $0)
   allParams: $0
   InitPlan 1 (returns $0)
     ->  Aggregate
           ->  Seq Scan on t3

In this case, the InitPlan is evaluated once, when the filter is checked for the first time. For subsequent checks, we need not evaluate the InitPlan again since we already have the value. In our approach, we parallelize the sequential scan by inserting a Gather node on top of a parallel sequential scan node. At the Gather node, we evaluate the InitPlan before spawning the workers and pass its value to the workers using dynamic shared memory.
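As a rough illustration of the evaluate-once-then-share step described above, here is a toy Python model. It is not the patch's C code: `InitPlan`, `run_gather`, the sample tables, and the `max()` aggregate are all hypothetical names and data chosen for the sketch, and a plain dict stands in for the dynamic shared memory segment.

```python
from typing import Callable, Dict, List, Set


class InitPlan:
    """Hypothetical stand-in for an InitPlan that fills one param slot."""

    def __init__(self, param_id: int, compute: Callable[[], int]) -> None:
        self.param_id = param_id
        self.compute = compute
        self.evaluations = 0  # how many times the subselect actually ran

    def evaluate(self) -> int:
        self.evaluations += 1
        return self.compute()


def run_gather(
    initplans: List[InitPlan],
    params_needed: Set[int],
    evaluated: Dict[int, int],
    t1_rows: List[int],
    filter_param: int,
    n_workers: int = 2,
) -> List[int]:
    # Leader: evaluate each needed param that does not have a value yet.
    for plan in initplans:
        if plan.param_id in params_needed and plan.param_id not in evaluated:
            evaluated[plan.param_id] = plan.evaluate()

    # Snapshot handed to every worker (assumption: a dict models the DSM segment).
    shared = {pid: evaluated[pid] for pid in params_needed}

    matches: List[int] = []
    for worker in range(n_workers):
        # Each simulated worker scans its own slice of t1 and applies
        # Filter: (k = $filter_param) using the shared value only.
        for k in t1_rows[worker::n_workers]:
            if k == shared[filter_param]:
                matches.append(k)
    return sorted(matches)


# Toy data: $0 is computed by an aggregate over t3.
t3 = [3, 7, 5]
t1 = [1, 7, 7, 2, 5, 7]
max_t3 = InitPlan(0, lambda: max(t3))
evaluated: Dict[int, int] = {}
result = run_gather([max_t3], {0}, evaluated, t1, filter_param=0)
```

In this model the workers never run the subselect themselves; they only read the value the leader computed, and a second call to `run_gather` with the same `evaluated` dict does not re-run it.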
With the Gather node in place, we get the following plan:

                    QUERY PLAN
---------------------------------------------------
 Gather
   Workers Planned: 2
   Params Evaluated: $0
   InitPlan 1 (returns $0)
     ->  Aggregate
           ->  Seq Scan on t3
   ->  Parallel Seq Scan on t1
         Filter: (k = $0)

As Amit mentioned upthread, at a Gather node we evaluate only those InitPlans that are attached to this query level or a higher one and that are used under the Gather node. extParam at a node includes the InitPlan params that should be passed from an outer node. I've attached a patch to show extParam and allParam for each node. Here is the output with that patch:

                    QUERY PLAN
---------------------------------------------------
 Gather
   Workers Planned: 2
   Params Evaluated: $0
   allParams: $0
   InitPlan 1 (returns $0)
     ->  Finalize Aggregate
           ->  Gather
                 Workers Planned: 2
                 ->  Partial Aggregate
                       ->  Parallel Seq Scan on t3
   ->  Parallel Seq Scan on t1
         Filter: (k = $0)
         allParams: $0
         extParams: $0

In this case, $0 is included in the extParam of the parallel sequential scan, and the InitPlan corresponding to this param is attached to the same query level that contains the Gather node. Hence, we evaluate $0 at the Gather and pass it to the workers.

But generating a plan like this requires marking an InitPlan param as parallel_safe. We can't mark all params as parallel_safe because of correlated subselects. Hence, in max_parallel_hazard_walker, the only params marked safe are InitPlan params from the current or an outer query level. An InitPlan param from an inner query level isn't marked safe, since we can't evaluate this param at any Gather node above the current node (where the param is used). As mentioned by Amit, we also don't allow generation of a gather path if there are InitPlans at some query level below the current query level, as those plans could be parallel-unsafe or undirect correlated plans.

I've attached a script file and its output containing several scenarios relevant to InitPlans.
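The selection and safety rules above can be sketched as set operations. This is a minimal Python model under stated assumptions, not the code in the patch: the function names are hypothetical, `initplan_levels` is an invented mapping from param id to the query level its InitPlan is attached to, and levels are numbered so that 1 is the outermost query and larger numbers are more deeply nested.

```python
from typing import Dict, Set


def params_to_evaluate_at_gather(
    ext_params_below: Set[int],
    initplan_levels: Dict[int, int],
    gather_level: int,
) -> Set[int]:
    """Params the Gather computes: needed at or below it AND produced by an
    InitPlan attached to the same or an outer query level."""
    available = {
        pid for pid, level in initplan_levels.items() if level <= gather_level
    }
    return ext_params_below & available


def param_is_parallel_safe(
    param_id: int, initplan_levels: Dict[int, int], current_level: int
) -> bool:
    """A param is treated as safe only if it is an InitPlan param from the
    current or an outer level; anything else (e.g. a correlation param) is not."""
    level = initplan_levels.get(param_id)
    return level is not None and level <= current_level


def gather_allowed(initplan_levels: Dict[int, int], current_level: int) -> bool:
    """No gather path if any InitPlan sits at a level below the current one."""
    return all(level <= current_level for level in initplan_levels.values())


# Hypothetical example: $0 comes from an InitPlan at the top level (1),
# $1 from an InitPlan one level further down (2); $2 is not an InitPlan param.
levels = {0: 1, 1: 2}
to_eval = params_to_evaluate_at_gather({0, 1, 2}, levels, gather_level=1)
safe_flags = [param_is_parallel_safe(p, levels, 1) for p in (0, 1, 2)]
```

Here only `$0` ends up in the set evaluated at the Gather, and `gather_allowed(levels, 1)` is false because `$1`'s InitPlan lives below level 1, which mirrors the restriction the patch accepts for simplicity.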
I've also attached the patch for displaying extParam and allParam at each node. This patch can be applied on top of pq_pushdown_initplan_v3.patch. Please find the attachments.

> This restricts some of the cases for parallelism like when initplans
> are below gather node, but the patch looks better. We can open up
> those cases if required in a separate patch.

+1. Unfortunately, this patch doesn't enable parallelism for all possible cases with InitPlans. Our objective is to keep things simple and clean. Still, TPC-H q22 runs 2.5 to 3 times faster with this patch.

--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com