parallel_safe - Mailing list pgsql-hackers
From | Andy Fan |
---|---|
Subject | parallel_safe |
Date | |
Msg-id | 87ecwisbk7.fsf@163.com |
List | pgsql-hackers |
Hi,

In the comments of add_partial_path, we have:

    * We don't generate parameterized partial paths for several reasons.  Most
    * importantly, they're not safe to execute, because there's nothing to
    * make sure that a parallel scan within the parameterized portion of the
    * plan is running with the same value in every worker at the same time.

and we use 'is_parallel_safe(PlannerInfo *root, Node *node)' to decide whether it is safe/necessary to generate partial paths for a RelOptInfo. In the code behind 'is_parallel_safe' (max_parallel_hazard_walker):

    /*
     * We can't pass Params to workers at the moment either, so they are also
     * parallel-restricted, unless they are PARAM_EXTERN Params or are
     * PARAM_EXEC Params listed in safe_param_ids...
     */
    else if (IsA(node, Param))
    {
        Param      *param = (Param *) node;

        if (param->paramkind == PARAM_EXTERN)
            return false;

        if (param->paramkind != PARAM_EXEC ||
            !list_member_int(context->safe_param_ids, param->paramid))
        {
            if (max_parallel_hazard_test(PROPARALLEL_RESTRICTED, context))
                return true;
        }
        return false;           /* nothing to recurse to */
    }

Then see the example below:

    create table bigt (a int, b int, c int);
    insert into bigt select i, i, i from generate_series(1, 1000000)i;
    analyze bigt;

    select * from bigt o where b = 1;

                QUERY PLAN
    -----------------------------------
     Gather
       Workers Planned: 2
       ->  Parallel Seq Scan on bigt o
             Filter: (b = 1)
    (4 rows)

    select * from bigt o where b = 1 and c = (select sum(c) from bigt i where c = o.c);

                    QUERY PLAN
    -------------------------------------------
     Seq Scan on bigt o
       Filter: ((b = 1) AND (c = (SubPlan 1)))
       SubPlan 1
         ->  Aggregate
               ->  Seq Scan on bigt i
                     Filter: (c = o.c)
    (6 rows)

I think the plan below would be correct and more efficient.

Plan 1:

                       QUERY PLAN
    -------------------------------------------------
     Gather
       Workers Planned: 2
       ->  Parallel Seq Scan on bigt o
             Filter: ((b = 1) AND (c = (SubPlan 1)))
             SubPlan 1
               ->  Aggregate
                     ->  Seq Scan on bigt
                           Filter: (c = o.c)
    (8 rows)

However, the above plan is impossible because:

(1). During the planning of the SubPlan, is_parallel_safe() causes consider_parallel to be set to false for "bigt i", because of the "PARAM_EXEC" rule above.

(2). The parallel_safe flag of the final SubPlan is set to false because of rel->consider_parallel.

(3). During the planning of "bigt o", is_parallel_safe is called again and finds subplan->parallel_safe == false, so no partial path can be generated at all.

is_parallel_safe/max_parallel_hazard_walker:

    else if (IsA(node, SubPlan))
    {
        SubPlan    *subplan = (SubPlan *) node;
        List       *save_safe_param_ids;

        if (!subplan->parallel_safe &&
            max_parallel_hazard_test(PROPARALLEL_RESTRICTED, context))
            return true;
        ...
    }

So if we think "Plan 1" is valid, then what is wrong? I think it is better to think about what parallel_safe is designed for. In Path:

    /* OK to use as part of parallel plan? */
    bool        parallel_safe;

The definition seems to say: the Path/Plan should not be run as a 'parallel_aware' plan. But the code seems to say: the Path/Plan should not be run in a parallel worker even if it is *not* parallel_aware. The reason I read the definition that way is this comment:

    * We don't generate parameterized partial paths for several reasons.  Most
    * importantly, they're not safe to execute, because there's nothing to
    * make sure that a parallel scan within the parameterized portion of the
    * plan is running with the same value in every worker at the same time.

If a plan is not parallel-aware, why should we care about the above?

In the current code, there are some other places that set parallel_safe = false where the meaning looks like *should not be run in a parallel worker* rather than *should not be a parallel plan*. The cases I know of are:

1. path->parallel_safe = false in Gather/GatherMerge.
2. Some expressions which are clearly marked as parallel unsafe.

So parallel_safe looks to have two different meanings to me. Do you feel something similar? Do you think treating the two meanings of parallel_safe separately would let parallelism work in more places?
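To make the two readings concrete, here is a minimal sketch in C. It is a toy model and not PostgreSQL code: ToyPlan and all of the toy_* functions are hypothetical names made up for illustration, and the rule that a correlated, non-parallel-aware subplan may run inside a worker is exactly the assumption being asked about, not established planner behavior.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names exist in PostgreSQL. */
typedef struct ToyPlan
{
    bool uses_outer_param;  /* references a PARAM_EXEC set by its parent */
    bool has_unsafe_expr;   /* contains an expression marked parallel unsafe */
    bool contains_gather;   /* already has a Gather/GatherMerge inside */
} ToyPlan;

/* Reading A: may this plan itself be split across workers (parallel-aware)? */
bool
toy_can_be_partial(const ToyPlan *p)
{
    return !p->uses_outer_param && !p->has_unsafe_expr && !p->contains_gather;
}

/*
 * Reading B: may this plan run, whole and not parallel-aware, inside a single
 * worker?  ASSUMPTION: an outer Param is fine here, because the parent node
 * that sets it runs in the same worker.
 */
bool
toy_can_run_in_worker(const ToyPlan *p)
{
    return !p->has_unsafe_expr && !p->contains_gather;
}

/* The single flag as described in this mail: one answer for both readings. */
bool
toy_single_flag(const ToyPlan *p)
{
    return toy_can_be_partial(p);
}

/* The correlated SubPlan from the example, checked under both models. */
void
toy_demo(void)
{
    ToyPlan subplan = { .uses_outer_param = true };

    assert(!toy_single_flag(&subplan));       /* single flag: Plan 1 rejected */
    assert(toy_can_run_in_worker(&subplan));  /* split flags: Plan 1 allowed */
}
```

In this model the Gather and unsafe-expression cases stay forbidden under both readings; only the correlated-Param case changes its answer, which is the SubPlan case from the example.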
Do you think the benefits would go beyond the SubPlan case? (I can't come up with an example other than SubPlan so far.)

-- 
Best Regards
Andy Fan