Volatile Functions in Parallel Plans - Mailing list pgsql-hackers
From:      Zhenghua Lyu
Subject:   Volatile Functions in Parallel Plans
Msg-id:    SN6PR05MB4559835263BADF54FA370163B57E0@SN6PR05MB4559.namprd05.prod.outlook.com
Responses: Re: Volatile Functions in Parallel Plans
List:      pgsql-hackers
Hi,

I tested some SQL on the latest Postgres master branch code (we found these issues while
developing Greenplum database in the PR https://github.com/greenplum-db/gpdb/pull/10418,
and my colleague came up with the following cases in Postgres):
create table t3 (c1 text, c2 text);
CREATE TABLE
insert into t3
select
  'fhufehwiohewiuewhuhwiufhwifhweuhfwu', --random data
  'fiowehufwhfyegygfewpfwwfeuhwhufwh'    --random data
from generate_series(1, 10000000) i;
INSERT 0 10000000
analyze t3;
ANALYZE
create table t4 (like t3);
CREATE TABLE
insert into t4 select * from t4;
INSERT 0 0
insert into t4 select * from t3;
INSERT 0 10000000
analyze t4;
ANALYZE
set enable_hashjoin to off;
SET
explain (costs off)
select count(*) from t3, t4
where t3.c1 like '%sss'
  and timeofday() = t4.c1 and t3.c1 = t4.c1;
                       QUERY PLAN
--------------------------------------------------------
 Finalize Aggregate
   ->  Gather
         Workers Planned: 2
         ->  Partial Aggregate
               ->  Nested Loop
                     Join Filter: (t3.c1 = t4.c1)
                     ->  Parallel Seq Scan on t3
                           Filter: (c1 ~~ '%sss'::text)
                     ->  Seq Scan on t4
                           Filter: (timeofday() = c1)
(10 rows)

explain (verbose, costs off)
select count(*)
from
  t3,
  (select *, timeofday() as x from t4) t4
where t3.c1 like '%sss' and
  timeofday() = t4.c1 and t3.c1 = t4.c1;
                            QUERY PLAN
------------------------------------------------------------------
 Finalize Aggregate
   Output: count(*)
   ->  Gather
         Output: (PARTIAL count(*))
         Workers Planned: 2
         ->  Partial Aggregate
               Output: PARTIAL count(*)
               ->  Nested Loop
                     Join Filter: (t3.c1 = t4.c1)
                     ->  Parallel Seq Scan on public.t3
                           Output: t3.c1, t3.c2
                           Filter: (t3.c1 ~~ '%sss'::text)
                     ->  Seq Scan on public.t4
                           Output: t4.c1, NULL::text, timeofday()
                           Filter: (timeofday() = t4.c1)
(15 rows)
Focus on the last two plans: the function timeofday() is
volatile but parallel-safe, and Postgres outputs two parallel
plans.
The first plan:
The join's left tree is a parallel scan and the right tree is a seq scan.

Finalize Aggregate
  ->  Gather
        Workers Planned: 2
        ->  Partial Aggregate
              ->  Nested Loop
                    Join Filter: (t3.c1 = t4.c1)
                    ->  Parallel Seq Scan on t3
                          Filter: (c1 ~~ '%sss'::text)
                    ->  Seq Scan on t4
                          Filter: (timeofday() = c1)
This plan is correct only under the distributive law of
distributed join:

  A = [A1 A2 A3 ... An], then A join B = gather( (A1 join B) (A2 join B) ... (An join B) )
The correctness of the above law rests on a pre-assumption:
the data set of B is the same in each of the joins (A1 join B), (A2 join B), ..., (An join B).
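To make the law concrete, here is a small Python sketch (the data and helper names are illustrative, not from the thread) that partitions A across two simulated workers and joins each partition against the same shared B. With a stable join predicate, gathering the partial results matches the serial join:

```python
from itertools import chain

def nested_loop_join(left, right, pred):
    """Minimal nested-loop join: every (l, r) pair satisfying pred."""
    return [(l, r) for l in left for r in right if pred(l, r)]

A = [1, 2, 3, 4, 5, 6]
B = [2, 4, 6, 8]
partitions = [A[0::2], A[1::2]]    # A1, A2: A split across two workers
pred = lambda l, r: l == r         # stable (non-volatile) join predicate

serial = nested_loop_join(A, B, pred)
# gather( (A1 join B) (A2 join B) ): every worker sees the SAME B
parallel = list(chain.from_iterable(
    nested_loop_join(part, B, pred) for part in partitions))

assert sorted(serial) == sorted(parallel)   # the law holds
```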
But things get complicated when volatile functions come in. timeofday() is just
an example to show the idea. The core issue is that volatile functions can return
different results on successive calls with the same arguments. Thus the following
piece, the right tree of the join,
-> Seq Scan on t4
Filter: (timeofday() = c1)
cannot be considered consistent across the scan workers.
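The inconsistency can be simulated with a deterministic stand-in for a volatile function (all names and data here are illustrative): each call returns a new value, so two rescans of the same table produce different row sets:

```python
from itertools import count

calls = count(1)
def volatile():
    """Stand-in for a volatile function such as timeofday():
    successive calls return different results."""
    return next(calls)

B = [3, 3, 3]   # the t4.c1 column: three identical rows

def scan_b():
    # Seq Scan on t4, Filter: (volatile() = c1) -- re-evaluated per row
    return [r for r in B if volatile() == r]

worker1 = scan_b()   # calls return 1, 2, 3 -> only the third row survives
worker2 = scan_b()   # calls return 4, 5, 6 -> no row survives
assert worker1 == [3] and worker2 == []   # same scan, different results
```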
The second plan:

Finalize Aggregate
  Output: count(*)
  ->  Gather
        Output: (PARTIAL count(*))
        Workers Planned: 2
        ->  Partial Aggregate
              Output: PARTIAL count(*)
              ->  Nested Loop
                    Join Filter: (t3.c1 = t4.c1)
                    ->  Parallel Seq Scan on public.t3
                          Output: t3.c1, t3.c2
                          Filter: (t3.c1 ~~ '%sss'::text)
                    ->  Seq Scan on public.t4
                          Output: t4.c1, NULL::text, timeofday()
                          Filter: (timeofday() = t4.c1)

Again, the right tree of the join,
-> Seq Scan on public.t4
Output: t4.c1, NULL::text, timeofday()
Filter: (timeofday() = t4.c1)
should not be taken as consistent across workers: timeofday() appears both in the scan's filter and in its output list.
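The output list is a hazard of its own. A sketch (illustrative Python, not Postgres internals) in which each worker re-evaluates the subquery and computes x = timeofday() afresh, so the derived table itself is not well-defined across workers:

```python
from itertools import count

calls = count(1)
def volatile():
    return next(calls)   # stand-in for timeofday()

t4 = ['a', 'b']

# (select *, volatile() as x from t4): the subquery is evaluated in
# every worker, so the "same" derived row carries a different x each time.
worker1 = [(c1, volatile()) for c1 in t4]   # [('a', 1), ('b', 2)]
worker2 = [(c1, volatile()) for c1 in t4]   # [('a', 3), ('b', 4)]
assert worker1 != worker2
```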
------------------------------------------------------------------------------------------
The above are just two cases we found today, but they should be enough to
show the core issue and to have a discussion here.
The question is, should we consider volatile functions when generating
parallel plans?
------------------------------------------------------------------------------------------
FYI, some plan diffs of Greenplum can be found here: https://www.diffnow.com/report/etulf