Re: [HACKERS] Enabling parallelism for queries coming from SQL orother PL functions - Mailing list pgsql-hackers
From | Rafia Sabih |
---|---|
Subject | Re: [HACKERS] Enabling parallelism for queries coming from SQL orother PL functions |
Date | |
Msg-id | CAOGQiiMPSx=zBP2h+9jZvEaV3bYqMmKt-phST7wT1WrtZ_6o4A@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] Enabling parallelism for queries coming from SQL orother PL functions (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: [HACKERS] Enabling parallelism for queries coming from SQL orother PL functions
|
List | pgsql-hackers |
On Wed, Mar 15, 2017 at 8:55 PM, Robert Haas <robertmhaas@gmail.com> wrote: > Note this: > > if (completed || !fcache->returnsSet) > postquel_end(es); > > When the SQL function doesn't return a set, then we can allow > parallelism even when lazyEval is set, because we'll only call > ExecutorStart() once. But my impression is that something like this: Well, when I test following SQL function I see it cannot be parallelised because lazyEval is true for it though it is not returning set, CREATE OR REPLACE FUNCTION not_parallel() RETURNS bigint AS $$ BEGIN SELECT count(distinct i) FROM t WHERE j = 12; END; $$ LANGUAGE sql; Query Text:SELECT count(distinct i) FROM t WHERE j = 12; Aggregate (cost=34.02..34.02 rows=1 width=8) (actual time=0.523..0.523 rows=1 loops=1)-> Seq Scan on t (cost=0.00..34.01 rows=1 width=4) (actual time=0.493..0.493 rows=0 loops=1) Filter: (j = 12) Rows Removed by Filter: 2001 2017-03-21 15:24:03.378 IST [117823] CONTEXT: SQL function "already_parallel" statement 1 2017-03-21 15:24:03.378 IST [117823] LOG: duration: 94868.181 ms plan: Query Text: select already_parallel(); Result (cost=0.00..0.26 rows=1 width=8) (actual time=87981.047..87981.048 rows=1 loops=1)already_parallel ------------------ 0 (1 row) As far as my understanding goes for this case, lazyEvalOk is set irrespective of whether the function returns set or not in fmgr_sql, else { randomAccess = false; lazyEvalOK = true; } then it is passed to init_sql_fcache which is then passed to init_execution_state where cache->lazyEval is set, if (lasttages && fcache->junkFilter) { lasttages->setsResult = true; if (lazyEvalOK && lasttages->stmt->commandType == CMD_SELECT && !lasttages->stmt->hasModifyingCTE) fcache->lazyEval = lasttages->lazyEval = true; } Finally, this lazyEval is passed to ExecutorRun in postquel_getnext that restricts parallelism by setting execute_once = 0, /* Run regular commands to completion unless lazyEval */ uint64 count = (es->lazyEval) ? 1 : 0; ExecutorRun(es->qd, ForwardScanDirection, count, !es->lazyEval); So, this is my concern that why is such a query should not execute in parallel when in SQL function. If I run this same query from PLpgsql function then it can run in parallel, CREATE OR REPLACE FUNCTION not_parallel() RETURNS bigint AS $$ declare cnt int:=0; BEGIN SELECT count(distinct i) into cnt FROM t WHERE j = 12; RETURN cnt; END; $$ LANGUAGE plpgsql; select not_parallel(); 2017-03-21 15:28:56.282 IST [123086] LOG: duration: 0.003 ms plan: Query Text: SELECT count(distinct i) FROM t WHERE j = 12 Parallel Seq Scan on t (cost=0.00..19.42 rows=1 width=4) (actual time=0.001..0.001 rows=0 loops=1)Filter: (j = 12) 2017-03-21 15:28:56.282 IST [123087] LOG: duration: 0.003 ms plan: Query Text: SELECT count(distinct i) FROM t WHERE j = 12 Parallel Seq Scan on t (cost=0.00..19.42 rows=1 width=4) (actual time=0.001..0.001 rows=0 loops=1)Filter: (j = 12) 2017-03-21 15:28:57.530 IST [117823] LOG: duration: 1745.372 ms plan: Query Text: SELECT count(distinct i) FROM t WHERE j = 12 Aggregate (cost=19.42..19.43 rows=1 width=8) (actual time=1255.743..1255.743 rows=1 loops=1)-> Gather (cost=0.00..19.42 rows=1 width=4) (actual time=1255.700..1255.700 rows=0 loops=1) Workers Planned: 2 Workers Launched: 2 -> Parallel Seq Scan on t (cost=0.00..19.42 rows=1 width=4) (actual time=418.443..418.443 rows=0 loops=3) Filter: (j = 12) Rows Removed by Filter: 667 2017-03-21 15:28:57.530 IST [117823] CONTEXT: SQL statement "SELECT count(distinct i) FROM t WHERE j = 12" PL/pgSQL function not_parallel() line 4 at SQL statement 2017-03-21 15:28:57.531 IST [117823] LOG: duration: 2584.282 ms plan: Query Text: select not_parallel(); Result (cost=0.00..0.26 rows=1 width=8) (actual time=2144.315..2144.316 rows=1 loops=1)not_parallel -------------- 0 (1 row) Hence, it appears lazyEval is the main reason behind it and it should be definitely fixed in my opinion. Please enlighten me with your comments/opinions. Regards, Rafia Sabih EnterpriseDB: http://www.enterprisedb.com/
pgsql-hackers by date: