Thread: Horrible/never returning performance using stable function on WHERE clause
Horrible/never returning performance using stable function on WHERE clause
From
Achilleas Mantzios
Date:
Hello list, I have written some functions to extract some data from our DB, from an hierarchical structure, the problem is that if afunction doing lookups is defined as STABLE in the WHERE clause the performance is horrible. What I am trying to achieve is given a specific node in an hierarchical structure (a machine definition) tofind all its equivalent sister nodes and then for a specific instance of this hierarchy to find the max RH (running hours) among all sister nodes.I am using some functions/opers from intarray. Here arethe functions : Compares two nodes for sister property: CREATE OR REPLACE FUNCTION public.is_defid_sister_node(vdefid1 integer, vdefid2 integer) RETURNS boolean LANGUAGE plpgsql STABLE AS $function$DECLARE vparents1 INTEGER[]; vparents2 INTEGER[]; descr1 TEXT; descr2 TEXT; i INTEGER; BEGIN SELECT COALESCE(partid,0)||'__'||regexp_replace(coalesce(description,''),'[nN][oO]([0-9]+)',''),parentsinto descr1,vparents1 FROMmachdefs where defid=vdefid1; SELECT COALESCE(partid,0)||'__'||regexp_replace(coalesce(description,''),'[nN][oO]([0-9]+)',''),parentsinto descr2,vparents2 FROMmachdefs where defid=vdefid2; IF (level(vparents1) = 0 AND level(vparents2) = 0) THEN RETURN vdefid1=vdefid2; ELSIF (level(vparents1) <> level(vparents2)) THEN RETURN false; ELSE RETURN ((descr1=descr2) AND is_defid_sister_node(first(vparents1),first(vparents2))); END IF; END;$function$ Finds the set of sister nodes for a given node: CREATE OR REPLACE FUNCTION public.get_machdef_sister_defids(vdefid integer) RETURNS INTEGER[] LANGUAGE plpgsql STABLE AS $function$ DECLARE tmp INTEGER[]; BEGIN select (select array_agg(mdsis.defid) FROM machdefs mdsis WHERE mdsis.machtypeid=md.machtypeid AND level(mdsis.parents)=level(md.parents)AND last(mdsis.parents)=last(md.parents) AND is_defid_sister_node(mdsis.defid,md.defid) ) INTO tmp from machdefs md where md.defid=vdefid; IF (tmp IS NULL) THEN tmp := '{}'; END IF; RETURN tmp; END; $function$ Finds max RH for a given tree instance among all sister nodes of a given node : CREATE OR REPLACE FUNCTION public.get_machdef_sister_defids_maxrh(vvslid INTEGER,vdefid INTEGER) RETURNS INTEGER LANGUAGE plpgsql STABLE AS $function$ DECLARE tmp INTEGER; BEGIN select max(rh) into tmp from items where vslwhid=vvslid and itoar(defid) ~ get_machdef_sister_defids(vdefid); RETURN tmp; END; $function$ Query : select get_machdef_sister_defids_maxrh(479,319435); never ends (I have waited till 2-3 minutes), however, *doing the wrong thing* and declaring get_machdef_sister_defids asIMMUTABLE makes the above call return fast : # select get_machdef_sister_defids_maxrh(479,319435); get_machdef_sister_defids_maxrh --------------------------------- 10320 (1 row) Time: 110.211 ms We are using PostgreSQL 9.3. Shouldn't the optimizer use a single call to get_machdef_sister_defids in get_machdef_sister_defids_maxrh ?? Defining get_machdef_sister_defids back to STABLE and forcing get_machdef_sister_defids_maxrh to only call get_machdef_sister_defidsonce makes things work again : CREATE OR REPLACE FUNCTION public.get_machdef_sister_defids_maxrh(vvslid integer, vdefid integer) RETURNS integer LANGUAGE plpgsql STABLE AS $function$ DECLARE tmp INTEGER; tmppars INTEGER[]; BEGIN tmppars := get_machdef_sister_defids(vdefid); select max(rh) into tmp from items where vslwhid=vvslid and itoar(defid) ~ tmppars; RETURN tmp; END; $function$ # select get_machdef_sister_defids_maxrh(479,319435); get_machdef_sister_defids_maxrh --------------------------------- 10320 (1 row) Time: 111.318 ms Is this expected ? -- Achilleas Mantzios IT DEV Lead IT DEPT Dynacom Tankers Mgmt
Re: Horrible/never returning performance using stable function on WHERE clause
From
David Rowley
Date:
On 29 March 2016 at 20:01, Achilleas Mantzios <achill@matrix.gatewaynet.com> wrote: > We are using PostgreSQL 9.3. Shouldn't the optimizer use a single call to > get_machdef_sister_defids in get_machdef_sister_defids_maxrh ?? It shouldn't be up to the optimizer to evaluate a STABLE function. Only IMMUTABLE functions will be evaluated during planning. What's not that clear to me is if the planner might be able to work a bit harder to create an "Initplan" for stable functions with Const arguments. Right now I can't quite see a reason why that couldn't be improved upon, after all, the documentation does claim that a STABLE function during a "single table scan it will consistently return the same result for the same argument values". However it would be quite simple just for you to force the STABLE function to be evaluated once, instead of once per row, just by modifying your query to become: select max(rh) into tmp from items where vslwhid=vvslid and itoar(defid) ~ (select get_machdef_sister_defids(vdefid)); Viewing the EXPLAIN of this, you'll notice the InitPlan, which will evaluate the function and allow the use the output value as a parameter in the main query. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: Horrible/never returning performance using stable function on WHERE clause
From
Achilleas Mantzios
Date:
Hello David
On 29/03/2016 14:04, David Rowley wrote:
On 29/03/2016 14:04, David Rowley wrote:
On 29 March 2016 at 20:01, Achilleas Mantzios <achill@matrix.gatewaynet.com> wrote:We are using PostgreSQL 9.3. Shouldn't the optimizer use a single call to get_machdef_sister_defids in get_machdef_sister_defids_maxrh ?? It shouldn't be up to the optimizer to evaluate a STABLE function. Only IMMUTABLE functions will be evaluated during planning.
And to add here the docs (http://www.postgresql.org/docs/9.3/static/xfunc-volatility.html) also say :What's not that clear to me is if the planner might be able to work a bit harder to create an "Initplan" for stable functions with Const arguments. Right now I can't quite see a reason why that couldn't be improved upon, after all, the documentation does claim that a STABLE function during a "single table scan it will consistently return the same result for the same argument values".
"A STABLE function cannot modify the database and is guaranteed to return the same results given the same arguments for all rows within a single statement. This category allows the optimizer to optimize multiple calls of the function to a single call. In particular, it is safe to use an expression containing such a function in an index scan condition."
That's true, this worked indeed. But still cannot understand why the distinction between ~ get_machdef_sister_defids(...) and ~ (SELECT get_machdef_sister_defids(...)).However it would be quite simple just for you to force the STABLE function to be evaluated once, instead of once per row, just by modifying your query to become: select max(rh) into tmp from items where vslwhid=vvslid and itoar(defid) ~ (select get_machdef_sister_defids(vdefid)); Viewing the EXPLAIN of this, you'll notice the InitPlan, which will evaluate the function and allow the use the output value as a parameter in the main query.
Why is the planner forced in the second case and not in the first, since clearly the input argument is not dependent on any query result? (judging by the docs).
-- Achilleas Mantzios IT DEV Lead IT DEPT Dynacom Tankers Mgmt