Re: short-cutting if sum()>constant - Mailing list pgsql-sql
From | Pavel Stehule |
---|---|
Subject | Re: short-cutting if sum()>constant |
Date | |
Msg-id | 162867790912230535m6c84a1cfodea1c269d3d84301@mail.gmail.com Whole thread Raw |
In response to | Re: short-cutting if sum()>constant (Ivan Sergio Borgonovo <mail@webthatworks.it>) |
Responses |
Re: short-cutting if sum()>constant
|
List | pgsql-sql |
2009/12/23 Ivan Sergio Borgonovo <mail@webthatworks.it>: > On Wed, 23 Dec 2009 12:52:38 +0100 > Pavel Stehule <pavel.stehule@gmail.com> wrote: > >> The problem is massive cycle. Plpgsql really isn't language for >> this. This interpret missing own arithmetic unit. So every >> expression is translated to SELECT statement >> >> IF a > c ~ SELECT a > c >> a = a + 1 ~ SELECT a + 1 > >> these queries are executed in some special mode, but still it is >> more expensive than C a = a + 1 > > I didn't get it. > I'd expect that since plpgsql should shere SQL data types it could > map easily something like > > if (a>b) then > or > a := a +b > > and something like a for in query loop was going to be highly > optimised as well. > plpgsql should be the most tightly bounded language to the internals > of pg SQL. no there are not any internal optimisation. PostgreSQL can optimize only execution plan - not individual expressions. > >> > select sum(a) from data; takes 1999.492 ms. >> > select count(*) from data; takes 1612.039 ms > >> it is slower, because there is one NULL test more. > > That didn't came as a surprise. It was there for comparison. > >> PLpgSQL quietly uses cursors for FOR SELECT, your plperl uses > > I didn't notice. That was my first plperl function in my life. > Anyway that means that that test didn't say anything about > interpreter speed and duck typing. you don't compare equal things. > >> > Anyway I'm not able to justify the difference in speed between >> > plpgsql and your solution in such case unless plpgsql is >> > switching back and forward between binary data and their text >> > representation. It would be nice if someone that know plpgsql >> > internals explain where the cost comes from. > >> I thing so most expensive part in your plpgsql code is two >> expression. > one expression: if >> loop_var > parameter << second expression: loop_var + 1 > I didn't understand. > > What's the reason the interpreter can't translate the if and the b := > row._a + b; into something that very resemble compiled C? a) simplicity. PLpgSQL interpret is very simple. b) mostly types are little bit different behave than natural C types - domains are different, C types doesn't know NULL value, ... so if you like maximum speed, then you can use C language. It is relative simple, much simpler than C procedures in T-SQL or Oracle. > plpgsql is not doing anything different than: > select count(*) from (select case when a>3 then 1 else 2 end, a+a > from data limit 9000000) as f; no - it doesn't any intelligence - it doesn't join expression together. It does exactly it, what you write. see http://okbob.blogspot.com/2009/12/how-dont-write-plpgsql-functions.html > One of the advantages of using plpgsql (and one annoyance too) is > that variables are declared and plpgsql should know how to operate > on them with native C code. It little bit nonsense. On 99% plpgsql use SPI api and work with variables via query interface. PLpgSQL can do loop statement, if statement, but for all others use internal query based API. > > BTW the best performer considering all constraints and data > distribution seems to be the simplest one: > > select sum(s.a) from (select a from data where a>0 limit 9000000) s; > Time: 2620.677 ms > this query is little but different, than you original request, but it could work for you. Regard Pavel Stehule > We've no negative int... and they are int, so they have to be no > less than 1 if they are not 0. If I know there are no 0, the > simplest version become even faster. > > When I randomly pick up values in [1,3] plpgsql and generate_series > start to perform similarly but still the simplest version is leading. > > When the interval is in [1,6] the plpgsql becomes faster than the > generate_series but the simplest version is still leading. > > Just when the interval is in [1,10] the plpgsql version and the > simplest one becomes comparable. > > -- > Ivan Sergio Borgonovo > http://www.webthatworks.it > > > -- > Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-sql >