Home > mailing lists

Re: short-cutting if sum()>constant - Mailing list pgsql-sql

From	Pavel Stehule
Subject	Re: short-cutting if sum()>constant
Date	December 23, 2009 09:35:37
Msg-id	162867790912230535m6c84a1cfodea1c269d3d84301@mail.gmail.com Whole thread Raw
In response to	Re: short-cutting if sum()>constant (Ivan Sergio Borgonovo <mail@webthatworks.it>)
Responses	Re: short-cutting if sum()>constant
List	pgsql-sql

Tree view

2009/12/23 Ivan Sergio Borgonovo <mail@webthatworks.it>:
> On Wed, 23 Dec 2009 12:52:38 +0100
> Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
>> The problem is massive cycle. Plpgsql really isn't language for
>> this. This interpret missing own arithmetic unit. So every
>> expression is translated to SELECT statement
>>
>> IF a > c ~ SELECT a > c
>> a = a + 1 ~ SELECT a + 1
>
>> these queries are executed in some special mode, but still it is
>> more expensive than C a = a + 1
>
> I didn't get it.
> I'd expect that since plpgsql should shere SQL data types it could
> map easily something like
>
> if (a>b) then
> or
> a := a +b
>
> and something like a for in query loop was going to be highly
> optimised as well.
> plpgsql should be the most tightly bounded language to the internals
> of pg SQL.

no there are not any internal optimisation. PostgreSQL can optimize
only execution plan - not individual expressions.

>
>> > select sum(a) from data; takes 1999.492 ms.
>> > select count(*) from data; takes 1612.039 ms
>
>> it is slower, because there is one NULL test more.
>
> That didn't came as a surprise. It was there for comparison.
>
>> PLpgSQL quietly uses cursors for FOR SELECT, your plperl uses
>
> I didn't notice. That was my first plperl function in my life.
> Anyway that means that that test didn't say anything about
> interpreter speed and duck typing.

you don't compare equal things.

>
>> > Anyway I'm not able to justify the difference in speed between
>> > plpgsql and your solution in such case unless plpgsql is
>> > switching back and forward between binary data and their text
>> > representation. It would be nice if someone that know plpgsql
>> > internals explain where the cost comes from.
>
>> I thing so most expensive part in your plpgsql code is two
>> expression.
>

one expression: if >> loop_var > parameter <<
second expression: loop_var + 1

> I didn't understand.
>
> What's the reason the interpreter can't translate the if and the b :=
> row._a + b; into something that very resemble compiled C?

a) simplicity. PLpgSQL interpret is very simple.
b) mostly types are little bit different behave than natural C types -
domains are different, C types doesn't know NULL value, ...

so if you like maximum speed, then you can use C language. It is
relative simple, much simpler than C procedures in T-SQL or Oracle.

> plpgsql is not doing anything different than:
> select count(*) from (select case when a>3 then 1 else 2 end, a+a
> from data limit 9000000) as f;

no - it doesn't any intelligence - it doesn't join expression
together. It does exactly it, what you write.

see http://okbob.blogspot.com/2009/12/how-dont-write-plpgsql-functions.html


> One of the advantages of using plpgsql (and one annoyance too) is
> that variables are declared and plpgsql should know how to operate
> on them with native C code.

It little bit nonsense. On 99% plpgsql use SPI api and work with
variables via query interface. PLpgSQL can do loop statement, if
statement, but for all others use internal query based API.

>
> BTW the best performer considering all constraints and data
> distribution seems to be the simplest one:
>
> select sum(s.a) from (select a from data where a>0 limit 9000000) s;
> Time: 2620.677 ms
>

this query is little but different, than you original request, but it
could work for you.

Regard
Pavel Stehule

> We've no negative int... and they are int, so they have to be no
> less than 1 if they are not 0. If I know there are no 0, the
> simplest version become even faster.
>
> When I randomly pick up values in [1,3] plpgsql and generate_series
> start to perform similarly but still the simplest version is leading.
>
> When the interval is in [1,6] the plpgsql becomes faster than the
> generate_series but the simplest version is still leading.
>
> Just when the interval is in [1,10] the plpgsql version and the
> simplest one becomes comparable.
>
> --
> Ivan Sergio Borgonovo
> http://www.webthatworks.it
>
>
> --
> Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-sql
>

pgsql-sql by date:

From: Ivan Sergio Borgonovo
Date: 23 December 2009, 09:01:34
Subject: Re: short-cutting if sum()>constant

From: Ivan Sergio Borgonovo
Date: 23 December 2009, 10:27:46
Subject: Re: short-cutting if sum()>constant

Re: short-cutting if sum()>constant - Mailing list pgsql-sql

Previous

Next