Re: Built-in binning functions - Mailing list pgsql-hackers
From: Pavel Stehule
Subject: Re: Built-in binning functions
Msg-id: CAFj8pRBqK56EpOy-9Ba=EUBo2nimpToUCVZVgcmW52yfyNO9dQ@mail.gmail.com
In response to: Re: Built-in binning functions (Petr Jelinek <petr@2ndquadrant.com>)
Responses: Re: Built-in binning functions; Re: Built-in binning functions
List: pgsql-hackers
2014-07-16 10:04 GMT+02:00 Petr Jelinek <petr@2ndquadrant.com>:
> On 08/07/14 02:14, Tom Lane wrote:
>> Petr Jelinek <petr@2ndquadrant.com> writes:
>>> here is a patch implementing varwidth_bucket (naming is up for
>>> discussion) function which does binning with variable bucket width.
>>
>> I didn't see any discussion of the naming question in this thread.
>> I'd like to propose that it should be just "width_bucket()"; we can
>> easily determine which function is meant, considering that the
>> SQL-spec variants don't take arrays and don't even have the same
>> number of actual arguments.
>
> I did mention in the submission that the names are up for discussion; I am all for naming it just width_bucket.

+1

I had this idea too, but I am not sure it is a good idea. The distance between the ANSI SQL width_bucket and our enhancement is larger than in our implementation of "median", for example.

I can live with either name, but I prefer the current one.
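To make the overloading point concrete: the array flavor under discussion returns the bucket whose inclusive lower bound is the largest threshold not exceeding the operand. A minimal Python sketch of those semantics (the function name and behaviour here follow this thread's description, not a shipped API):

```python
from bisect import bisect_right

def width_bucket(operand, thresholds):
    """Array-flavored binning: return the bucket number for operand,
    given a sorted list of bucket lower bounds.

    Bucket 0 means "below the first threshold"; bucket len(thresholds)
    means "at or above the last threshold".
    """
    # bisect_right counts thresholds <= operand, which is exactly the
    # bucket index when each threshold is an inclusive lower bound.
    return bisect_right(thresholds, operand)

print(width_bucket(5.35, [1, 3, 4, 6, 15, 100]))  # -> 3
print(width_bucket(0, [1, 3, 4, 6, 15, 100]))     # -> 0
```

Note that this signature (operand plus one array) cannot collide with the four-argument SQL-spec form, which is the crux of the naming argument.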
>> So given plain integer arguments, we'll invoke the float8 version not the
>> int8 version, which may be undesirable. The same for int2 arguments.
>> We could fix that by adding bespoke int4 and maybe int2 variants, but
>> TBH, I'm not sure that the specific-type functions are worth the trouble.
>> Maybe we should just have one generic function, and take the trouble to
>> optimize its array-subscripting calculations for either fixed-length or
>> variable-length array elements? Have you got performance measurements
>> demonstrating that multiple implementations really buy enough to justify
>> the extra code?
>
> Hmm, yeah, I don't love the idea of having 3 functions just to cover the integer datatypes :(.
>
> The performance difference is about 20% (+/- a few percent, depending on the array size); I don't know if that's bad enough to warrant type-specific implementations. I personally don't know how to make the generic implementation much faster than it is now, except maybe by turning it into an aggregate that would cache the deconstructed version of the array, but that changes the semantics quite a bit and is probably not all that desirable.

I am not sure our API is rich enough to do it - there is no simple support for immutable parameters.

The performance is one point. A second point is a wrong result when the input array is not well sorted. Checking for that would mean a further performance penalty, so we cannot do it.
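The sortedness point can be illustrated with a small sketch: a binary search over the thresholds (the natural fast implementation) silently returns the wrong bucket once the sorted-input precondition is broken, while an order-insensitive linear count does not - and detecting the violation would itself cost a full pass over the array. The helper names below are hypothetical, for illustration only:

```python
from bisect import bisect_right

def bucket_binary(x, thresholds):
    # Fast path: correct only if thresholds are sorted ascending.
    return bisect_right(thresholds, x)

def bucket_linear(x, thresholds):
    # Order-insensitive reference: count thresholds <= x.
    return sum(1 for t in thresholds if t <= x)

sorted_t   = [1, 3, 4, 6]
unsorted_t = [1, 4, 3, 6]   # same values, mis-sorted

print(bucket_binary(3.5, sorted_t))    # -> 2 (correct)
print(bucket_binary(3.5, unsorted_t))  # -> 3 (silently wrong)
print(bucket_linear(3.5, unsorted_t))  # -> 2
```

The wrong answer comes with no error, which is why the only safe options are to document the precondition or pay for an O(n) check.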
>> Also, I'm not convinced by this business of throwing an error for a
>> NULL array element. Per spec, null arguments to width_bucket()
>> produce a null result, not an error --- shouldn't this flavor act
>> similarly? In any case I think the test needs to use
>> array_contains_nulls() not just ARR_HASNULL.
>
> I am not against returning NULL for a NULL array; I would still like to error on an array that has values plus a NULL somewhere in the middle, though.
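The distinction being debated - NULL *arguments* versus NULL *elements* - could be sketched like this (Python, with None standing in for SQL NULL; this mirrors the compromise under discussion, not any committed behaviour):

```python
def width_bucket(operand, thresholds):
    # Spec-style: a NULL argument yields a NULL result.
    if operand is None or thresholds is None:
        return None
    # Stricter rule under discussion: a NULL *element* inside the
    # threshold array raises an error rather than returning NULL.
    if any(t is None for t in thresholds):
        raise ValueError("thresholds array must not contain NULL elements")
    # Count thresholds <= operand (inclusive lower bounds).
    return sum(1 for t in thresholds if t <= operand)

print(width_bucket(None, [1, 2, 3]))  # -> None
print(width_bucket(2, None))          # -> None
```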
+1
Pavel
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services