Home > mailing lists

Re: stddev returns 0 when there is one row - Mailing list pgsql-general

From	Murthy Kambhampaty
Subject	Re: stddev returns 0 when there is one row
Date	April 21, 2003 12:05:10
Msg-id	2D92FEBFD3BE1346A6C397223A8DD3FC0921F7@THOR.goeci.com Whole thread Raw
In response to	stddev returns 0 when there is one row (Bruno Wolff III <bruno@wolff.to>)
Responses	Re: stddev returns 0 when there is one row
List	pgsql-general

Tree view

If we could step back for second and look at why you would want to calculate
a standard deviation in the first place -- i.e., to measure the "spread", or
"dispersion", in the measure represented by a given field, we might get a
better answer than looking at the specs. for somebody else' software.

If you were looking at a table of sample data, and you had a sample size of
one, then the result should, of course be null, since a sample size of one
gives you no information about the variance and standard deviation (standard
error, really) of the distribution from which the sample was drawn.

But we are talking about a relational database here, where it is common to
throw away redundant information, and multiple draws from a deterministic
process, with a variance and standard deviation of zero, are redundant (did
you ever do a "select distinct from <>"?). So, it is entirely likely that
data stored in postgresql will only have include row for data drawn from a
"distribution with zero standard deviation".

The STDEV_POP and STDEV_SMPL convention is not strictly right for the
context, since we throw away "redundant" records regardless of whether they
represent a sample or a population; so it's no surprise that postgresql is
programmed the way it is: if your single record may not really represent a
single draw on a distribution, so it's the user's job to program your SQL to
return NULL for std. dev. and variance.

Of course, multiple identical records aren't redundant when they describe
statistical data, so you should be keeping a count() column around to record
the "frequency" with which each distinct set of values occurs, and using the
variance and standard deviation formulas for "grouped data" - e.g., ( sum(
X**2 * freq) - sum( X *freq)*sum( X*freq))/(sum( freq) -1) for the sample
variance, where the column "X" records the data values, and the column
"freq" records the number of times the values is repeated in the sample.
Tom, would it be too much trouble to program these in (as STDEV_POP_G and
STDEV_SMPL_G or some variant?

Thanks,
    Murthy


-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Saturday, April 19, 2003 16:33
To: Joe Conway
Cc: Manfred Koizar; Bruno Wolff III; pgsql-general@postgresql.org
Subject: Re: [GENERAL] stddev returns 0 when there is one row


Joe Conway <mail@joeconway.com> writes:
> So I'd take it that PostgreSQL's STDDEV implements STDDEV_POP.

No, we implement the sample standard deviation, as stated in the docs:
http://www.ca.postgresql.org/users-lounge/docs/7.3/postgres/functions-aggreg
ate.html

The code is pretty straightforward, at least in the float8 case:

    /* We define STDDEV of no values to be NULL, of 1 value to be 0 */
    if (N == 0.0)
        PG_RETURN_NULL();

    if (N <= 1.0)
        PG_RETURN_FLOAT8(0.0);

    numerator = N * sumX2 - sumX * sumX;

    /* Watch out for roundoff error producing a negative numerator */
    if (numerator <= 0.0)
        PG_RETURN_FLOAT8(0.0);

    PG_RETURN_FLOAT8(sqrt(numerator / (N * (N - 1.0))));

I don't have a real strong feeling about whether we should change the
behavior at N=1 or not.  Does the SQL200x spec provide any guidance?

            regards, tom lane


---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

pgsql-general by date:

From: Joe Conway
Date: 21 April 2003, 12:02:08
Subject: Re: Using TEXT columns for binary content

From: Lamar Owen
Date: 21 April 2003, 12:18:51
Subject: Re: Using TEXT columns for binary content

Re: stddev returns 0 when there is one row - Mailing list pgsql-general

Previous

Next