Re: BUG #19340: Wrong result from CORR() function - Mailing list pgsql-bugs

From Dean Rasheed
Subject Re: BUG #19340: Wrong result from CORR() function
Date
Msg-id CAEZATCU8=jjP1+dSSos6oXVzToGu-Rofyhv4BBpU31LN9Bo17Q@mail.gmail.com
Whole thread Raw
In response to Re: BUG #19340: Wrong result from CORR() function  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: BUG #19340: Wrong result from CORR() function
List pgsql-bugs
On Wed, 3 Dec 2025 at 22:52, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Attached is a fleshed-out patch proposal that fixes the related
> aggregates and adds test cases.
>

Looking at float8_regr_accum(), I think it would be preferable to
arrange for it to leave Sxx, Syy, and Sxy zero until distinct X and Y
values are seen. I.e., something like this:

    if (newvalX != commonX || isnan(newvalX))
        commonX = get_float8_nan();
    if (newvalY != commonY || isnan(newvalY))
        commonY = get_float8_nan();

    if (isnan(commonX) || isnan(commonY))
    {
        tmpX = newvalX * N - Sx;
        tmpY = newvalY * N - Sy;
        scale = 1.0 / (N * transvalues[0]);
        if (isnan(commonX))
            Sxx += tmpX * tmpX * scale;
        if (isnan(commonY))
            Syy += tmpY * tmpY * scale;
        if (isnan(commonX) && isnan(commonY))
            Sxy += tmpX * tmpY * scale;

        ... Overflow check ...
    }

This would mean that float8_corr(), float8_regr_r2(),
float8_regr_slope(), and float8_regr_intercept() would not need to
look at commonX or commonY, and could simply rely on Sxx == 0 or Syy
== 0 to detect horizontal and vertical lines.

Aside from making the code simpler, this would guarantee that the
aggregate functions regr_sxx() and regr_syy() would return exactly
zero for all-constant X and Y inputs respectively, and that
regr_sxy(), covar_pop(), and covar_samp() would return exactly zero if
either the X or the Y inputs were all constant.

Something else that occurred to me was that float8_regr_avgx() and
float8_regr_avgy() might as well make use of commonX and commonY,
since we're calculating them, so they would return exact averages if
all the X or Y values were the same, rather than results with possible
rounding errors.

I also wonder if it would be worth doing something similar for the
single-variable aggregates so that var_pop(), var_samp(),
stddev_pop(), and stddev_samp() would all return exactly zero, and
avg() would return the exact common value, if all the inputs were
constant.

Regards,
Dean



pgsql-bugs by date:

Previous
From: PG Bug reporting form
Date:
Subject: BUG #19345: MemoryContextSizeFailure after upgrade 14.11 to 17.7 in stored procedure
Next
From: Dean Rasheed
Date:
Subject: Re: BUG #19340: Wrong result from CORR() function