On Wed, 3 Dec 2025 at 22:52, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Attached is a fleshed-out patch proposal that fixes the related
> aggregates and adds test cases.
>
Looking at float8_regr_accum(), I think it would be preferable to
arrange for it to leave Sxx, Syy, and Sxy zero until distinct X and Y
values are seen. I.e., something like this:
if (newvalX != commonX || isnan(newvalX))
commonX = get_float8_nan();
if (newvalY != commonY || isnan(newvalY))
commonY = get_float8_nan();
if (isnan(commonX) || isnan(commonY))
{
tmpX = newvalX * N - Sx;
tmpY = newvalY * N - Sy;
scale = 1.0 / (N * transvalues[0]);
if (isnan(commonX))
Sxx += tmpX * tmpX * scale;
if (isnan(commonY))
Syy += tmpY * tmpY * scale;
if (isnan(commonX) && isnan(commonY))
Sxy += tmpX * tmpY * scale;
... Overflow check ...
}
This would mean that float8_corr(), float8_regr_r2(),
float8_regr_slope(), and float8_regr_intercept() would not need to
look at commonX or commonY, and could simply rely on Sxx == 0 or Syy
== 0 to detect horizontal and vertical lines.
Aside from making the code simpler, this would guarantee that the
aggregate functions regr_sxx() and regr_syy() would return exactly
zero for all-constant X and Y inputs respectively, and that
regr_sxy(), covar_pop(), and covar_samp() would return exactly zero if
either the X or the Y inputs were all constant.
Something else that occurred to me was that float8_regr_avgx() and
float8_regr_avgy() might as well make use of commonX and commonY,
since we're calculating them, so they would return exact averages if
all the X or Y values were the same, rather than results with possible
rounding errors.
I also wonder if it would be worth doing something similar for the
single-variable aggregates so that var_pop(), var_samp(),
stddev_pop(), and stddev_samp() would all return exactly zero, and
avg() would return the exact common value, if all the inputs were
constant.
Regards,
Dean