Re: Weighted Stats - Mailing list pgsql-hackers
From | David Fetter |
---|---|
Subject | Re: Weighted Stats |
Date | |
Msg-id | 20160408204756.GA15407@fetter.org Whole thread Raw |
In response to | Re: Weighted Stats (David Fetter <david@fetter.org>) |
Responses |
Re: Weighted Stats
|
List | pgsql-hackers |
On Sun, Mar 20, 2016 at 03:38:40PM -0700, David Fetter wrote: > On Sat, Mar 19, 2016 at 05:04:08PM +0100, Tomas Vondra wrote: > > Hi, > > > > On 03/19/2016 07:34 AM, David Fetter wrote: > > >On Fri, Mar 18, 2016 at 06:12:12PM -0700, Jeff Janes wrote: > > >>On Tue, Mar 15, 2016 at 8:36 AM, David Fetter <david@fetter.org> wrote: > > >>> > > >>>Please find attached a patch that uses the float8 version to cover the > > >>>numeric types. > > >> > > >>Is there a well-defined meaning for having a negative weight? If no, > > >>should it be disallowed? > > > > > >Opinions on this appear to vary. A Wikipedia article defines weights > > >as non-negative, while a manual to which it refers only uses non-zero. > > > > > >https://en.wikipedia.org/wiki/Weighted_arithmetic_mean#Mathematical_definition > > >https://www.gnu.org/software/gsl/manual/html_node/Weighted-Samples.html > > > > I don't think that actually allows negative weights. It says that > > > > w_i = 1/\sigma_i^2 > > > > and variance is always > 0, thus w_i > 0. The zero is used as a special flag > > to remove the sample from the data set in a simple way. > > > > >I'm not sure which if either would be authoritative, but I could > > >certainly make up variants for each assumption. > > > > > >The assumption they have in common about weights is that a zero > > >weight is not part of the calculation, which assumption is > > >implemented in the previously submitted code. > > > > I think that if we're not sure what should happen with negative weights, > > then we should disallow them. It's easy to allow them later once we have a > > reasonable definition, but if we allow them now and later realize it should > > behave differently, we'll be in trouble because of breaking existing uses. > > OK > > > I can't really come up with a reasonable example that would actually use > > negative weights. Can you? That would probably help with defining the > > behavior correctly. > > No, but I'm not a statistician. I've seen them mentioned in contexts > that appear to be discussions among same, and again opinions vary. > > > Allowing negative weights has other consequences. For example, what if > > sum(W) ends up being 0? For example > > > > CREATE TABLE t (a float, b float); > > INSERT INTO t SELECT i, 1 FROM generate_series(1,1000) s(i); > > INSERT INTO t SELECT i, -1 FROM generate_series(1,1000) s(i); > > > > SELECT weighted_avg(a,b) FROM t; > > weighted_avg > > -------------- > > NaN > > (1 row) > > > > Is that the correct behavior? Why? > > It's not, and you're right. > > I will send a patch that disallows negative weights this evening or > tomorrow. It will be slightly more complicated as I believe I will > need to create a new accumulator function for the weighted_avg() case > where I had been using an extant one before. > > Cheers, > David. Sorry about the delay. This patch disallows negative weights, although it still has that odd difference Jeff found. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Attachment
pgsql-hackers by date: