Re: Review: Revise parallel pg_restore's scheduling heuristic - Mailing list pgsql-hackers
From: Sam Mason
Subject: Re: Review: Revise parallel pg_restore's scheduling heuristic
Msg-id: 20090807153307.GI5407@samason.me.uk
In response to: Re: Review: Revise parallel pg_restore's scheduling heuristic ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
List: pgsql-hackers
On Fri, Aug 07, 2009 at 10:19:20AM -0500, Kevin Grittner wrote:
> Sam Mason <sam@samason.me.uk> wrote:
>
> > What do people do when testing this?  I think I'd look to something
> > like Student's t-test to check for statistical significance.  My
> > working would go something like:
> >
> > I assume the variance is the same because it's being tested on the
> > same machine.
> >
> >   samples = 20
> >   stddev  = 144.26
> >   avg1    = 4783.13
> >   avg2    = 4758.46
> >   t       = 0.54   ((avg1 - avg2) / (stddev * sqrt(2/samples)))
> >
> > We then have to choose how certain we want to be that they're
> > actually different; 90% is a reasonably easy level to hit (i.e. one
> > part in ten, with 95% being more commonly quoted).  For 20 samples
> > we have 19 degrees of freedom--giving us a cut-off[1] of 1.328.
> > 0.54 is obviously well below this, allowing us to say that there's
> > no "statistical significance" between the two samples at a 90%
> > level.
>
> Thanks for the link; that looks useful.  To confirm that I understand
> what this has established (or get a bit of help putting it in
> perspective), what this says to me, in the least technical jargon I
> can muster, is "With this many samples and this degree of standard
> deviation, the average difference is not large enough to have a 90%
> confidence level that the difference is significant."  In fact,
> looking at the chart, it isn't enough to reach a 75% confidence level
> that the difference is significant.  Significance here would seem to
> mean that at least the given percentage of the time, picking this
> many samples from an infinite set with an average difference that
> really was this big or bigger would generate a value for t this big
> or bigger.
>
> Am I close?

Yes, all that sounds as though you've got it.  Note that running the
test more times will tend to reduce the standard deviation a bit as
well, so it may well become significant.  In this case it's unlikely
to affect it much though.

> I like to be clear, because it's easy to get confused and take the
> above to mean that there's a 90% confidence that there is no actual
> significant difference in performance based on that sampling.  (Given
> Tom's assurance that this version of the patch should have similar
> performance to the last, and the samples from the prior patch went
> the other direction, I'm convinced there is not a significant
> difference, but if I'm going to use the referenced calculations, I
> want to be clear how to interpret the results.)

All we're saying is that we're less than 90% confident that there's
something "significant" going on.  All the fiddling with standard
deviations and sample sizes is just the easiest way (that I know of)
that statistics currently gives us of determining this more formally
than a hand-wavy "it looks OK to me".  Science tells us that humans
are liable to say things are OK when they're not, as well as vice
versa; statistics gives us a way to work past these limitations in
some common and useful situations.

--
  Sam  http://samason.me.uk/
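
For anyone wanting to re-run the arithmetic, here is a minimal Python
sketch of the calculation above (the variable names are illustrative,
and the scipy call shown in a comment is an optional way to derive the
cut-off rather than reading it from a table):

    from math import sqrt

    # Figures from the thread above: 20 timing runs of each pg_restore
    # variant, assuming both share the same standard deviation.
    samples = 20
    stddev = 144.26
    avg1 = 4783.13
    avg2 = 4758.46

    # Two-sample t statistic for equal sample sizes and pooled variance:
    #   t = (avg1 - avg2) / (stddev * sqrt(2 / samples))
    t = (avg1 - avg2) / (stddev * sqrt(2.0 / samples))
    print(f"t = {t:.2f}")

    # One-tailed cut-off at the 90% level with 19 degrees of freedom,
    # as used in the thread (1.328, from a t-distribution table); with
    # scipy installed it can be computed directly:
    #   from scipy.stats import t as t_dist
    #   cutoff = t_dist.ppf(0.90, samples - 1)
    cutoff = 1.328
    print(f"significant at 90%? {t > cutoff}")

With the figures quoted in the thread this prints t = 0.54, well below
the 1.328 cut-off, matching the conclusion above.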