Re: gaussian distribution pgbench - Mailing list pgsql-hackers
From | Mitsumasa KONDO |
---|---|
Subject | Re: gaussian distribution pgbench |
Date | |
Msg-id | CADupcHXwhX8ab6jjCVp4up8jJjpPxS8W2xedcqan1nx+yUhT1g@mail.gmail.com |
In response to | Re: gaussian distribution pgbench (Fabien COELHO <coelho@cri.ensmp.fr>) |
Responses |
Re: gaussian distribution pgbench
|
List | pgsql-hackers |
2014-07-18 5:13 GMT+09:00 Fabien COELHO <coelho@cri.ensmp.fr>:
The decile description is quite classic when discussing statistics.
However, ISTM that it is not the purpose of pgbench documentation to be a
primer about what is an exponential or gaussian distribution, so the idea
would yet be to have a relatively compact explanation, and that the
interested but clueless reader would document h..self from wikipedia or a
text book or a friend or a math teacher (who could be a friend as well:-).
Well, I think it's a balance. I agree that the pgbench documentation
shouldn't try to substitute for a text book or a math teacher, but I
also think that you shouldn't necessarily need to refer to a text book
or a math teacher in order to figure out how to use pgbench. Saying
"it's complicated, so we don't have to explain it" would be a cop out;
we need to *make* it simple. And if there's no way to do that, then
IMHO we should reject the patch in favor of some future patch that
implements something that will be easy for users to understand.
[nttcom@localhost postgresql]$ contrib/pgbench/pgbench --exponential=10
starting vacuum...end.
transaction type: Exponential distribution TPC-B (sort of)
scaling factor: 1
exponential threshold: 10.00000
decile percents: 63.2% 23.3% 8.6% 3.1% 1.2% 0.4% 0.2% 0.1% 0.0% 0.0%
highest/lowest percent of the range: 9.5% 0.0%
I don't have a clue what that means. None.
Maybe we could add in front of the decile/percent
"distribution of increasing account key values selected by pgbench:"
I still wouldn't know what that meant. And it misses the point
anyway: if the documentation is good, this will be unnecessary. If
the documentation is bad, a printout that tries to illustrate it by
example is not an acceptable substitute.
Yeah, maybe. Fabien-san and I find it hard to believe that he doesn't know what a decile percentage is.
However, I think more description of the deciles is needed.
For example, when we set the number of transactions to 10,000 (-t 10000), the range of aid is 100,000,
and --exponential is 10, the output is as follows, as you know:
decile percents: 63.2% 23.3% 8.6% 3.1% 1.2% 0.4% 0.2% 0.1% 0.0% 0.0%
highest/lowest percent of the range: 9.5% 0.0%
This means:
#number of accesses in each range of aid (from decile percents):
1 to 10,000 => 6,320 times
10,001 to 20,000 => 2,330 times
20,001 to 30,000 => 860 times
...
90,001 to 100,000 => 0 times
#number of accesses in each range of aid (from highest/lowest percent of the range):
1 to 1,000 => 950 times
...
99,001 to 100,000 => 0 times
That's all.
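The arithmetic above can be checked with a small Python sketch. It assumes the key is drawn from an exponential distribution truncated to the key range, with density proportional to exp(-threshold * x) for x in [0, 1); that formula is an assumption made here for illustration, not something taken from the patch, and the function names are hypothetical.

```python
import math

def exponential_fraction(lo, hi, threshold):
    """Share of accesses falling in [lo, hi) of the key range (0 <= lo < hi <= 1).

    Assumption: density proportional to exp(-threshold * x) on [0, 1).
    """
    norm = 1.0 - math.exp(-threshold)
    return (math.exp(-threshold * lo) - math.exp(-threshold * hi)) / norm

def decile_percents(threshold):
    """Percentage of accesses in each tenth of the key range, lowest keys first."""
    return [100.0 * exponential_fraction(i / 10, (i + 1) / 10, threshold)
            for i in range(10)]

def approx_accesses(transactions, percent):
    """Access count implied by a percentage rounded to one decimal, as printed."""
    return round(transactions * round(percent, 1) / 100)

threshold = 10.0        # --exponential=10
transactions = 10_000   # -t 10000
naccounts = 100_000     # range of aid

deciles = decile_percents(threshold)
print("decile percents:", " ".join(f"{p:.1f}%" for p in deciles))

# Translate each decile into an aid range and an approximate access count.
for i, p in enumerate(deciles):
    first = i * naccounts // 10 + 1
    last = (i + 1) * naccounts // 10
    print(f"{first:,} to {last:,} => {approx_accesses(transactions, p):,} times")

# Same idea for the first and last 1% of the range.
highest = 100.0 * exponential_fraction(0.0, 0.01, threshold)
lowest = 100.0 * exponential_fraction(0.99, 1.0, threshold)
print(f"highest/lowest percent of the range: {highest:.1f}% {lowest:.1f}%")
```

Under that assumption, a threshold of 10 gives the same decile and highest/lowest percentages as the pgbench output quoted above, and the per-range counts are simply the number of transactions multiplied by each rounded percentage.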
This information makes the distribution of access probability easy to understand, doesn't it?
Maybe Fabien-san and I have some background in mathematics, so we think of decile percentages as common knowledge.
But if they aren't common knowledge, I agree with adding these explanations to the documentation.
Best regards,
--
Mitsumasa KONDO