Re: gaussian distribution pgbench -- splits v4 - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: gaussian distribution pgbench -- splits v4 |
Date | |
Msg-id | CA+TgmoZLMxTRsK0Hek=aUkWzLrmjxRLMJQXJQhRpa9nbKRA5vA@mail.gmail.com |
In response to | Re: gaussian distribution pgbench -- splits v4 (Mitsumasa KONDO <kondo.mitsumasa@gmail.com>) |
Responses |
Re: gaussian distribution pgbench -- splits v4
|
List | pgsql-hackers |
On Wed, Jul 30, 2014 at 9:00 PM, Mitsumasa KONDO <kondo.mitsumasa@gmail.com> wrote:
> Hmm... It doesn't have harm for pgbench source code. And, in general,
> checking script is useful for avoiding bug.

Not if nobody runs it, or if people run it but don't know what the output should look like. I think anyone who knows enough to find bugs by running these scripts probably doesn't need the scripts.

> No, patch B is still needed. Please tell me the reason. I don't like
> deciding by someones feeling,
> and it needs logical reason. Our documentation is better than the past. I
> think it can easy to understand decile probability.
> This part of the discussion is needed to continue...
>
>> Would providing these as additional contrib files be more acceptable?
>> Something like "tpc-b-gauss.sql"... Otherwise there is no example available
>> to show the feature.
>
> I agree the test script and including command line options. It's not harm,
> and it's useful.

As to all of this, I simply don't agree that the stuff has enough value to justify including it. Now, of course, that is subjective: one person may think it has enough value, while another person may think that it does not have enough value. So it just comes down to a question of opinion, and we make those judgements of opinion all the time. If we included everything that everyone who works on the code wants included, we'd end up with a bloated mess of stuff that nobody cares about; indeed, we have a significant amount of stuff in the source code that IMHO looks like somebody's debugging leftovers that should have been removed before commit. I don't want to add more unless there is clear and convincing evidence that a significant number of people want it, and that is not the case here.
Now, if we get a few reports from people saying, hey, I was doing some benchmarking with pgbench, and I found the new gaussian feature to be really excellent, but it sucked that there was no command-line option for it, we can go back and add one. No problem! But in the meantime, we've added the core of the feature without cluttering up the list of command-line options with things that may or may not prove to be useful.

One of the concerns that I have about the proposal of simply slapping a gaussian or exponential modifier onto \setrandom aid 1 :naccounts is that, while it will allow you to make part of the relation hot and another part of the relation cold, you really can't get any more fine-grained than that. If you use exponential, all the hot accounts will be near the beginning of the relation, and if you use gaussian, they'll all be in the middle. I'm not sure exactly what will happen after some updating has happened; I'm guessing some of the keys will still be in their original location and others will have been pushed to the end of the relation following relation-extension. But there's no way, with those command-line options, to have, for example, 5 hot spots distributed uniformly through the relation, or even to have the end of the relation rather than the beginning or the middle as the hot spot. You can do those things with the newly-enhanced \setrandom *and a custom script* but not with just a command-line option. So that makes me think that people who find these new facilities useful might not get all that much use out of the command-line option anyway; and we can't have a command-line option for every behavior anyone ever wants.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
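
To make the hot-spot point above concrete, here is a rough Python sketch of the kind of arithmetic a custom script could do on top of a gaussian or exponential draw. It is not pgbench code and does not reproduce pgbench's sampling; the function names, the threshold of 2.5, the rate of 5.0 and the rejection-sampling approach are all assumptions made for illustration.

```python
import random


def gaussian_offset(rng, width, threshold=2.5):
    """Return an int in [0, width) clustered around the middle.

    Assumption: a standard normal truncated to [-threshold, threshold)
    by rejection sampling, then scaled linearly onto the range.
    """
    while True:
        z = rng.gauss(0.0, 1.0)
        if -threshold <= z < threshold:
            break
    # min() guards against floating-point rounding at the upper edge.
    return min(width - 1, int((z + threshold) / (2 * threshold) * width))


def hot_spot_aid(rng, naccounts, nspots=5, threshold=2.5):
    """Return an account id in [1, naccounts] with nspots evenly spaced hot spots.

    The relation is cut into nspots equal stripes; a stripe is chosen
    uniformly, then a gaussian offset is chosen inside it. If naccounts is
    not a multiple of nspots, the last few ids are never picked.
    """
    stripe = naccounts // nspots
    spot = rng.randrange(nspots)
    return spot * stripe + gaussian_offset(rng, stripe, threshold) + 1


def tail_hot_aid(rng, naccounts, rate=5.0):
    """Return an account id in [1, naccounts] that is hot near the END.

    Assumption: an exponential draw truncated to [0, 1) by rejection,
    mirrored so that small draws map to the highest ids.
    """
    while True:
        u = rng.expovariate(rate)
        if u < 1.0:
            break
    offset = min(naccounts - 1, int(u * naccounts))
    return naccounts - offset


if __name__ == "__main__":
    rng = random.Random(0)
    print([hot_spot_aid(rng, 100000) for _ in range(5)])
    print([tail_hot_aid(rng, 100000) for _ in range(5)])
```

Both variants are a modulus, a multiply and an add applied to a draw, which is easy to write in a custom script but would need a separate command-line switch for every variation.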