Re: rand48 replacement - Mailing list pgsql-hackers
From | Yura Sokolov |
---|---|
Subject | Re: rand48 replacement |
Date | |
Msg-id | 8904daa4a916b4990b11d7140e0ff81f@postgrespro.ru Whole thread Raw |
In response to | Re: rand48 replacement (Fabien COELHO <coelho@cri.ensmp.fr>) |
Responses |
Re: rand48 replacement
Re: rand48 replacement |
List | pgsql-hackers |
Fabien COELHO писал 2021-07-06 23:49: > Hello Yura, > >>> However, I'm not enthousiastic at combining two methods depending on >>> the range, the function looks complex enough without that, so I would >>> suggest not to take this option. Also, the decision process adds to >>> the average cost, which is undesirable. >> >> Given 99.99% cases will be in the likely case, branch predictor should >> eliminate decision cost. > > Hmmm. ISTM that a branch predictor should predict that unknown < small > should probably be false, so a hint should be given that it is really > true. Why? Branch predictor is history based: if it were usually true here then it will be true this time either. unknown < small is usually true therefore branch predictor will assume it is true. I put `likely` for compiler: compiler then puts `likely` path closer. > >> And as Dean Rasheed correctly mentioned, mask method will have really >> bad pattern for branch predictor if range is not just below or equal >> to power of two. > > On average the bitmask is the better unbiased method, if the online > figures are to be trusted. Also, as already said, I do not really want > to add code complexity, especially to get lower average performance, > and especially with code like "threshold = - range % range", where > both variables are unsigned, I have a headache just looking at it:-) If you mention https://www.pcg-random.org/posts/bounded-rands.html then 1. first graphs are made with not exact Lemire's code. Last code from https://lemire.me/blog/2016/06/30/fast-random-shuffling/ (which I derived from) performs modulo operation only if `(leftover < range)`. Even with `rand_range(1000000)` it is just once in four thousands runs. 2. You can see "Small-Constant Benchmark" at that page, Debiased Int is 1.5 times faster. And even in "Small-Shuffle" benchmark their unoptimized version is on-par with mask method. 3. If you go to "Making Improvements/Faster Threshold-Based Discarding" section, then you'll see code my version is matched with. It is twice faster than masked method in Small-Shuffle benchmark, and just a bit slower in Large-Shuffle. > >> And __builtin_clzl is not free lunch either, it has latency 3-4 cycles >> on modern processor. > > Well, % is not cheap either. > >> Well, probably it could run in parallel with some part of xoroshiro, >> but it depends on how compiler will optimize this function. >> >>> I would certainly select the unbias multiply method if we want a u32 >>> range variant. >> >> There could be two functions. > > Yep, but do we need them? Who is likely to want 32 bits pseudo random > ints in a range? pgbench needs 64 bits. > > So I'm still inclined to just keep the faster-on-average bitmask > method, despite that it may be slower for some ranges. The average > cost for the worst case in PRNG calls is, if I'm not mistaken: > > 1 * 0.5 + 2 * 0.25 + 3 * 0.125 + ... ~ 2 > > which does not look too bad. You doesn't count cost of branch-misprediction. https://stackoverflow.com/questions/11227809/why-is-processing-a-sorted-array-faster-than-processing-an-unsorted-array https://lemire.me/blog/2019/10/15/mispredicted-branches-can-multiply-your-running-times/ Therefore calculation should be at least: 1 * 0.5 + 0.5 * (3 + 0.5 * (3 + ...)) = 6 By the way, we have 64bit random. If we use 44bit from it for range <= (1<<20), then bias will be less than 1/(2**24). Could we just ignore it (given it is not crypto strong random)? uint64 pg_prng_u64_range(pg_prng_state *state, uint64 range) { uint64 val = xoroshiro128ss(state); uint64 m; if ((range & (range-1) == 0) /* handle all power 2 cases */ return range != 0 ? val & (range-1) : 0; if (likely(range < (1<<20))) /* * While multiply method is biased, bias will be smaller than 1/(1<<24) for * such small ranges. Lets ignore it. */ return ((val >> 20) * range) >> 44; /* Apple's mask method */ m = mask_u64(range-1); val &= m; while (val >= range) val = xoroshiro128ss(state) & m; return val; } Anyway, excuse me for heating this discussion cause of such non-essential issue. I'll try to control myself and don't proceed it further. regards, Sokolov Yura.
pgsql-hackers by date: