[ https://issues.apache.org/jira/browse/RNG90?page=com.atlassian.jira.plugin.system.issuetabpanels:commenttabpanel&focusedCommentId=16824095#comment16824095
]
Alex D Herbert commented on RNG90:

When computing an integer in a range {{n}} from a possible maximum range of {{2^b}} the sample
can be obtained using:
{code:java}
// with number in the range [0, 2^b)
int sample = number % n;
{code}
This works but will be biased if {{n}} does not exactly divide into {{2^b}}. When biased the
total number of samples possible for each sample value will be either {{x}} or {{x+1}} (under
or overrepresented). The bias is at a maximum when the result of {{(2^b  1) % n}} is close
to {{n / 2}}.
Here is an example table for {{b=31}} for the maximum bias for ranges of {{n}}. The table
is constructed as if sampling 2^b samples from a perfect RNG that produces all output in the
range [0, 2^b) only once. This can be done with a counter.
* Upper is the maximum limit for {{n}}
* {{n}} is the value of {{n}} with the highest bias
* mean (u) is the mean number of samples {{x}} of each possible sample value
* var is the variance of the number of samples of each value
* Frequency(floor(u)) is the frequency numbers in the range [0, n) are observed {{x}} times
(underrepresented)
* p(floor(u)) is the numerator of the probability of observing a sample that is underrepresented
(seen {{x}} times)
* Frequency(ceil(u)) is the frequency numbers in the range [0, n) are observed {{x+1}} times
(overrepresented)
* p(ceil(u)) is the numerator of the probability of observing a sample that is overrepresented
(seen {{x+1}} times)
Uppernmean (u)varFrequency(floor(u))p(floor(u)))Frequency(ceil(u))p(ceil(u))
2^35429496729.6000.24024294967293429496730
2^415143165576.5330.24971431655768143165577
2^517126322567.5290.24981263225679126322568
2^65142107522.5100.25025421075222642107523
2^78525264513.5060.25042252645134325264514
2^82558421504.5020.25012784215041288421505
2^92578355967.5020.25012883559671298355968
2^107712785322.5010.25038527853223862785323
2^1112851671193.5000.25064216711936431671194
2^123855557064.5000.25019275570641928557065
2^134369491527.5000.25021844915272185491528
2^1413107163842.5000.25065531638426554163843
2^152184598305.5000.25010922983051092398306
2^166553532768.5000.25032767327683276832769
2^176553732767.5000.25032768327673276932768
2^1819661110922.5000.25098305109229830610923
2^193276856553.5000.25016384265531638436554
2^209830552184.5000.25049152721844915282185
2^2111141291927.5000.25055706419275570651928
2^223342387642.5000.25016711936421671194643
2^236700417320.5000.25033502093203350208321
2^2416711935128.5000.25083559671288355968129
2^2516843009127.5000.25084215041278421505128
2^265052902742.5000.25025264513422526451443
2^278421504525.5000.25042107522254210752326
2^282526451358.5000.25012632256781263225689
2^292863311537.5000.25014316557671431655778
2^308589934592.5000.25042949672924294967303
2^3114316557651.5000.25071582788217158278832
Note: To get the actually probability divide the numerator by 2^b, in this table this is 2^31.
E.g.
If computing {{nextInt(5)}} the mean number of samples is 429496729.6, the probability of
any sample is 429496729.6/(2^31) but some samples have a probability of 429496729/2^31 and
some 429496730/2^31.
If computing {{nextInt(1431655765)}} the mean number of samples is 1.5, the probability of
any sample is 1.5/(2^31) but some samples have a probability of 1/2^31 and some 2/2^31.
For either extremes of the table it is not easy to detect non uniformity in the output. The
difference in probabilities between under and over sampling is 1/2^32. At any number of samples
the frequencies observed appear close enough to ideal to pass a Chisquare test of uniformity.
However the output is biased and can be eliminated by ignoring some of the samples that are
overrepresented. This is what the current algorithm does:
{code:java}
// Rejection method
int bits;
int val;
do {
bits = rng.nextInt() >>> 1;
val = bits % n;
// Ignore any integer in the range [0, 2^31  1] where there are not at least (n1) more
samples
// before 2^31  1 is exceeded.
} while (bits  val + nm1 < 0);
return val;
{code}
The number of values to ignore can be obtained using:
{code:java}
2^b % n
{code}
The rejection probability is:
{code:java}
(2^b % n) / 2^b
{code}
Here is a table of the mean and max value for the rejection probability for b=31 (the current
rejection algorithm) and b=32 (for the multiplication algorithm using unsigned 32bit arithmetic):
  b=31 b=32 
LowerUppermeanmaxmeanmax
2^22^38.1491e101.3970e095.2387e109.3132e10
2^32^42.3865e095.1223e099.3132e102.0955e09
2^42^53.8417e091.1176e082.4447e095.1223e09
2^52^61.0317e082.7474e084.9404e091.3271e08
2^62^72.0482e085.5879e089.4769e092.7474e08
2^72^84.3288e081.0803e072.0867e085.5879e08
2^82^98.6515e082.2305e074.3467e081.0803e07
2^92^101.6502e074.5914e078.6059e082.2491e07
2^102^113.4841e079.3132e071.7195e074.5914e07
2^112^127.1007e071.8668e063.5512e079.3132e07
2^122^131.4096e063.7560e067.1313e071.8731e06
2^132^142.8487e067.5437e061.4155e063.7560e06
2^142^155.6884e061.5139e052.8562e067.5691e06
2^152^161.1412e053.0339e055.6999e061.5168e05
2^162^172.2853e056.0761e051.1425e053.0339e05
2^172^184.5765e050.000121822.2888e056.0909e05
2^182^199.1519e050.000243664.5758e050.00012182
2^192^200.000183070.000487809.1540e050.00024390
2^202^210.000366070.000975610.000183070.00048780
2^212^220.000731850.00194930.000366070.00097561
2^222^230.00146260.00389100.000731860.0019493
2^232^240.00292080.00775190.00146260.0038910
2^242^250.00582380.0153850.00292080.0077519
2^252^260.0115760.0303030.00582380.015385
2^262^270.0228680.0588240.0115760.030303
2^272^280.0446040.111110.0228680.058824
2^282^290.0847560.200000.0446040.11111
2^292^300.152780.333330.0847560.20000
2^302^310.250000.500000.152780.33333
This shows that the multiplication algorithm has a lower rejection rate. Thus if the algorithm
can be converted to reject bad samples then it will be faster that the modulus algorithm.
The multiplication algorithm is computing:
{noformat}
n * [0, 2^32  1)

2^32
{noformat}
If the initial product is left as an unsigned 64bit result then the upper 32bits contains
the sample value in the range [0, n), i.e. result / 2^32. The lower 32bits contains the remainder
(result % 2^32) and provides information about the uniformity. Since the remainder is periodically
spaced at intervals of n the frequency of observed remainders for a given sample value is
either floor(2^32/n) or ceil(2^32/n).
To ensure all samples have a frequency of floor(2^32/n) the bias is solved by rejecting any remainder
with a value < 2^32 % n, i.e. the level below which denotes that there are still floor(2^32/n)
more observations of this sample. The algorithm then becomes:
{code:java}
// Precompute the fence
final long fence = (1L << 32) % n;
// Rejection method using multiply by a fraction:
// n * [0, 2^32  1)
// 
// 2^32
// The result is mapped back to an integer and will be in the range [0, n)
long result;
do {
// Compute 64bit unsigned product of n * [0, 2^32  1)
result = n * (rng.nextInt() & 0xffffffffL);
// Test the sample uniformity.
} while ((result & 0xffffffffL) < fence);
// Return upper 32bits as the sample
return (int)(result >>> 32);
{code}
This method requires that the rejection limit is precomputed. This uses the modulus operator.
So the method is not suitable for generic use as the modulus operator is expensive. If a modulus
operation must be performed then you can simply use the original algorithm which dynamically
computes the fence limit using the sample value.
I have tested some variants of sampling for int values:
upperBoundrandomSourceNameloopsMethodScoreErrorMedianError/Score
 SPLIT_MIX_64100000nextIntBaseline351792.08010430.902350306.3910.030
256SPLIT_MIX_64100000nextInt381840.09627238.305380145.9950.071
256SPLIT_MIX_64100000nextIntModulusBiased723254.214137778.513719225.2400.190
256SPLIT_MIX_64100000nextIntModulusUnbiased621014.97222100.939619868.2160.036
256SPLIT_MIX_64100000nextIntMultiplyBiased403213.6119888.535402892.8730.025
256SPLIT_MIX_64100000nextIntMultiplyUnbiased416627.85712002.756416149.1430.029
257SPLIT_MIX_64100000nextInt693622.66014068.507693106.7090.020
257SPLIT_MIX_64100000nextIntModulusBiased597524.51922212.352595434.9580.037
257SPLIT_MIX_64100000nextIntModulusUnbiased617694.76616568.233615302.0600.027
257SPLIT_MIX_64100000nextIntMultiplyBiased392721.07913033.386394214.9030.033
257SPLIT_MIX_64100000nextIntMultiplyUnbiased438031.7297125.817437836.5230.016
1073741825SPLIT_MIX_64100000nextInt2825259.37460514.5532829576.9100.021
1073741825SPLIT_MIX_64100000nextIntModulusBiased593974.00516651.084592390.8230.028
1073741825SPLIT_MIX_64100000nextIntModulusUnbiased1891129.23356058.5191886215.7830.030
1073741825SPLIT_MIX_64100000nextIntMultiplyBiased430692.5779808.978431846.4160.023
1073741825SPLIT_MIX_64100000nextIntMultiplyUnbiased874811.76063506.614871546.6480.073
* {{nextInt}} method is the full generic method that detects powers of 2 and dynamically
computes the rejection threshold
* {{nextIntModulusBiased}} method uses a single call with the modulus operator
* {{nextIntModulusUnbiased}} method uses repeat samples using a precomputed rejection threshold
and a single call with the modulus operator
* {{nextIntMultiplyBiased}} method uses a single call with the multiply method
* {{nextIntMultiplyUnbiased}} method uses repeat samples using a precomputed rejection threshold
and a single call to extract the upper 32bit sample
Note that the power of 2 is best handled by {{nextInt}} using the masking operation to just
extract random bits from the int value.
For non power of 2 values of {{n}} the precomputation of the rejection threshold increases
the speed of the modulus algorithm. However it cannot avoid a single call to modulus to get
the final result. However the precomputation allows use of the unbiased multiply method
and much faster generation of random numbers. This is due to the faster multiply arithmetic
(verses modulus) and also the lower rejection rate due to the use of 2^32 as the base rather
than 2^31.
Conclusion
* Modify the {{BaseProvider}} to use the masking method (for powers of 2) and keep the modulus
method which dynamically sets the rejection threshold
* Update the {{DiscreteUniformSampler}} in the sampling module to use the multiply method
with a precomputed rejection threshold
The {{DiscreteUniformSampler}} already has internal specialisations of the algorithm for different
ranges. It just requires a specialisation for powers of 2 using masking and nonpowers of
2 using the unbiased multiply method.
Note:
The original timings included results for the multiply method for {{nextLong(long)}}. This
algorithm was biased. Although it is possible to update this to use the rejection method to
make it unbiased that would require the same computation of the rejection threshold with the
modulus operator. So performance can only be gained when repeat sampling from the same range.
There is no current support for this in the library (since the {{DiscreteSampler}} interface
returns an {{int}}) other than {{UniformRandomProvider.nextLong(long)}}. So I have not done
more analysis on this method. Computation of the rejection rate requires BigInteger arithmetic
so outputting a rejection rate table for base 63 and 64 which requires full enumeration of
all values of n would be slow.
I will put the performance benchmark and modifications to the base provider in a new PR.
Modifications to the {{DiscreteUniformSampler}} will require a new Jira ticket.
> Improve nextInt(int) and nextLong(long) for powers of 2
> 
>
> Key: RNG90
> URL: https://issues.apache.org/jira/browse/RNG90
> Project: Commons RNG
> Issue Type: Improvement
> Components: core
> Affects Versions: 1.3
> Reporter: Alex D Herbert
> Assignee: Alex D Herbert
> Priority: Minor
> Time Spent: 50m
> Remaining Estimate: 0h
>
> The code for nextInt(int) checks the range number n is a power of two and if so it computes
a fast solution:
> {code:java}
> return (int) ((n * (long) (nextInt() >>> 1)) >> 31);
> {code}
> This scales a 31 bit positive number by a power of 2 (i.e. n) then discards the least
significant bits. An alternative result can be achieved using a mask to discard the most significant
bits:
> {code:java}
> return nextInt() & (n1)
> {code}
> This works if n is a power of 2 as (n1) will be all the bits set below it. Note: This
method is employed by ThreadLocalRandom.
> It also makes the method applicable to nextLong(long) since you no longer require the
long multiplication arithmetic.
> The mask version is applicable to any generator with a long period in the lower order
bits. The current version for any generator with a short period in the lower order bits. The
nonmasking method is employed by {{java.util.Random}} which is a weak generator.
> The methods are currently in {{BaseProvider}}. I suggest dividing the methods to use
protected methods to compute the result:
> {code:java}
> @Override
> public int nextInt(int n) {
> checkStrictlyPositive(n);
> final int nm1 = n  1;
> if ((n & nm1) == 0) {
> // Range is a power of 2
> return nextIntPowerOfTwo(n, nm1);
> }
> return nextIntNonPowerOfTwo(n, nm1);
> }
> /**
> * Generates an {@code int} value between 0 (inclusive) and the
> * specified value (exclusive).
> *
> * @param n Bound on the random number to be returned. This is a power of 2.
> * @param nm1 The bound value minus 1.
> * @return a random {@code int} value between 0 (inclusive) and {@code n}
> * (exclusive).
> */
> protected int nextIntPowerOfTwo(int n, int nm1) {
> return nextInt() & nm1;
> }
> /**
> * Generates an {@code int} value between 0 (inclusive) and the specified value
> * (exclusive).
> *
> * @param n Bound on the random number to be returned. This is not a power of 2.
> * @param nm1 The bound value minus 1.
> * @return a random {@code int} value between 0 (inclusive) and {@code n} (exclusive).
> */
> protected int nextIntNonPowerOfTwo(int n, int nm1) {
> int bits;
> int val;
> do {
> bits = nextInt() >>> 1;
> val = bits % n;
> } while (bits  val + nm1 < 0);
> return val;
> }
> @Override
> public long nextLong(long n) {
> checkStrictlyPositive(n);
> final long nm1 = n  1;
> if ((n & nm1) == 0) {
> // Range is a power of 2
> return nextLongPowerOfTwo(n, nm1);
> }
> return nextLongNonPowerOfTwo(n, nm1);
> }
> /**
> * Generates an {@code long} value between 0 (inclusive) and the
> * specified value (exclusive).
> *
> * @param n Bound on the random number to be returned. This is a power of 2.
> * @param nm1 The bound value minus 1.
> * @return a random {@code long} value between 0 (inclusive) and {@code n}
> * (exclusive).
> */
> protected long nextLongPowerOfTwo(long n, long nm1) {
> return nextLong() & nm1;
> }
> /**
> * Generates an {@code long} value between 0 (inclusive) and the specified value
> * (exclusive).
> *
> * @param n Bound on the random number to be returned. This is not a power of 2.
> * @param nm1 The bound value minus 1.
> * @return a random {@code long} value between 0 (inclusive) and {@code n} (exclusive).
> */
> protected long nextLongNonPowerOfTwo(long n, long nm1) {
> long bits;
> long val;
> do {
> bits = nextLong() >>> 1;
> val = bits % n;
> } while (bits  val + nm1 < 0);
> return val;
> }
> {code}
> This will update all providers to use the new method. Then the JDK implementation can
be changed to override the default:
> {code:java}
> @Override
> protected int nextIntPowerOfTwo(int n, int nm1) {
> return (int) ((n * (long) (nextInt() >>> 1)) >> 31);
> }
> @Override
> protected long nextLongPowerOfTwo(long n, long nm1) {
> return nextLongNonPowerOfTwo(n, nm1);
> }
> {code}
> I do not know how the use of protected methods will affect performance. An alternative
is to inline the entire computation for the new masking method:
> {code:java}
> public int nextInt(int n) {
> checkStrictlyPositive(n);
> final int nm1 = n  1;
> if ((n & nm1) == 0) {
> // Range is a power of 2
> return nextInt() & nm1;
> }
> int bits;
> int val;
> do {
> bits = nextInt() >>> 1;
> val = bits % n;
> } while (bits  val + nm1 < 0);
> return val;
> }
> {code}
> Then rewrite the entire method in the JDK generator. This will be less flexible if other
generators are added than have short periods in the lower order bits.

This message was sent by Atlassian JIRA
(v7.6.3#76005)
