[ https://issues.apache.org/jira/browse/RNG-90?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16820230#comment-16820230
]
Alex D Herbert edited comment on RNG-90 at 4/17/19 4:04 PM:

I have noted that the code above:
{code:java}
return (int) ((n * (long) (nextInt() >>> 1)) >> 31);
{code}
Is actually computing this:
{noformat}
floor( n * [0, 2^31) / 2^31 )
= floor( n * [0, 1) )
{noformat}
Here the fraction numerator can be any value between 0 and 2^31 (exclusive). The denominator
is 2^31. So the fraction is a uniform deviate in the range {{[0,1)}}.
Noting that:
{noformat}
Integer.MAX_VALUE = 0x7fffffff
unsigned max value = 0xffffffffL
and
0x7fffffff * 0xffffffffL is a positive long 0x7ffffffe80000001 (upper 32 bits: 0x7ffffffe = 2147483646)
{noformat}
allows this method to be extended using unsigned arithmetic to all values of n:
{noformat}
floor( n * [0, 2^32) / 2^32 )
{noformat}
Becomes:
{code:java}
return (int) ((n * (nextInt() & 0xffffffffL)) >>> 32);
{code}
This essentially computes a 64-bit unsigned result and discards the lower 32 bits to leave
a uniform deviate in the desired range.
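To make the mapping concrete, here is a minimal self-contained sketch of the unsigned multiply-shift. The class name {{UnsignedScaleInt}} and the method {{boundedInt}} are hypothetical names for illustration only, and the random bits are passed in as an argument rather than drawn from {{nextInt()}}:

```java
final class UnsignedScaleInt {
    /**
     * Maps 32 random bits to [0, n) as floor(n * u / 2^32),
     * where u is the bits interpreted as an unsigned 32-bit value.
     * Assumes n is strictly positive.
     */
    static int boundedInt(int bits, int n) {
        // n is widened to long; the product is at most 0x7ffffffe80000001
        // so it never overflows a signed 64-bit value.
        return (int) ((n * (bits & 0xffffffffL)) >>> 32);
    }

    public static void main(String[] args) {
        System.out.println(boundedInt(0, 10));                  // 0
        System.out.println(boundedInt(Integer.MIN_VALUE, 10));  // 5 (u = 2^31, i.e. one half)
        System.out.println(boundedInt(-1, Integer.MAX_VALUE));  // 2147483646 (n - 1)
    }
}
```

The last line reproduces the value quoted in the note on {{0x7fffffff * 0xffffffff}}: the largest possible input maps to n - 1, so the result always stays inside the range.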
With a bit of work the same method can be extended to long values. This essentially computes
a 128-bit unsigned result and discards the lower 64 bits to leave a uniform deviate in the
desired range.
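The long variant can be sketched as follows. This is not the code added to {{NumberFactory}}; it is an illustrative version that splits both operands into 32-bit halves to obtain the upper 64 bits of the 128-bit product, and checks itself against {{BigInteger}}. The names {{UnsignedScaleLong}}, {{boundedLong}} and {{reference}} are hypothetical:

```java
import java.math.BigInteger;

final class UnsignedScaleLong {
    private static final long MASK = 0xffffffffL;

    /**
     * Returns floor(n * u / 2^64) where u is the bits interpreted as an
     * unsigned 64-bit value. Assumes n is strictly positive.
     */
    static long boundedLong(long bits, long n) {
        final long a = bits >>> 32;
        final long b = bits & MASK;
        final long c = n >>> 32;
        final long d = n & MASK;
        // Four 32x32 -> 64 bit partial products (each fits in an unsigned 64-bit value)
        final long bd = b * d;
        final long ad = a * d;
        final long bc = b * c;
        final long ac = a * c;
        // Carry out of the lower 64 bits of the full product
        final long carry = ((bd >>> 32) + (ad & MASK) + (bc & MASK)) >>> 32;
        return ac + (ad >>> 32) + (bc >>> 32) + carry;
    }

    /** Slow reference computation using arbitrary precision arithmetic. */
    static long reference(long bits, long n) {
        return new BigInteger(Long.toUnsignedString(bits))
            .multiply(BigInteger.valueOf(n))
            .shiftRight(64)
            .longValueExact();
    }

    public static void main(String[] args) {
        final long bits = 0x123456789abcdef0L;
        final long n = 4503599627370497L; // 2^52 + 1
        System.out.println(boundedLong(bits, n) == reference(bits, n));
        System.out.println(boundedLong(-1L, Long.MAX_VALUE) == Long.MAX_VALUE - 1);
    }
}
```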
I have added these methods to the {{NumberFactory}} and tested their speed against the rejection
algorithm as listed in the header for this ticket. The baseline is the time to return the result
of a single call to {{nextInt}} (or {{nextLong}}). The Median-Baseline column (median minus
baseline) is thus the extra work that is done to generate a uniform deviate in a given positive range.
||upperBound||randomSourceName||Method||Score||Error||Median||Median-Baseline||
|0|SPLIT_MIX_64|nextIntBaseline|3.58|0.00|3.58|0.00|
|256|SPLIT_MIX_64|nextIntNumberFactory|4.15|0.05|4.15|0.58|
|257|SPLIT_MIX_64|nextIntNumberFactory|4.18|0.24|4.16|0.58|
|1073741825|SPLIT_MIX_64|nextIntNumberFactory|4.26|0.54|4.17|0.60|
|256|SPLIT_MIX_64|nextIntWhileLoop|4.10|0.06|4.09|0.52|
|257|SPLIT_MIX_64|nextIntWhileLoop|7.37|0.21|7.35|3.78|
|1073741825|SPLIT_MIX_64|nextIntWhileLoop|24.39|0.08|24.39|20.82|
|0|SPLIT_MIX_64|nextLongBaseline|3.73|0.02|3.73|0.00|
|256|SPLIT_MIX_64|nextLongNumberFactory|4.80|0.16|4.79|1.05|
|257|SPLIT_MIX_64|nextLongNumberFactory|4.79|0.04|4.79|1.06|
|1073741825|SPLIT_MIX_64|nextLongNumberFactory|4.79|0.01|4.79|1.06|
|4294967296|SPLIT_MIX_64|nextLongNumberFactory|5.06|0.02|5.06|1.33|
|4294967297|SPLIT_MIX_64|nextLongNumberFactory|6.50|0.07|6.49|2.76|
|4503599627370497|SPLIT_MIX_64|nextLongNumberFactory|6.86|3.13|6.49|2.76|
|256|SPLIT_MIX_64|nextLongWhileLoop|4.58|0.25|4.54|0.81|
|257|SPLIT_MIX_64|nextLongWhileLoop|14.15|0.08|14.15|10.41|
|1073741825|SPLIT_MIX_64|nextLongWhileLoop|14.41|0.02|14.41|10.68|
|4294967296|SPLIT_MIX_64|nextLongWhileLoop|4.54|0.03|4.54|0.81|
|4294967297|SPLIT_MIX_64|nextLongWhileLoop|15.39|6.92|14.56|10.83|
|4503599627370497|SPLIT_MIX_64|nextLongWhileLoop|14.65|0.08|14.64|10.91|
|0|WELL_44497_B|nextIntBaseline|11.22|0.05|11.21|0.00|
|256|WELL_44497_B|nextIntNumberFactory|11.81|1.29|11.60|0.39|
|257|WELL_44497_B|nextIntNumberFactory|11.68|0.75|11.60|0.39|
|1073741825|WELL_44497_B|nextIntNumberFactory|11.57|0.03|11.57|0.36|
|256|WELL_44497_B|nextIntWhileLoop|11.67|0.70|11.59|0.38|
|257|WELL_44497_B|nextIntWhileLoop|14.90|0.08|14.89|3.68|
|1073741825|WELL_44497_B|nextIntWhileLoop|37.22|1.20|37.05|25.84|
|0|WELL_44497_B|nextLongBaseline|19.75|0.05|19.75|0.00|
|256|WELL_44497_B|nextLongNumberFactory|20.94|0.43|20.96|1.21|
|257|WELL_44497_B|nextLongNumberFactory|20.81|0.09|20.81|1.06|
|1073741825|WELL_44497_B|nextLongNumberFactory|21.99|9.74|20.84|1.10|
|4294967296|WELL_44497_B|nextLongNumberFactory|21.23|0.07|21.23|1.48|
|4294967297|WELL_44497_B|nextLongNumberFactory|22.29|0.31|22.30|2.56|
|4503599627370497|WELL_44497_B|nextLongNumberFactory|22.44|0.13|22.44|2.70|
|256|WELL_44497_B|nextLongWhileLoop|20.97|0.06|20.97|1.22|
|257|WELL_44497_B|nextLongWhileLoop|33.03|0.11|33.01|13.27|
|1073741825|WELL_44497_B|nextLongWhileLoop|32.84|1.24|33.00|13.26|
|4294967296|WELL_44497_B|nextLongWhileLoop|21.05|0.44|20.99|1.25|
|4294967297|WELL_44497_B|nextLongWhileLoop|31.80|0.57|31.75|12.01|
|4503599627370497|WELL_44497_B|nextLongWhileLoop|28.46|0.11|28.45|8.71|
Notes:
When a power of 2 is used the WhileLoop rejection algorithm is almost as fast as the primitive
number generation. As soon as the while loop part of the algorithm is used there is a big
slowdown. This is due to the algorithm requiring more random samples as it rejects unsuitable
ones. In the worst case scenario the rejection rate should be close to 50%.
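The rejection rate of the int while loop can be computed directly, assuming the loop from the ticket header ({{bits = nextInt() >>> 1; val = bits % n}}, rejecting when {{bits - val + nm1}} overflows). Under that assumption the rejected samples are exactly those in the final incomplete block of n values below 2^31, so there are {{2^31 mod n}} of them. The class and method names below are hypothetical:

```java
final class RejectionRate {
    /**
     * Fraction of 31-bit samples rejected by the while loop for bound n.
     * Assumes n is strictly positive.
     */
    static double rejectionRate(int n) {
        final long range = 1L << 31;
        // Samples in the last, incomplete block of size n are rejected
        return (double) (range % n) / range;
    }

    public static void main(String[] args) {
        for (int n : new int[] {256, 257, 1073741825}) {
            System.out.println(n + " -> " + rejectionRate(n));
        }
    }
}
```

For a power of 2 the rate is zero, and for 1073741825 (2^30 + 1) it is just under 50%, which corresponds to roughly two samples per returned value on average.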
The new NumberFactory method is uniformly fast. It has the nice property that the source of
randomness is used most significant byte first irrespective of whether the range is a power
of 2. This should match the existing functionality in {{BaseProvider}}. Note: This is not
what occurs in the while loop when using a power of 2:
{code:java}
// Uses the least significant bits via masking
return nextInt() & nm1;
{code}
Thus that algorithm switches between using the most significant and the least significant bits
depending on the range.
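A small sketch shows the difference for a power of 2 range. The names {{BitUsage}}, {{multiplyShift}} and {{mask}} are hypothetical, and the random bits are fixed to a constant so the two results can be compared by eye:

```java
final class BitUsage {
    /** Multiply-shift: floor(n * u / 2^32) with u the unsigned 32-bit input. */
    static int multiplyShift(int bits, int n) {
        return (int) ((n * (bits & 0xffffffffL)) >>> 32);
    }

    /** Masking: only valid when n is a power of 2. */
    static int mask(int bits, int n) {
        return bits & (n - 1);
    }

    public static void main(String[] args) {
        final int bits = 0x12345678;
        final int n = 256;
        System.out.println(multiplyShift(bits, n)); // 18  (0x12, the top 8 bits)
        System.out.println(mask(bits, n));          // 120 (0x78, the bottom 8 bits)
    }
}
```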
The number factory computes optimised arithmetic that will match that of BigInteger. It detects
when the full computation is not required and uses a faster option. This can be seen in the
difference between the number 4294967296 (2^32), which is zero in the lower 32 bits, and 4294967297
(2^32 + 1), which has both upper and lower bits set to non-zero. The algorithm is fast even in the
worst case scenario for the rejection method using upper bound 4503599627370497 (2^52 + 1).
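The comment does not show how the cheaper cases are detected, so the following is only one plausible reading, written as a sketch under that assumption. When either 32-bit half of n is zero, two of the four partial products of the 128-bit multiplication vanish. All names here are hypothetical, and the results are checked against {{BigInteger}}:

```java
import java.math.BigInteger;

final class ReducedMultiply {
    private static final long MASK = 0xffffffffL;

    /** Valid only when the upper 32 bits of n are zero (0 < n < 2^32). */
    static long lowerOnly(long bits, long n) {
        final long hi = (bits >>> 32) * n;
        final long lo = (bits & MASK) * n;
        // product = hi * 2^32 + lo; keep only the part at or above bit 64
        return (hi + (lo >>> 32)) >>> 32;
    }

    /** Valid only when the lower 32 bits of n are zero (n = c * 2^32, c > 0). */
    static long upperOnly(long bits, long n) {
        final long c = n >>> 32;
        // product = (a * c) * 2^64 + (b * c) * 2^32
        return (bits >>> 32) * c + (((bits & MASK) * c) >>> 32);
    }

    /** Slow reference: floor(n * u / 2^64) with u the unsigned 64-bit input. */
    static long reference(long bits, long n) {
        return new BigInteger(Long.toUnsignedString(bits))
            .multiply(BigInteger.valueOf(n))
            .shiftRight(64)
            .longValueExact();
    }

    public static void main(String[] args) {
        final long bits = 0x123456789abcdef0L;
        System.out.println(lowerOnly(bits, 257L) == reference(bits, 257L));
        System.out.println(upperOnly(bits, 1L << 32) == reference(bits, 1L << 32));
    }
}
```

A bound such as 2^32 + 1 has both halves non-zero, so neither shortcut applies and the full multiplication is needed, which would be consistent with the step in the timings between 4294967296 and 4294967297.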
The code moves the logic for computing an integer in a range to the {{NumberFactory}}. The
{{BaseProvider}} then delegates to this method.
> Improve nextInt(int) and nextLong(long) for powers of 2
> 
>
> Key: RNG-90
> URL: https://issues.apache.org/jira/browse/RNG-90
> Project: Commons RNG
> Issue Type: Improvement
> Components: core
> Affects Versions: 1.3
> Reporter: Alex D Herbert
> Assignee: Alex D Herbert
> Priority: Minor
>
> The code for nextInt(int) checks whether the range n is a power of two and if so it computes
> a fast solution:
> {code:java}
> return (int) ((n * (long) (nextInt() >>> 1)) >> 31);
> {code}
> This scales a 31-bit positive number by a power of 2 (i.e. n) then discards the least
> significant bits. An alternative result can be achieved using a mask to discard the most
> significant bits:
> {code:java}
> return nextInt() & (n - 1);
> {code}
> This works if n is a power of 2 as (n - 1) will have all the bits set below it. Note: This
> method is employed by ThreadLocalRandom.
> It also makes the method applicable to nextLong(long) since you no longer require the
> long multiplication arithmetic.
> The mask version is applicable to any generator with a long period in the lower order
> bits. The current version is suitable for any generator with a short period in the lower
> order bits. The non-masking method is employed by {{java.util.Random}} which is a weak generator.
> The methods are currently in {{BaseProvider}}. I suggest dividing the methods to use
> protected methods to compute the result:
> {code:java}
> @Override
> public int nextInt(int n) {
>     checkStrictlyPositive(n);
>     final int nm1 = n - 1;
>     if ((n & nm1) == 0) {
>         // Range is a power of 2
>         return nextIntPowerOfTwo(n, nm1);
>     }
>     return nextIntNonPowerOfTwo(n, nm1);
> }
>
> /**
>  * Generates an {@code int} value between 0 (inclusive) and the
>  * specified value (exclusive).
>  *
>  * @param n Bound on the random number to be returned. This is a power of 2.
>  * @param nm1 The bound value minus 1.
>  * @return a random {@code int} value between 0 (inclusive) and {@code n}
>  * (exclusive).
>  */
> protected int nextIntPowerOfTwo(int n, int nm1) {
>     return nextInt() & nm1;
> }
>
> /**
>  * Generates an {@code int} value between 0 (inclusive) and the specified value
>  * (exclusive).
>  *
>  * @param n Bound on the random number to be returned. This is not a power of 2.
>  * @param nm1 The bound value minus 1.
>  * @return a random {@code int} value between 0 (inclusive) and {@code n} (exclusive).
>  */
> protected int nextIntNonPowerOfTwo(int n, int nm1) {
>     int bits;
>     int val;
>     do {
>         bits = nextInt() >>> 1;
>         val = bits % n;
>         // bits - val is the start of the block of n values containing bits.
>         // If that block runs past 2^31 - 1 the sum overflows to a negative
>         // int and the sample is rejected.
>     } while (bits - val + nm1 < 0);
>     return val;
> }
>
> @Override
> public long nextLong(long n) {
>     checkStrictlyPositive(n);
>     final long nm1 = n - 1;
>     if ((n & nm1) == 0) {
>         // Range is a power of 2
>         return nextLongPowerOfTwo(n, nm1);
>     }
>     return nextLongNonPowerOfTwo(n, nm1);
> }
>
> /**
>  * Generates a {@code long} value between 0 (inclusive) and the
>  * specified value (exclusive).
>  *
>  * @param n Bound on the random number to be returned. This is a power of 2.
>  * @param nm1 The bound value minus 1.
>  * @return a random {@code long} value between 0 (inclusive) and {@code n}
>  * (exclusive).
>  */
> protected long nextLongPowerOfTwo(long n, long nm1) {
>     return nextLong() & nm1;
> }
>
> /**
>  * Generates a {@code long} value between 0 (inclusive) and the specified value
>  * (exclusive).
>  *
>  * @param n Bound on the random number to be returned. This is not a power of 2.
>  * @param nm1 The bound value minus 1.
>  * @return a random {@code long} value between 0 (inclusive) and {@code n} (exclusive).
>  */
> protected long nextLongNonPowerOfTwo(long n, long nm1) {
>     long bits;
>     long val;
>     do {
>         bits = nextLong() >>> 1;
>         val = bits % n;
>         // Same overflow test as the int version, using 63-bit samples
>     } while (bits - val + nm1 < 0);
>     return val;
> }
> {code}
> This will update all providers to use the new method. Then the JDK implementation can
be changed to override the default:
> {code:java}
> @Override
> protected int nextIntPowerOfTwo(int n, int nm1) {
>     return (int) ((n * (long) (nextInt() >>> 1)) >> 31);
> }
>
> @Override
> protected long nextLongPowerOfTwo(long n, long nm1) {
>     return nextLongNonPowerOfTwo(n, nm1);
> }
> {code}
> I do not know how the use of protected methods will affect performance. An alternative
is to inline the entire computation for the new masking method:
> {code:java}
> public int nextInt(int n) {
>     checkStrictlyPositive(n);
>     final int nm1 = n - 1;
>     if ((n & nm1) == 0) {
>         // Range is a power of 2
>         return nextInt() & nm1;
>     }
>     int bits;
>     int val;
>     do {
>         bits = nextInt() >>> 1;
>         val = bits % n;
>     } while (bits - val + nm1 < 0);
>     return val;
> }
> {code}
> Then rewrite the entire method in the JDK generator. This will be less flexible if other
> generators are added that have short periods in the lower order bits.

