[ https://issues.apache.org/jira/browse/RNG-90?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16820230#comment-16820230 ]
Alex D Herbert edited comment on RNG-90 at 4/17/19 4:04 PM:
------------------------------------------------------------
I have noted that the code above:
{code:java}
return (int) ((n * (long) (nextInt() >>> 1)) >> 31);
{code}
Is actually computing this:
{noformat}
floor( n * [0,2^31-1)
----------
2^31 )
floor( n * [0,1) ){noformat}
Here the fraction numerator can be any value between 0 and 2^31 (exclusive). The denominator is 2^31. So the fraction is a uniform deviate in the range {{[0,1)}}.
Noting that:
{noformat}
Integer.MAX_VALUE = 0x7fffffff
unsigned max value = 0xffffffffL;
and
0x7fffffff * 0xffffffff is a positive long 7ffffffe80000001 (2147483646)
{noformat}
allows this method to be extended using unsigned arithmetic to all values of n:
{noformat}
floor( n * [0,2^32-1)
----------
2^32 )
{noformat}
Becomes:
{code:java}
return (int) ((n * (nextInt() & 0xffffffffL)) >>> 32);
{code}
This essentially computes a 64-bit unsigned result and discards the lower 32-bit to leave a uniform deviate in the desired range.
With a bit of work the same method can be extended to long values. This essentially computes a 128-bit unsigned result and discards the lower 64-bits to leave a uniform deviate in the desired range.
I have added these methods to the {{NumberFactory}} and tested their speed against the rejection algorithm as listed in the header for this ticket. The baseline is the result of returning a single call to {{nextInt}}. The median - baseline is thus the extra work that is done to generate a uniform deviate in a given positive range.
||upperBound||randomSourceName||Method||Score||Error||Median||Median-Baseline||
|0|SPLIT_MIX_64|nextIntBaseline|3.58|0.00|3.58|0.00|
|256|SPLIT_MIX_64|nextIntNumberFactory|4.15|0.05|4.15|0.58|
|257|SPLIT_MIX_64|nextIntNumberFactory|4.18|0.24|4.16|0.58|
|1073741825|SPLIT_MIX_64|nextIntNumberFactory|4.26|0.54|4.17|0.60|
|256|SPLIT_MIX_64|nextIntWhileLoop|4.10|0.06|4.09|0.52|
|257|SPLIT_MIX_64|nextIntWhileLoop|7.37|0.21|7.35|3.78|
|1073741825|SPLIT_MIX_64|nextIntWhileLoop|24.39|0.08|24.39|20.82|
|0|SPLIT_MIX_64|nextLongBaseline|3.73|0.02|3.73|0.00|
|256|SPLIT_MIX_64|nextLongNumberFactory|4.80|0.16|4.79|1.05|
|257|SPLIT_MIX_64|nextLongNumberFactory|4.79|0.04|4.79|1.06|
|1073741825|SPLIT_MIX_64|nextLongNumberFactory|4.79|0.01|4.79|1.06|
|4294967296|SPLIT_MIX_64|nextLongNumberFactory|5.06|0.02|5.06|1.33|
|4294967297|SPLIT_MIX_64|nextLongNumberFactory|6.50|0.07|6.49|2.76|
|4503599627370497|SPLIT_MIX_64|nextLongNumberFactory|6.86|3.13|6.49|2.76|
|256|SPLIT_MIX_64|nextLongWhileLoop|4.58|0.25|4.54|0.81|
|257|SPLIT_MIX_64|nextLongWhileLoop|14.15|0.08|14.15|10.41|
|1073741825|SPLIT_MIX_64|nextLongWhileLoop|14.41|0.02|14.41|10.68|
|4294967296|SPLIT_MIX_64|nextLongWhileLoop|4.54|0.03|4.54|0.81|
|4294967297|SPLIT_MIX_64|nextLongWhileLoop|15.39|6.92|14.56|10.83|
|4503599627370497|SPLIT_MIX_64|nextLongWhileLoop|14.65|0.08|14.64|10.91|
|0|WELL_44497_B|nextIntBaseline|11.22|0.05|11.21|0.00|
|256|WELL_44497_B|nextIntNumberFactory|11.81|1.29|11.60|0.39|
|257|WELL_44497_B|nextIntNumberFactory|11.68|0.75|11.60|0.39|
|1073741825|WELL_44497_B|nextIntNumberFactory|11.57|0.03|11.57|0.36|
|256|WELL_44497_B|nextIntWhileLoop|11.67|0.70|11.59|0.38|
|257|WELL_44497_B|nextIntWhileLoop|14.90|0.08|14.89|3.68|
|1073741825|WELL_44497_B|nextIntWhileLoop|37.22|1.20|37.05|25.84|
|0|WELL_44497_B|nextLongBaseline|19.75|0.05|19.75|0.00|
|256|WELL_44497_B|nextLongNumberFactory|20.94|0.43|20.96|1.21|
|257|WELL_44497_B|nextLongNumberFactory|20.81|0.09|20.81|1.06|
|1073741825|WELL_44497_B|nextLongNumberFactory|21.99|9.74|20.84|1.10|
|4294967296|WELL_44497_B|nextLongNumberFactory|21.23|0.07|21.23|1.48|
|4294967297|WELL_44497_B|nextLongNumberFactory|22.29|0.31|22.30|2.56|
|4503599627370497|WELL_44497_B|nextLongNumberFactory|22.44|0.13|22.44|2.70|
|256|WELL_44497_B|nextLongWhileLoop|20.97|0.06|20.97|1.22|
|257|WELL_44497_B|nextLongWhileLoop|33.03|0.11|33.01|13.27|
|1073741825|WELL_44497_B|nextLongWhileLoop|32.84|1.24|33.00|13.26|
|4294967296|WELL_44497_B|nextLongWhileLoop|21.05|0.44|20.99|1.25|
|4294967297|WELL_44497_B|nextLongWhileLoop|31.80|0.57|31.75|12.01|
|4503599627370497|WELL_44497_B|nextLongWhileLoop|28.46|0.11|28.45|8.71|
Notes:
When a power of 2 is used the WhileLoop rejection algorithm is almost as fast as the primitive number generation. As soon as the while loop part of the algorithm is used there is a big slow down. This is due to the algorithm requiring more random samples as it rejects unsuitable ones. The worst case scenario the rejection rate should be 50%.
The new NumberFactory method is uniformly fast. It has the nice property that the source of randomness is used most significant byte first irrespective of whether the range is a power of 2. This should match the existing functionality in {{BaseProvider}}. Note: This is not what occurs in the while loop when using a power of 2:
{code:java}
// Uses the least significant bytes via masking
return nextInt() & nm;
{code}
Thus that algorithm flips between using the most significant to least significant bits depending on the range.
The number factory computes optimised arithmetic than will match that of BigInteger. It detects when the full computation is not required and uses a faster option. This can be seen in the difference between the number 4294967296 (2^32) which is zero in the lower 32-bits and 4294967297 (2^32 + 1) which has upper and lower bits set to non zero. The algorithm is even fast in the worst case scenario for the rejection method using upper bound 4503599627370497 (2 ^ 52 + 1).
The code moves the logic for computing an integer in a range to the {{NumberFactory}}. The {{BaseProvider}} then delegates to this method.
was (Author: alexherbert):
I have noted that the code above:
{code:java}
return (int) ((n * (long) (nextInt() >>> 1)) >> 31);
{code}
Is actually computing this:
{noformat}
floor( n * [0,2^31-1)
----------
2^31 )
floor( n * [0,1) ){noformat}
Here the fraction numerator can be any value between 0 and 2^31 (exclusive). The denominator is 2^31. So the fraction is a uniform deviate in the range {{[0,1)}}.
Noting that:
{noformat}
Integer.MAX_VALUE = 0x7fffffff
and
0x7fffffff * 0x7fffffff is a positive long 3fffffff00000001
{noformat}
allows this method to be extended using unsigned arithmetic to all values of n:
{noformat}
floor( n * [0,2^32-1)
----------
2^32 )
{noformat}
Becomes:
{code:java}
return (int) ((n * (nextInt() & 0xffffffffL)) >>> 32);
{code}
This essentially computes a 64-bit unsigned result and discards the lower 32-bit to leave a uniform deviate in the desired range.
With a bit of work the same method can be extended to long values. This essentially computes a 128-bit unsigned result and discards the lower 64-bits to leave a uniform deviate in the desired range.
I have added these methods to the {{NumberFactory}} and tested their speed against the rejection algorithm as listed in the header for this ticket. The baseline is the result of returning a single call to {{nextInt}}. The median - baseline is thus the extra work that is done to generate a uniform deviate in a given positive range.
||upperBound||randomSourceName||Method||Score||Error||Median||Median-Baseline||
|0|SPLIT_MIX_64|nextIntBaseline|3.58|0.00|3.58|0.00|
|256|SPLIT_MIX_64|nextIntNumberFactory|4.15|0.05|4.15|0.58|
|257|SPLIT_MIX_64|nextIntNumberFactory|4.18|0.24|4.16|0.58|
|1073741825|SPLIT_MIX_64|nextIntNumberFactory|4.26|0.54|4.17|0.60|
|256|SPLIT_MIX_64|nextIntWhileLoop|4.10|0.06|4.09|0.52|
|257|SPLIT_MIX_64|nextIntWhileLoop|7.37|0.21|7.35|3.78|
|1073741825|SPLIT_MIX_64|nextIntWhileLoop|24.39|0.08|24.39|20.82|
|0|SPLIT_MIX_64|nextLongBaseline|3.73|0.02|3.73|0.00|
|256|SPLIT_MIX_64|nextLongNumberFactory|4.80|0.16|4.79|1.05|
|257|SPLIT_MIX_64|nextLongNumberFactory|4.79|0.04|4.79|1.06|
|1073741825|SPLIT_MIX_64|nextLongNumberFactory|4.79|0.01|4.79|1.06|
|4294967296|SPLIT_MIX_64|nextLongNumberFactory|5.06|0.02|5.06|1.33|
|4294967297|SPLIT_MIX_64|nextLongNumberFactory|6.50|0.07|6.49|2.76|
|4503599627370497|SPLIT_MIX_64|nextLongNumberFactory|6.86|3.13|6.49|2.76|
|256|SPLIT_MIX_64|nextLongWhileLoop|4.58|0.25|4.54|0.81|
|257|SPLIT_MIX_64|nextLongWhileLoop|14.15|0.08|14.15|10.41|
|1073741825|SPLIT_MIX_64|nextLongWhileLoop|14.41|0.02|14.41|10.68|
|4294967296|SPLIT_MIX_64|nextLongWhileLoop|4.54|0.03|4.54|0.81|
|4294967297|SPLIT_MIX_64|nextLongWhileLoop|15.39|6.92|14.56|10.83|
|4503599627370497|SPLIT_MIX_64|nextLongWhileLoop|14.65|0.08|14.64|10.91|
|0|WELL_44497_B|nextIntBaseline|11.22|0.05|11.21|0.00|
|256|WELL_44497_B|nextIntNumberFactory|11.81|1.29|11.60|0.39|
|257|WELL_44497_B|nextIntNumberFactory|11.68|0.75|11.60|0.39|
|1073741825|WELL_44497_B|nextIntNumberFactory|11.57|0.03|11.57|0.36|
|256|WELL_44497_B|nextIntWhileLoop|11.67|0.70|11.59|0.38|
|257|WELL_44497_B|nextIntWhileLoop|14.90|0.08|14.89|3.68|
|1073741825|WELL_44497_B|nextIntWhileLoop|37.22|1.20|37.05|25.84|
|0|WELL_44497_B|nextLongBaseline|19.75|0.05|19.75|0.00|
|256|WELL_44497_B|nextLongNumberFactory|20.94|0.43|20.96|1.21|
|257|WELL_44497_B|nextLongNumberFactory|20.81|0.09|20.81|1.06|
|1073741825|WELL_44497_B|nextLongNumberFactory|21.99|9.74|20.84|1.10|
|4294967296|WELL_44497_B|nextLongNumberFactory|21.23|0.07|21.23|1.48|
|4294967297|WELL_44497_B|nextLongNumberFactory|22.29|0.31|22.30|2.56|
|4503599627370497|WELL_44497_B|nextLongNumberFactory|22.44|0.13|22.44|2.70|
|256|WELL_44497_B|nextLongWhileLoop|20.97|0.06|20.97|1.22|
|257|WELL_44497_B|nextLongWhileLoop|33.03|0.11|33.01|13.27|
|1073741825|WELL_44497_B|nextLongWhileLoop|32.84|1.24|33.00|13.26|
|4294967296|WELL_44497_B|nextLongWhileLoop|21.05|0.44|20.99|1.25|
|4294967297|WELL_44497_B|nextLongWhileLoop|31.80|0.57|31.75|12.01|
|4503599627370497|WELL_44497_B|nextLongWhileLoop|28.46|0.11|28.45|8.71|
Notes:
When a power of 2 is used the WhileLoop rejection algorithm is almost as fast as the primitive number generation. As soon as the while loop part of the algorithm is used there is a big slow down. This is due to the algorithm requiring more random samples as it rejects unsuitable ones. The worst case scenario the rejection rate should be 50%.
The new NumberFactory method is uniformly fast. It has the nice property that the source of randomness is used most significant byte first irrespective of whether the range is a power of 2. This should match the existing functionality in {{BaseProvider}}. Note: This is not what occurs in the while loop when using a power of 2:
{code:java}
// Uses the least significant bytes via masking
return nextInt() & nm;
{code}
Thus that algorithm flips between using the most significant to least significant bits depending on the range.
The number factory computes optimised arithmetic than will match that of BigInteger. It detects when the full computation is not required and uses a faster option. This can be seen in the difference between the number 4294967296 (2^32) which is zero in the lower 32-bits and 4294967297 (2^32 + 1) which has upper and lower bits set to non zero. The algorithm is even fast in the worst case scenario for the rejection method using upper bound 4503599627370497 (2 ^ 52 + 1).
The code moves the logic for computing an integer in a range to the {{NumberFactory}}. The {{BaseProvider}} then delegates to this method.
> Improve nextInt(int) and nextLong(long) for powers of 2
> -------------------------------------------------------
>
> Key: RNG-90
> URL: https://issues.apache.org/jira/browse/RNG-90
> Project: Commons RNG
> Issue Type: Improvement
> Components: core
> Affects Versions: 1.3
> Reporter: Alex D Herbert
> Assignee: Alex D Herbert
> Priority: Minor
>
> The code for nextInt(int) checks the range number n is a power of two and if so it computes a fast solution:
> {code:java}
> return (int) ((n * (long) (nextInt() >>> 1)) >> 31);
> {code}
> This scales a 31 bit positive number by a power of 2 (i.e. n) then discards the least significant bits. An alternative result can be achieved using a mask to discard the most significant bits:
> {code:java}
> return nextInt() & (n-1)
> {code}
> This works if n is a power of 2 as (n-1) will be all the bits set below it. Note: This method is employed by ThreadLocalRandom.
> It also makes the method applicable to nextLong(long) since you no longer require the long multiplication arithmetic.
> The mask version is applicable to any generator with a long period in the lower order bits. The current version for any generator with a short period in the lower order bits. The non-masking method is employed by {{java.util.Random}} which is a weak generator.
> The methods are currently in {{BaseProvider}}. I suggest dividing the methods to use protected methods to compute the result:
> {code:java}
> @Override
> public int nextInt(int n) {
> checkStrictlyPositive(n);
> final int nm1 = n - 1;
> if ((n & nm1) == 0) {
> // Range is a power of 2
> return nextIntPowerOfTwo(n, nm1);
> }
> return nextIntNonPowerOfTwo(n, nm1);
> }
> /**
> * Generates an {@code int} value between 0 (inclusive) and the
> * specified value (exclusive).
> *
> * @param n Bound on the random number to be returned. This is a power of 2.
> * @param nm1 The bound value minus 1.
> * @return a random {@code int} value between 0 (inclusive) and {@code n}
> * (exclusive).
> */
> protected int nextIntPowerOfTwo(int n, int nm1) {
> return nextInt() & nm1;
> }
> /**
> * Generates an {@code int} value between 0 (inclusive) and the specified value
> * (exclusive).
> *
> * @param n Bound on the random number to be returned. This is not a power of 2.
> * @param nm1 The bound value minus 1.
> * @return a random {@code int} value between 0 (inclusive) and {@code n} (exclusive).
> */
> protected int nextIntNonPowerOfTwo(int n, int nm1) {
> int bits;
> int val;
> do {
> bits = nextInt() >>> 1;
> val = bits % n;
> } while (bits - val + nm1 < 0);
> return val;
> }
> @Override
> public long nextLong(long n) {
> checkStrictlyPositive(n);
> final long nm1 = n - 1;
> if ((n & nm1) == 0) {
> // Range is a power of 2
> return nextLongPowerOfTwo(n, nm1);
> }
> return nextLongNonPowerOfTwo(n, nm1);
> }
> /**
> * Generates an {@code long} value between 0 (inclusive) and the
> * specified value (exclusive).
> *
> * @param n Bound on the random number to be returned. This is a power of 2.
> * @param nm1 The bound value minus 1.
> * @return a random {@code long} value between 0 (inclusive) and {@code n}
> * (exclusive).
> */
> protected long nextLongPowerOfTwo(long n, long nm1) {
> return nextLong() & nm1;
> }
> /**
> * Generates an {@code long} value between 0 (inclusive) and the specified value
> * (exclusive).
> *
> * @param n Bound on the random number to be returned. This is not a power of 2.
> * @param nm1 The bound value minus 1.
> * @return a random {@code long} value between 0 (inclusive) and {@code n} (exclusive).
> */
> protected long nextLongNonPowerOfTwo(long n, long nm1) {
> long bits;
> long val;
> do {
> bits = nextLong() >>> 1;
> val = bits % n;
> } while (bits - val + nm1 < 0);
> return val;
> }
> {code}
> This will update all providers to use the new method. Then the JDK implementation can be changed to override the default:
> {code:java}
> @Override
> protected int nextIntPowerOfTwo(int n, int nm1) {
> return (int) ((n * (long) (nextInt() >>> 1)) >> 31);
> }
> @Override
> protected long nextLongPowerOfTwo(long n, long nm1) {
> return nextLongNonPowerOfTwo(n, nm1);
> }
> {code}
> I do not know how the use of protected methods will affect performance. An alternative is to inline the entire computation for the new masking method:
> {code:java}
> public int nextInt(int n) {
> checkStrictlyPositive(n);
> final int nm1 = n - 1;
> if ((n & nm1) == 0) {
> // Range is a power of 2
> return nextInt() & nm1;
> }
> int bits;
> int val;
> do {
> bits = nextInt() >>> 1;
> val = bits % n;
> } while (bits - val + nm1 < 0);
> return val;
> }
> {code}
> Then rewrite the entire method in the JDK generator. This will be less flexible if other generators are added than have short periods in the lower order bits.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)