From issues-return-73628-apmail-commons-issues-archive=commons.apache.org@commons.apache.org Tue Apr 23 12:53:07 2019
Return-Path:
X-Original-To: apmail-commons-issues-archive@minotaur.apache.org
Delivered-To: apmail-commons-issues-archive@minotaur.apache.org
Received: from mail.apache.org (hermes.apache.org [207.244.88.153])
by minotaur.apache.org (Postfix) with SMTP id E6EDA1848B
for ; Tue, 23 Apr 2019 12:53:06 +0000 (UTC)
Received: (qmail 50394 invoked by uid 500); 23 Apr 2019 12:53:02 -0000
Delivered-To: apmail-commons-issues-archive@commons.apache.org
Received: (qmail 50261 invoked by uid 500); 23 Apr 2019 12:53:02 -0000
Mailing-List: contact issues-help@commons.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: issues@commons.apache.org
Delivered-To: mailing list issues@commons.apache.org
Received: (qmail 50139 invoked by uid 99); 23 Apr 2019 12:53:02 -0000
Received: from mailrelay1-us-west.apache.org (HELO mailrelay1-us-west.apache.org) (209.188.14.139)
by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 23 Apr 2019 12:53:02 +0000
Received: from jira-lw-us.apache.org (unknown [207.244.88.139])
by mailrelay1-us-west.apache.org (ASF Mail Server at mailrelay1-us-west.apache.org) with ESMTP id 1497FE2B01
for ; Tue, 23 Apr 2019 12:53:01 +0000 (UTC)
Received: from jira-lw-us.apache.org (localhost [127.0.0.1])
by jira-lw-us.apache.org (ASF Mail Server at jira-lw-us.apache.org) with ESMTP id 6B4E725817
for ; Tue, 23 Apr 2019 12:53:00 +0000 (UTC)
Date: Tue, 23 Apr 2019 12:53:00 +0000 (UTC)
From: "Alex D Herbert (JIRA)"
To: issues@commons.apache.org
Message-ID:
In-Reply-To:
References:
Subject: [jira] [Commented] (RNG-90) Improve nextInt(int) and nextLong(long)
for powers of 2
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394
[ https://issues.apache.org/jira/browse/RNG-90?page=3Dcom.atlassian.jir=
a.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=3D16824095=
#comment-16824095 ]=20
Alex D Herbert commented on RNG-90:
-----------------------------------
When computing an integer in a range {{n}} from a possible maximum range of=
{{2^b}} the sample can be obtained using:
{code:java}
// with number in the range [0, 2^b)
int sample =3D number % n;
{code}
This works but will be biased if {{n}} does not exactly divide into {{2^b}}=
. When biased the total number of samples possible for each sample value wi=
ll be either {{x}} or {{x+1}} (under or over-represented). The bias is at a=
maximum when the=C2=A0result of {{(2^b - 1) % n}} is close to {{n / 2}}.
Here is an example table for {{b=3D31}} for the maximum bias for ranges of =
{{n}}. The table is constructed as if sampling 2^b samples from a perfect R=
NG that produces all output in the range [0, 2^b) only once. This can be do=
ne with a counter.
* Upper is the maximum limit for {{n}}
* {{n}} is the value of {{n}} with the highest bias
* mean (u) is the mean number of samples {{x}} of each possible sample val=
ue
* var is the variance of the number of samples of each value
* Frequency(floor(u)) is the frequency numbers in the range [0, n) are obs=
erved {{x}} times (under-represented)
* p(floor(u)) is the numerator of the probability of observing a sample th=
at is under-represented (seen {{x}} times)
* Frequency(ceil(u)) is the frequency numbers in the range [0, n) are obse=
rved {{x+1}} times (over-represented)
* p(ceil(u)) is the numerator of the probability of observing a sample tha=
t is over-represented (seen {{x+1}} times)
||Upper||n||mean (u)||var||Frequency(floor(u))||p(floor(u)))||Frequency(cei=
l(u))||p(ceil(u))||
|2^3|5|429496729.600|0.240|2|429496729|3|429496730|
|2^4|15|143165576.533|0.249|7|143165576|8|143165577|
|2^5|17|126322567.529|0.249|8|126322567|9|126322568|
|2^6|51|42107522.510|0.250|25|42107522|26|42107523|
|2^7|85|25264513.506|0.250|42|25264513|43|25264514|
|2^8|255|8421504.502|0.250|127|8421504|128|8421505|
|2^9|257|8355967.502|0.250|128|8355967|129|8355968|
|2^10|771|2785322.501|0.250|385|2785322|386|2785323|
|2^11|1285|1671193.500|0.250|642|1671193|643|1671194|
|2^12|3855|557064.500|0.250|1927|557064|1928|557065|
|2^13|4369|491527.500|0.250|2184|491527|2185|491528|
|2^14|13107|163842.500|0.250|6553|163842|6554|163843|
|2^15|21845|98305.500|0.250|10922|98305|10923|98306|
|2^16|65535|32768.500|0.250|32767|32768|32768|32769|
|2^17|65537|32767.500|0.250|32768|32767|32769|32768|
|2^18|196611|10922.500|0.250|98305|10922|98306|10923|
|2^19|327685|6553.500|0.250|163842|6553|163843|6554|
|2^20|983055|2184.500|0.250|491527|2184|491528|2185|
|2^21|1114129|1927.500|0.250|557064|1927|557065|1928|
|2^22|3342387|642.500|0.250|1671193|642|1671194|643|
|2^23|6700417|320.500|0.250|3350209|320|3350208|321|
|2^24|16711935|128.500|0.250|8355967|128|8355968|129|
|2^25|16843009|127.500|0.250|8421504|127|8421505|128|
|2^26|50529027|42.500|0.250|25264513|42|25264514|43|
|2^27|84215045|25.500|0.250|42107522|25|42107523|26|
|2^28|252645135|8.500|0.250|126322567|8|126322568|9|
|2^29|286331153|7.500|0.250|143165576|7|143165577|8|
|2^30|858993459|2.500|0.250|429496729|2|429496730|3|
|2^31|1431655765|1.500|0.250|715827882|1|715827883|2|
Note: To get the actually probability divide the numerator by 2^b, in this =
table this is 2^31.
E.g.
If computing {{nextInt(5)}} the mean number of samples is 429496729.6, the =
probability of any sample is 429496729.6/(2^31) but some samples have a pro=
bability of 429496729/2^31 and some 429496730/2^31.
If computing {{nextInt(1431655765)}} the mean number of samples is 1.5, the=
probability of any sample is 1.5/(2^31) but some samples have a probabilit=
y of 1/2^31 and some 2/2^31.
For either extremes of the table it is not easy to detect non uniformity in=
the output. The difference in probabilities between under and over samplin=
g is 1/2^32. At any number of samples the frequencies observed appear close=
enough to ideal to pass a Chi-square test of uniformity.
However the output is biased and can be eliminated by ignoring some of the =
samples that are over-represented. This is what the current algorithm does:
{code:java}
// Rejection method
int bits;
int val;
do {
bits =3D rng.nextInt() >>> 1;
val =3D bits % n;
// Ignore any integer in the range [0, 2^31 - 1] where there are not at=
least (n-1) more samples
// before 2^31 - 1 is exceeded.
} while (bits - val + nm1 < 0);
return val;
{code}
The number of values to ignore can be obtained using:
{code:java}
2^b % n
{code}
The rejection probability is:
{code:java}
(2^b % n) / 2^b
{code}
Here is a table of the mean and max value for the rejection probability for=
b=3D31 (the current rejection algorithm) and b=3D32 (for the multiplicatio=
n algorithm using unsigned 32-bit arithmetic):
||=C2=A0||=C2=A0||b=3D31||=C2=A0||b=3D32||=C2=A0||
||Lower||Upper||mean||max||mean||max||
|2^2|2^3|8.1491e-10|1.3970e-09|5.2387e-10|9.3132e-10|
|2^3|2^4|2.3865e-09|5.1223e-09|9.3132e-10|2.0955e-09|
|2^4|2^5|3.8417e-09|1.1176e-08|2.4447e-09|5.1223e-09|
|2^5|2^6|1.0317e-08|2.7474e-08|4.9404e-09|1.3271e-08|
|2^6|2^7|2.0482e-08|5.5879e-08|9.4769e-09|2.7474e-08|
|2^7|2^8|4.3288e-08|1.0803e-07|2.0867e-08|5.5879e-08|
|2^8|2^9|8.6515e-08|2.2305e-07|4.3467e-08|1.0803e-07|
|2^9|2^10|1.6502e-07|4.5914e-07|8.6059e-08|2.2491e-07|
|2^10|2^11|3.4841e-07|9.3132e-07|1.7195e-07|4.5914e-07|
|2^11|2^12|7.1007e-07|1.8668e-06|3.5512e-07|9.3132e-07|
|2^12|2^13|1.4096e-06|3.7560e-06|7.1313e-07|1.8731e-06|
|2^13|2^14|2.8487e-06|7.5437e-06|1.4155e-06|3.7560e-06|
|2^14|2^15|5.6884e-06|1.5139e-05|2.8562e-06|7.5691e-06|
|2^15|2^16|1.1412e-05|3.0339e-05|5.6999e-06|1.5168e-05|
|2^16|2^17|2.2853e-05|6.0761e-05|1.1425e-05|3.0339e-05|
|2^17|2^18|4.5765e-05|0.00012182|2.2888e-05|6.0909e-05|
|2^18|2^19|9.1519e-05|0.00024366|4.5758e-05|0.00012182|
|2^19|2^20|0.00018307|0.00048780|9.1540e-05|0.00024390|
|2^20|2^21|0.00036607|0.00097561|0.00018307|0.00048780|
|2^21|2^22|0.00073185|0.0019493|0.00036607|0.00097561|
|2^22|2^23|0.0014626|0.0038910|0.00073186|0.0019493|
|2^23|2^24|0.0029208|0.0077519|0.0014626|0.0038910|
|2^24|2^25|0.0058238|0.015385|0.0029208|0.0077519|
|2^25|2^26|0.011576|0.030303|0.0058238|0.015385|
|2^26|2^27|0.022868|0.058824|0.011576|0.030303|
|2^27|2^28|0.044604|0.11111|0.022868|0.058824|
|2^28|2^29|0.084756|0.20000|0.044604|0.11111|
|2^29|2^30|0.15278|0.33333|0.084756|0.20000|
|2^30|2^31|0.25000|0.50000|0.15278|0.33333|
This shows that the multiplication algorithm has a lower rejection rate. Th=
us if the algorithm can be converted to reject bad samples then it will be =
faster that the modulus algorithm.
The multiplication algorithm is computing:
{noformat}
n * [0, 2^32 - 1)
-------------
2^32
{noformat}
If the initial product is left as an unsigned 64-bit result then the upper =
32-bits contains the sample value in the range [0, n), i.e. result / 2^32. =
The lower 32-bits contains the remainder (result % 2^32) and provides infor=
mation about the uniformity. Since the remainder is periodically spaced at =
intervals of n the frequency of observed remainders for a given sample valu=
e is either floor(2^32/n) or ceil(2^32/n).
To ensure all samples have a frequency of floor(2^32/n) the bias is solved =
by rejecting any=C2=A0remainder with a value < 2^32 % n, i.e. the level bel=
ow which denotes that there are still floor(2^32/n) more observations of th=
is sample. The algorithm then becomes:
{code:java}
// Pre-compute the fence
final long fence =3D (1L << 32) % n;
// Rejection method using multiply by a fraction:
// n * [0, 2^32 - 1)
// -------------
// 2^32
// The result is mapped back to an integer and will be in the range [0, n)
long result;
do {
// Compute 64-bit unsigned product of n * [0, 2^32 - 1)
result =3D n * (rng.nextInt() & 0xffffffffL);
// Test the sample uniformity.
} while ((result & 0xffffffffL) < fence);
// Return upper 32-bits as the sample
return (int)(result >>> 32);
{code}
This method requires that the rejection limit is pre-computed. This uses th=
e modulus operator. So the method is not suitable for generic use as the mo=
dulus operator is expensive. If a modulus operation must be performed then =
you can simply use the original algorithm which dynamically computes the fe=
nce limit using the sample value.
I have tested some variants of sampling for int values:
||upperBound||randomSourceName||loops||Method||Score||Error||Median||Error/=
Score||
|=C2=A0|SPLIT_MIX_64|100000|nextIntBaseline|351792.080|10430.902|350306.391=
|0.030|
|256|SPLIT_MIX_64|100000|nextInt|381840.096|27238.305|380145.995|0.071|
|256|SPLIT_MIX_64|100000|nextIntModulusBiased|723254.214|137778.513|719225.=
240|0.190|
|256|SPLIT_MIX_64|100000|nextIntModulusUnbiased|621014.972|22100.939|619868=
.216|0.036|
|256|SPLIT_MIX_64|100000|nextIntMultiplyBiased|403213.611|9888.535|402892.8=
73|0.025|
|256|SPLIT_MIX_64|100000|nextIntMultiplyUnbiased|416627.857|12002.756|41614=
9.143|0.029|
|257|SPLIT_MIX_64|100000|nextInt|693622.660|14068.507|693106.709|0.020|
|257|SPLIT_MIX_64|100000|nextIntModulusBiased|597524.519|22212.352|595434.9=
58|0.037|
|257|SPLIT_MIX_64|100000|nextIntModulusUnbiased|617694.766|16568.233|615302=
.060|0.027|
|257|SPLIT_MIX_64|100000|nextIntMultiplyBiased|392721.079|13033.386|394214.=
903|0.033|
|257|SPLIT_MIX_64|100000|nextIntMultiplyUnbiased|438031.729|7125.817|437836=
.523|0.016|
|1073741825|SPLIT_MIX_64|100000|nextInt|2825259.374|60514.553|2829576.910|0=
.021|
|1073741825|SPLIT_MIX_64|100000|nextIntModulusBiased|593974.005|16651.084|5=
92390.823|0.028|
|1073741825|SPLIT_MIX_64|100000|nextIntModulusUnbiased|1891129.233|56058.51=
9|1886215.783|0.030|
|1073741825|SPLIT_MIX_64|100000|nextIntMultiplyBiased|430692.577|9808.978|4=
31846.416|0.023|
|1073741825|SPLIT_MIX_64|100000|nextIntMultiplyUnbiased|874811.760|63506.61=
4|871546.648|0.073|
* {{nextInt}} method is the full generic method that detects powers of 2 a=
nd dynamically computes the rejection threshold
* {{nextIntModulusBiased}} method uses a single call with the modulus oper=
ator
* {{nextIntModulusUnbiased}} method uses repeat samples using a pre-comput=
ed rejection threshold and a single call with the modulus operator
* {{nextIntMultiplyBiased}} method uses a single call with the multiply me=
thod
* {{nextIntMultiplyUnbiased}} method uses repeat samples using a pre-compu=
ted rejection threshold and a single call to extract the upper 32-bit sampl=
e
Note that the power of 2 is best handled by {{nextInt}} using the masking o=
peration to just extract random bits from the int value.
For non power of 2 values of {{n}} the pre-computation of the rejection thr=
eshold increases the speed of the modulus algorithm. However it cannot avoi=
d a single call to modulus to get the final result. However=C2=A0the pre-co=
mputation allows use of the unbiased multiply method and much faster genera=
tion of random numbers. This is due to the faster multiply arithmetic (vers=
es modulus) and also the lower rejection rate due to the use of 2^32 as the=
base rather than 2^31.
Conclusion
* Modify the {{BaseProvider}} to use the masking method (for powers of 2) =
and keep the modulus method which dynamically sets the rejection threshold
* Update the {{DiscreteUniformSampler}} in the sampling module to use the =
multiply method with a precomputed rejection threshold
The {{DiscreteUniformSampler}} already has internal specialisations of the =
algorithm for different ranges. It just requires a specialisation for power=
s of 2 using masking and non-powers of 2 using the unbiased multiply method=
.
Note:
The original timings included results for the multiply method for {{nextLon=
g(long)}}. This algorithm was biased. Although it is possible to update thi=
s to use the rejection method to make it unbiased that would require the sa=
me computation of the rejection threshold with the modulus operator. So per=
formance can only be gained when repeat sampling from the same range. There=
is no current support for this in the library (since the {{DiscreteSampler=
}} interface returns an {{int}}) other than {{UniformRandomProvider.nextLon=
g(long)}}. So I have not done more analysis on this method. Computation of =
the rejection rate requires BigInteger arithmetic so outputting a rejection=
rate table for base 63 and 64 which requires full enumeration of all value=
s of n would be slow.
I will put the performance benchmark and modifications to the base provider=
in a new PR.
Modifications to the {{DiscreteUniformSampler}} will require a new Jira tic=
ket.
=C2=A0
> Improve nextInt(int) and nextLong(long) for powers of 2
> -------------------------------------------------------
>
> Key: RNG-90
> URL: https://issues.apache.org/jira/browse/RNG-90
> Project: Commons RNG
> Issue Type: Improvement
> Components: core
> Affects Versions: 1.3
> Reporter: Alex D Herbert
> Assignee: Alex D Herbert
> Priority: Minor
> Time Spent: 50m
> Remaining Estimate: 0h
>
> The code for nextInt(int) checks the range number n is a power of two and=
if so it computes a fast solution:
> {code:java}
> =C2=A0=C2=A0=C2=A0 return (int) ((n * (long) (nextInt() >>> 1)) >> 31);=
=20
> {code}
> This scales a 31 bit positive number by a power of 2 (i.e. n) then discar=
ds the least significant bits. An alternative result can be achieved using =
a mask to discard the most significant bits:
> {code:java}
> =C2=A0=C2=A0=C2=A0 return nextInt() & (n-1)
> {code}
> This works if n is a power of 2 as (n-1) will be all the bits set below i=
t. Note: This method is employed by ThreadLocalRandom.
> It also makes the method applicable to nextLong(long) since you no longer=
require the long multiplication arithmetic.
> The mask version is applicable to any generator with a long period in the=
lower order bits. The current version for any generator with a short perio=
d in the lower order bits. The non-masking method is employed by {{java.uti=
l.Random}} which is a weak generator.
> The methods are currently in {{BaseProvider}}. I suggest dividing the met=
hods to use protected methods to compute the result:
> {code:java}
> @Override
> public int nextInt(int n) {
> checkStrictlyPositive(n);
> final int nm1 =3D n - 1;
> if ((n & nm1) =3D=3D 0) {
> // Range is a power of 2
> return nextIntPowerOfTwo(n, nm1);
> }
> return nextIntNonPowerOfTwo(n, nm1);
> }
> /**
> * Generates an {@code int} value between 0 (inclusive) and the
> * specified value (exclusive).
> *
> * @param n Bound on the random number to be returned. This is a power of=
2.
> * @param nm1 The bound value minus 1.
> * @return a random {@code int} value between 0 (inclusive) and {@code n}
> * (exclusive).
> */
> protected int nextIntPowerOfTwo(int n, int nm1) {
> return nextInt() & nm1;
> }
> /**
> * Generates an {@code int} value between 0 (inclusive) and the specified=
value
> * (exclusive).
> *
> * @param n Bound on the random number to be returned. This is not a powe=
r of 2.
> * @param nm1 The bound value minus 1.
> * @return a random {@code int} value between 0 (inclusive) and {@code n}=
(exclusive).
> */
> protected int nextIntNonPowerOfTwo(int n, int nm1) {
> int bits;
> int val;
> do {
> bits =3D nextInt() >>> 1;
> val =3D bits % n;
> } while (bits - val + nm1 < 0);
> return val;
> }
> @Override
> public long nextLong(long n) {
> checkStrictlyPositive(n);
> final long nm1 =3D n - 1;
> if ((n & nm1) =3D=3D 0) {
> // Range is a power of 2
> return nextLongPowerOfTwo(n, nm1);
> }
> return nextLongNonPowerOfTwo(n, nm1);
> }
> /**
> * Generates an {@code long} value between 0 (inclusive) and the
> * specified value (exclusive).
> *
> * @param n Bound on the random number to be returned. This is a power of=
2.
> * @param nm1 The bound value minus 1.
> * @return a random {@code long} value between 0 (inclusive) and {@code n=
}
> * (exclusive).
> */
> protected long nextLongPowerOfTwo(long n, long nm1) {
> return nextLong() & nm1;
> }
> /**
> * Generates an {@code long} value between 0 (inclusive) and the specifie=
d value
> * (exclusive).
> *
> * @param n Bound on the random number to be returned. This is not a powe=
r of 2.
> * @param nm1 The bound value minus 1.
> * @return a random {@code long} value between 0 (inclusive) and {@code n=
} (exclusive).
> */
> protected long nextLongNonPowerOfTwo(long n, long nm1) {
> long bits;
> long val;
> do {
> bits =3D nextLong() >>> 1;
> val =3D bits % n;
> } while (bits - val + nm1 < 0);
> return val;
> }
> {code}
> This will update all providers to use the new method. Then the JDK implem=
entation can be changed to override the default:
> {code:java}
> @Override
> protected int nextIntPowerOfTwo(int n, int nm1) {
> return (int) ((n * (long) (nextInt() >>> 1)) >> 31);
> }
> @Override
> protected long nextLongPowerOfTwo(long n, long nm1) {
> return nextLongNonPowerOfTwo(n, nm1);
> }
> {code}
> I do not know how the use of protected methods will affect performance. A=
n alternative is to inline the entire computation for the new masking metho=
d:
> {code:java}
> public int nextInt(int n) {
> checkStrictlyPositive(n);
> final int nm1 =3D n - 1;
> if ((n & nm1) =3D=3D 0) {
> // Range is a power of 2
> return nextInt() & nm1;
> }
> int bits;
> int val;
> do {
> bits =3D nextInt() >>> 1;
> val =3D bits % n;
> } while (bits - val + nm1 < 0);
> return val;
> }
> {code}
> Then rewrite the entire method in the JDK generator. This will be less fl=
exible if other generators are added than have short periods in the lower o=
rder bits.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)