commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Luc Maisonobe <>
Subject Re: [math] Improving tests and performance of RandomGenerator implementations
Date Wed, 03 Aug 2011 17:05:33 GMT
Le 03/08/2011 18:15, Phil Steitz a écrit :
> On 8/3/11 9:02 AM, sebb wrote:
>> On 3 August 2011 09:06, Luc Maisonobe<>  wrote:
>>> Le 03/08/2011 09:38, Luc Maisonobe a écrit :
>>>> Le 01/08/2011 22:40, Luc Maisonobe a écrit :
>>>>> Hi Phil,
>>>>> Le 01/08/2011 20:39, Phil Steitz a écrit :
>>>>>> On 8/1/11 1:31 AM, wrote:
>>>>>>> Hi Phil,
>>>>>>> ----- Mail original -----
>>>>>>>> In my own applications, I noticed what appears to be poor
>>>>>>>> performance in the nextInt(int) method of the Mersenne twister,
>>>>>>>> which I was using to *improve* speed. I think that for small
n, the
>>>>>>>> default implementation in BistreamGenerator may be running
too many
>>>>>>>> iterations.
>>>>>>> Mersenne twister uses a quite large pool. It creates pseudo-random
>>>>>>> by twisting it and creates large bunches at a time (624 words
at a
>>>>>>> time).
>>>>>>> Hence when you ask for large sets, you should have several calls
>>>>>>> return fast, and one call that takes a longer time to generate
>>>>>>> large pool.
>>>>>>> So good performances are obtained for generating large sets,
>>>>>>> small sets.
>>>>>>> Well generators should be faster and are preferred over Mersenne
>>>>>>> twister now,
>>>>>>> which is now an old generator. Well generators also have large
>>>>>>> but they
>>>>>>> don't generate bits in large batches in advance, they do generate
>>>>>>> few words
>>>>>>> at a time.
>>>>>> Yeah, I know. Both are faster than the JDK, though, even for just
>>>>>> 32-bit chunks in my tests at least.
>>>>>> One thing I have been thinking about is exposing nextInt[],
>>>>>> nextDouble[], nextGaussian[] etc methods that take advantage of the
>>>>>> pools. So you basically generate a large block of bits use this to
>>>>>> fill the output arrays.
>>>>> Seems a very good idea. Most of the time, people generate only one kind
>>>>> of numbers several times, so it really does make sense.
>>>>>>>> I am still figuring out how the code works, but I
>>>>>>>> thought it would be good to run some benchmarks - using Gilles'
>>>>>>>> stuff - against the Harmony implementation in java.util.Random
>>>>>>>> this method. That led me to notice that there are no unit
tests for
>>>>>>>> BitstreamGenerator. I propose that we add
>>>>>>>> 0) RandomGeneratorAbstractTest with an abstract makeGenerator
>>>>>>>> method including fixed seed tests for all RandomGenerator
>>>>>>>> 1) BitstreamGeneratorTest extending RandomGeneratorAbstractTest
>>>>>>>> implementing makeGenerator with a BitStreamGenerator that
uses the
>>>>>>>> JDK generator for next(int)
>>>>>>>> 2) Make the test classes for Mersenne and Weil generators
>>>>>>>> RandomGeneratorAbstractTest, moving redundant tests up into
the base
>>>>>>>> class
>>>>>>>> Sound reasonable?
>>>>>>> +1
>>>>>>>> Also, any recollection why we are using a
>>>>>>>> different implementation in BitStreamGenerator for next(int)
>>>>>>>> Harmony and the JDK use?
>>>>>>> I don't understand what you mean. next(int) is used to generate
the raw
>>>>>>> bits and is the heart of each generator. Each generator has its
>>>>>>> implementation. Replacing next(int) by the JDK generation would
>>>>>>> dropping completely Mersenne twister and Well generators.
>>>>>> I am sorry. I meant nextInt(int). It is that code that seems to be
>>>>>> slow in BitStreamGenerator and different from the JDK and Harmony.
>>>>> Could you point me at some code ? There are many pitfalls in netInt(int)
>>>>> if one wants to make sure the generator is uniform, which explain the
>>>>> strange implementation, with the mask computation and the loop. By the
>>>>> way, even this implementation would benefit from your proposed array
>>>>> generation, as the mask could be computed only once.
>>>> I have looked at the implementation for JDK and Harmony and am a little
>>>> puzzled.
>>>> The trick for the power of two (i.e. if ((n&  -n) == n)) is not useful
>>>> for the very elaborate generators like Mersenne twister or Well. Both
>>>> are proven to be equidistributed even for the low order bits. They are
>>>> based on linear recurrences but not linear congruences and do not suffer
>>>> from the drawbacks of the latter.
>>>> What puzzles me more is the loop. It is documented as avoiding the
>>>> uneven distributions, but at first glance the modulo operation bothers
>>>> me. As documentation explicitly states it is designed for this, it is
>>>> most probably true, I simply don't understand how yet.
>>>> So our current implementation is slow, then go ahead and change it to
>>>> the one you showed me. I would simply suggest to get rid of the ((n&
>>>> -n) == n) test. I'll try to understand the condition in the while loop
>>>> to understand how it rejects uneven distributions, just out of curiosity
>>>> for myself.
>>> OK, I finally understood the algorithm and how it rejects the largest
>>> incomplete numbers from k*n to (2^31)-1 where k*n is the largest multiple of
>>> n that fits in a positive integer. The trick lies in the addition of (n-1)
>>> which overflows the integer and wraps the result back to negative values. It
>>> is smart.
>>> +1 to use it.
>> Provided that the algorithm is documented ...
> Yeah, I was going to try to decipher it (and the current impl) and
> provide some doc.  One other thing to consider in this decision is
> do we have to worry about encumberance.  The Harmony impl looks very
> similar to what is described in the JDK javadoc.  I wonder if
> SunOracle might have claim to it.
> Where did you get the current impl, Luc?

Unfortunately, I can't remember and I did not document it, my bad. For 
sure, I did not invent it.


> Now that we have test infrastructure and Gilles' new tool, I will
> run some benchmarks to see what, if anything, the Harmony/JDK impl
> buys us.
> Phil
>>> Luc
>>>> Luc
>>>>> Luc
>>>>>> Phil
>>>>>>> Mersenne twister and Well should be fast for generating large
sets, but
>>>>>>> most importantly they have very good and *proven* properties
>>>>>>> (equidistribution
>>>>>>> on large dimensions, null correlation, maximal period ...). These
>>>>>>> properties
>>>>>>> are essential for example in Monte-Carlo simulations with lots
>>>>>>> variables that
>>>>>>> must be independent or have controlled correlations.
>>>>>>> Luc
>>>>>>>> The Harmony impl is almost identical to
>>>>>>>> what is documented in the JDK javadoc.
>>>>>>>> Phil
>>>>>>>> ---------------------------------------------------------------------
>>>>>>>> To unsubscribe, e-mail:
>>>>>>>> For additional commands, e-mail:
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail:
>>>>>>> For additional commands, e-mail:
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail:
>>>>>> For additional commands, e-mail:
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail:
>>>>> For additional commands, e-mail:
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail:
>>>> For additional commands, e-mail:
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail:
>>> For additional commands, e-mail:
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail:
>> For additional commands, e-mail:
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message