commons-dev mailing list archives

From Phil Steitz <>
Subject Re: [math] Improving tests and performance of RandomGenerator implementations
Date Mon, 01 Aug 2011 21:01:08 GMT
On 8/1/11 1:40 PM, Luc Maisonobe wrote:
> Hi Phil,
> On 01/08/2011 20:39, Phil Steitz wrote:
>> On 8/1/11 1:31 AM, wrote:
>>> Hi Phil,
>>> ----- Mail original -----
>>>> In my own applications, I noticed what appears to be poor
>>>> performance in the nextInt(int) method of the Mersenne twister,
>>>> which I was using to *improve* speed.  I think that for small
>>>> n, the default implementation in BitStreamGenerator may be
>>>> running too many iterations.
>>> Mersenne twister uses a quite large pool. It creates
>>> pseudo-random bits by twisting it, generating large batches at
>>> once (624 words at a time). Hence when you ask for large sets,
>>> you should see several calls that return fast, and one call that
>>> takes longer to generate another large pool. So good performance
>>> is obtained when generating large sets, not small sets.
>>> Well generators should be faster and are preferred over Mersenne
>>> twister now, which is an old generator. Well generators also
>>> have large pools, but they don't generate bits in large batches
>>> in advance; they generate a few words at a time.
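The batched-pool behavior Luc describes can be sketched as a toy generator that refills a large buffer in one pass, so most calls are cheap array reads and an occasional call pays the full regeneration cost. This is illustrative only: the state transition below is a placeholder LCG, not the real Mersenne twist, and the class name is made up.

```java
// Illustrative sketch (not the Commons Math code): a generator that
// refills a large pool in one batch, so most next() calls are cheap
// array reads and an occasional call pays the full refill cost.
public class PooledGenerator {
    private static final int N = 624;      // pool size, as in Mersenne twister
    private final int[] pool = new int[N];
    private int index = N;                 // force a refill on first use
    private int state = 12345;             // toy state; stands in for the real twist

    public int next() {
        if (index == N) {                  // pool exhausted: regenerate in bulk
            for (int i = 0; i < N; i++) {
                state = state * 1103515245 + 12345;  // toy LCG step, not the real twist
                pool[i] = state;
            }
            index = 0;
        }
        return pool[index++];              // cheap path: just an array read
    }
}
```

With this shape, asking for many values amortizes the refill cost across the whole pool, which is why large sets are the favorable case.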
>> Yeah, I know.  Both are faster than the JDK, though, at least for
>> 32-bit chunks in my tests.
>> One thing I have been thinking about is exposing nextInt(int[]),
>> nextDouble(double[]), nextGaussian(double[]) etc. methods that
>> take advantage of the pools.  So you basically generate a large
>> block of bits and use it to fill the output arrays.
> Seems like a very good idea. Most of the time, people generate only
> one kind of number many times, so it really does make sense.
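A minimal sketch of the bulk API being floated here. The signature nextInt(int[]) is hypothetical, not an existing Commons Math method, and a JDK Random stands in for a RandomGenerator; the point is that a pool-backed implementation could size one bulk refill from dest.length instead of refilling piecemeal inside the loop.

```java
import java.util.Random;

public class BulkExample {
    // Hypothetical bulk method: fill dest with one random int per slot.
    // A pool-backed generator could use dest.length to do a single
    // appropriately sized pool refill up front.
    public static void nextInt(Random rng, int[] dest) {
        for (int i = 0; i < dest.length; i++) {
            dest[i] = rng.nextInt();
        }
    }

    public static void main(String[] args) {
        int[] buffer = new int[1024];
        nextInt(new Random(42L), buffer);   // fixed seed, reproducible fill
        System.out.println(buffer.length);
    }
}
```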

What I am not certain of, and maybe you can clarify, is whether,
beyond optimizing the chunking, this really buys you anything in the
common use case where the same generator instance is used repeatedly
to fill an array.  Won't the generator leverage the pool in this
case anyway?  The additional information on how many bits are going
to be required should in theory be useful, but I am not sure how
much it will get you.

>>>> I am still figuring out how the code works, but I thought it
>>>> would be good to run some benchmarks - using Gilles' new stuff -
>>>> against the Harmony implementation in java.util.Random of this
>>>> method.  That led me to notice that there are no unit tests for
>>>> BitStreamGenerator.  I propose that we add
>>>>   0) RandomGeneratorAbstractTest with an abstract makeGenerator
>>>> method, including fixed seed tests for all RandomGenerator methods
>>>>   1) BitStreamGeneratorTest extending RandomGeneratorAbstractTest,
>>>> implementing makeGenerator with a BitStreamGenerator that uses the
>>>> JDK generator for next(int)
>>>>   2) Make the test classes for the Mersenne and Well generators
>>>> extend RandomGeneratorAbstractTest, moving redundant tests up into
>>>> the base class
>>>> Sound reasonable?
>>> +1
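The proposed layout might look like the following sketch. All names here are assumptions (the actual Commons Math classes may differ), and java.util.Random stands in for RandomGenerator to keep the sketch self-contained; in practice the shared checks would be JUnit tests in the abstract base.

```java
import java.util.Random;

// Sketch of the proposed hierarchy: an abstract base holds shared
// fixed-seed checks, and each concrete generator's test supplies
// makeGenerator().
abstract class RandomGeneratorAbstractTestSketch {
    protected abstract Random makeGenerator();  // stands in for RandomGenerator

    // A shared test that runs unchanged against any implementation.
    void testNextIntRange() {
        Random g = makeGenerator();
        for (int i = 0; i < 1000; i++) {
            int v = g.nextInt(100);
            if (v < 0 || v >= 100) {
                throw new AssertionError("out of range: " + v);
            }
        }
    }
}

class JdkBackedGeneratorTest extends RandomGeneratorAbstractTestSketch {
    @Override
    protected Random makeGenerator() {
        return new Random(42L);                 // fixed seed for reproducibility
    }
}
```

The Mersenne and Well test classes would each add only a makeGenerator() override, with every redundant check living once in the base class.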
>>>> Also, any recollection of why we are using a different
>>>> implementation in BitStreamGenerator for next(int) than Harmony
>>>> and the JDK use?
>>> I don't understand what you mean. next(int) is used to generate
>>> the raw bits and is the heart of each generator. Each generator
>>> has its own implementation. Replacing next(int) with the JDK
>>> generation would imply completely dropping the Mersenne twister
>>> and Well generators.
>> I am sorry.  I meant nextInt(int).  It is that code that seems to be
>> slow in BitStreamGenerator and different from the JDK and Harmony.
> Could you point me at some code? There are many pitfalls in
> nextInt(int) if one wants to make sure the generator is uniform,
> which explains the strange implementation, with the mask
> computation and the loop. By the way, even this implementation
> would benefit from your proposed array generation, as the mask
> could be computed only once.

See the JDK javadoc [1], which I assume matches the internal
implementation, and the Harmony code [2].  I know the pitfalls you
are referring to, since I played a good bit with my own code to just
translate bits when I needed to optimize performance in one of my
own apps.  I don't quite get the code we have now, and I am a little
baffled by the shifts used to create the mask.  I agree that for
this case you could definitely get a boost if you know n in advance
and are filling an array.
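For reference, the rejection-sampling algorithm documented in the JDK Random.nextInt(int) javadoc can be sketched as below. Here next(31) is backed by a seeded JDK Random in place of a real bit-stream generator's raw source; the class name is made up for the sketch.

```java
public class BoundedIntExample {
    private final java.util.Random raw = new java.util.Random(1L);

    // Raw-bit source, playing the role of next(int) in a bit-stream
    // generator: returns the requested number of random bits.
    private int next(int bits) {
        return raw.nextInt() >>> (32 - bits);
    }

    // Rejection-sampling nextInt(bound), as documented in the JDK
    // Random.nextInt(int) javadoc.
    public int nextInt(int bound) {
        if (bound <= 0) {
            throw new IllegalArgumentException("bound must be positive");
        }
        if ((bound & -bound) == bound) {   // bound is a power of two: no bias possible
            return (int) ((bound * (long) next(31)) >> 31);
        }
        int bits, val;
        do {                               // reject draws from the biased top range
            bits = next(31);
            val = bits % bound;
        } while (bits - val + (bound - 1) < 0);
        return val;
    }
}
```

The modulo introduces bias whenever 2^31 is not a multiple of bound, so draws falling in the incomplete top range are rejected and retried; the power-of-two fast path needs no rejection at all.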

Thanks for looking at this.


> Luc
>> Phil
>>> Mersenne twister and Well should be fast for generating large
>>> sets, but most importantly they have very good and *proven*
>>> properties (equidistribution in large dimensions, null
>>> correlation, maximal period, ...). These properties are
>>> essential, for example, in Monte-Carlo simulations with lots of
>>> variables that must be independent or have controlled
>>> correlations.
>>> Luc
>>>> The Harmony impl is almost identical to
>>>> what is documented in the JDK javadoc.
>>>> Phil
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail:
>>>> For additional commands, e-mail:

