commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Alex Herbert <alex.d.herb...@gmail.com>
Subject Re: [Rng] Merge outstanding PRs for construction speed baseline
Date Tue, 12 Mar 2019 23:12:23 GMT


> On 12 Mar 2019, at 16:58, Gilles Sadowski <gilleseran@gmail.com> wrote:
> 
> [Replying to the ML.]
> 
> Le mar. 12 mars 2019 à 13:00, Alex Herbert <alex.d.herbert@gmail.com <mailto:alex.d.herbert@gmail.com>>
a écrit :
>> 
>> 
>> On 11/03/2019 23:54, Gilles Sadowski wrote:
>>> Le ven. 8 mars 2019 à 18:12, Alex Herbert <alex.d.herbert@gmail.com> a
écrit :
>>>> Hi,
>>>> 
>>>> I'd like to start on improving the methods in rng-simple to create the
>>>> RandomSource.
>>>> 
>>>> There are a few PRs outstanding but I think the discussion on this list
>>>> was that the new XoShiRo generators are OK. So I would like to merge the
>>>> following PRs:
>>>> 
>>>> RNG-82: Add XorShift1024StarPhi generator
>>> See other thread.
>>> 
>>>> RNG-70: Add new XoShiRo generators
>>> +1
>> 
>> Ah. Most of these generators have more than one constructor (see below
>> on the discussion for the change to the SplitMix64 generator). These are
>> utility constructors given the small size of the seed. e.g. for a seed
>> of length 4:
>> 
>> public XoShiRo256StarStar(long[] seed)
>> 
>> public XoShiRo256StarStar(long seed0, long seed1, long seed2, long seed3)
>> 
>> I did not see this as an issue. I added unit tests to show that the same
>> input data in either the array or the primitives produces the same
>> generator sequence. The primitive constructor does not have to do array
>> checking and filling when the array is not the correct length. The input
>> is directly used to set the state. It is a cleaner constructor, but it
>> does not fit the model required by the ProviderBuilder which would like
>> an object seed.
>> 
>> I am going to rewrite the ProviderBuilder internals to allow the size of
>> the seed to be known and to remove the use of reflection to create a
>> source. In this instance the primitive constructor would be better. That
>> is a motivating reason for the change to the SplitMix64 too.
> 
> I did not look again at the code yet; but if you can work around
> the issue (as I recalled it) without impacting the public API, then
> fine I guess. :-)
> 
>> 
>>> 
>>>> Then use master to set a baseline for the construction speed. This can
>>>> be used to compare against the latest code to see how much the changes
>>>> have improved construction speed. Given the work will have a big effect
>>>> on generators with small state it would be good to have the new
>>>> generators in the baseline.
>>> IIUC, "big effect" only when constructing and throwing quickly very
>>> many objects.
>>> The doc should make it clear that it's not the "nominal" use-case.
>> 
>> Yes. There is not a document for this at the moment anyway (i.e. the
>> results of the construction benchmark). Should a section be added to the
>> user guide?
> 
> Benchmarks can certainly go in the "Performance" section.
> 
>> This can outline that a generator should be picked for its
>> intended purpose: output datatype required (int, double, etc) and number
>> of samples (period). For short lived generators with low samples then
>> construction speed is a factor.
> 
> All good as long as the entry point for application developers to
> create RNGs is "RandomSource".
> 
>> 
>> This is a feature that will probably most benefit the
>> ThreadLocalRandomSource where short-lifetime generators are likely to be
>> required in a multi-threaded environment.
> 
> ???
> IIUC, "ThreadLocalRandomSource" will store the longest-living RNG
> instances!

But only per thread. A new thread will get a new RNG which has to be seeded again. But if
you have an application with a few long term threads then the ThreadLocalRandomSource can
be used to store and retrieve any RNG of choice without having to write extra code to do that
yourself.

> 
>> 
>> However the benchmark shows that once you are generating 1,000,000
>> samples then the construction cost of even the TwoCmres is much less
>> than the sampling time.
>> 
>>> 
>>>> I note that the only tags in git are for releases. So the baseline can
>>>> be just a commit Id and I'll add it to the benchmark Jira RNG-72 for
>>>> reference.
> 
> I'm still wondering about applications that require short-lived
> generators...
> It doesn't seem to ever be a good idea to do that.  There was
> an issue caused by that in [Digester], fixed only in the latest
> release…

Short lived generators for example for array shuffling of small arrays that will be done only
once.

ThreadLocalRandomSource solves this by just giving the thread the current one, or a new one
if the thread has never used one before. In the later case a low construction cost is good,
especially if the thread is also short lived. For a shuffle of 100 items then a SplitMix64
will do a good enough job and can be built very fast. 

For simulations then you should be using longer period generators. If doing a multithreaded
simulation I use a new RNG per Runnable put into an ExecutorService. However the Runnable
can now use ThreadLocalRandomSource.current(…) to get the RNG. So for a fixed thread pool
there potentially can be less generators as they get reused, and the Runnable does not have
to be written to do lots of work in a loop with the same RNG.

Parallel simulations is a case where the split() and jump() functionality would be better
still to avoid sequence overlap.

> 
>>>> 
>>>> There are also some other PRs to be discussed:
>>>> 
>>>> RNG-81: NumberFactory to sample all rationals between 0 and 1.
>>>> 
>>>> This one changes to the implementation in Open JDK for float and double
>>>> generation. I think this is OK to merge.
>>> +1
>>> [No promise of reproducibility for "derived" types.]
>> OK. I'll put that in.
>>> 
>>>> It won't effect the baseline
>>>> but is good to get it into the next release.
>>>> 
>>>> 
>>>> RNG-78: Added a ThreadLocalRandomSource.
>>>> 
>>>> This works as a proof of concept. But it is all new files and so I'm
>>>> happy to leave that until it is decided exactly if and where to put it.
>>> Would there be a better place than in package "o.a.c.rng.simple"?
>> It was either in simple or examples. But I would like to see it as a
>> feature in use so I would leave it in simple
> 
> Sure, it's a useful feature to help users avoid bad usage.
> 
>>>> RNG-76: Added primitive long constructor to SplitMix64
>>>> 
>>>> This is something that will effect the benchmark so I'll merge it after.
>>>> It is a trivial change with a noticeable performance gain by avoiding
>>>> auto-boxing of the long seed.
>>> IIRC, there is some developer doc that refer to having a single
>>> constructor.  But I do not recall the reason.
>> 
>> I did not know. I cannot find that explained in the project docs. I
>> looked in site/xdoc/developers.xml and a poor regex search through the rest.
> 
> I think I meant this[1]:
> The "provider" must specify one way for setting the seed. For a given
> seed, the generated sequence must always be the same.
> 
>> Maybe the reason was to prevent ambiguous construction, e.g. with a
>> non-parameterised constructor.
>> 
>> An exception was made for TwoCmres since it needs more arguments. That
>> is understandable.
>> 
>> Could this rule be relaxed if the same data passed to different
>> constructors results in the same generated sequence?
> 
> I guess so.
> IIRC, the limitation was due to using commons code in order to
> instantiate the appropriate RNG (core) class.
> 
>> E.g. the unit tests
>> I added for the new XoShiRo generators test this.
> 
> Thanks for all this work,
> Gilles

I’m happy to do it. I am just adding to the library from the perspective I have as end user.
Hopefully the changes will benefit other users as well as me.

> 
> [1] http://commons.apache.org/proper/commons-rng/commons-rng-core/apidocs/org/apache/commons/rng/core/source32/package-summary.html
<http://commons.apache.org/proper/commons-rng/commons-rng-core/apidocs/org/apache/commons/rng/core/source32/package-summary.html>
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org <mailto:dev-unsubscribe@commons.apache.org>
> For additional commands, e-mail: dev-help@commons.apache.org <mailto:dev-help@commons.apache.org>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message