commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Gilles Sadowski <gillese...@gmail.com>
Subject Re: [Rng] Merge outstanding PRs for construction speed baseline
Date Tue, 12 Mar 2019 16:58:35 GMT
[Replying to the ML.]

Le mar. 12 mars 2019 à 13:00, Alex Herbert <alex.d.herbert@gmail.com> a écrit :
>
>
> On 11/03/2019 23:54, Gilles Sadowski wrote:
> > Le ven. 8 mars 2019 à 18:12, Alex Herbert <alex.d.herbert@gmail.com> a écrit
:
> >> Hi,
> >>
> >> I'd like to start on improving the methods in rng-simple to create the
> >> RandomSource.
> >>
> >> There are a few PRs outstanding but I think the discussion on this list
> >> was that the new XoShiRo generators are OK. So I would like to merge the
> >> following PRs:
> >>
> >> RNG-82: Add XorShift1024StarPhi generator
> > See other thread.
> >
> >> RNG-70: Add new XoShiRo generators
> > +1
>
> Ah. Most of these generators have more than one constructor (see below
> on the discussion for the change to the SplitMix64 generator). These are
> utility constructors given the small size of the seed. e.g. for a seed
> of length 4:
>
> public XoShiRo256StarStar(long[] seed)
>
> public XoShiRo256StarStar(long seed0, long seed1, long seed2, long seed3)
>
> I did not see this as an issue. I added unit tests to show that the same
> input data in either the array or the primitives produces the same
> generator sequence. The primitive constructor does not have to do array
> checking and filling when the array is not the correct length. The input
> is directly used to set the state. It is a cleaner constructor, but it
> does not fit the model required by the ProviderBuilder which would like
> an object seed.
>
> I am going to rewrite the ProviderBuilder internals to allow the size of
> the seed to be known and to remove the use of reflection to create a
> source. In this instance the primitive constructor would be better. That
> is a motivating reason for the change to the SplitMix64 too.

I did not look again at the code yet; but if you can work around
the issue (as I recalled it) without impacting the public API, then
fine I guess. :-)

>
> >
> >> Then use master to set a baseline for the construction speed. This can
> >> be used to compare against the latest code to see how much the changes
> >> have improved construction speed. Given the work will have a big effect
> >> on generators with small state it would be good to have the new
> >> generators in the baseline.
> > IIUC, "big effect" only when constructing and throwing quickly very
> > many objects.
> > The doc should make it clear that it's not the "nominal" use-case.
>
> Yes. There is not a document for this at the moment anyway (i.e. the
> results of the construction benchmark). Should a section be added to the
> user guide?

Benchmarks can certainly go in the "Performance" section.

> This can outline that a generator should be picked for its
> intended purpose: output datatype required (int, double, etc) and number
> of samples (period). For short lived generators with low samples then
> construction speed is a factor.

All good as long as the entry point for application developers to
create RNGs is "RandomSource".

>
> This is a feature that will probably most benefit the
> ThreadLocalRandomSource where short-lifetime generators are likely to be
> required in a multi-threaded environment.

???
IIUC, "ThreadLocalRandomSource" will store the longest-living RNG
instances!

>
> However the benchmark shows that once you are generating 1,000,000
> samples then the construction cost of even the TwoCmres is much less
> than the sampling time.
>
> >
> >> I note that the only tags in git are for releases. So the baseline can
> >> be just a commit Id and I'll add it to the benchmark Jira RNG-72 for
> >> reference.

I'm still wondering about applications that require short-lived
generators...
It doesn't seem to ever be a good idea to do that.  There was
an issue caused by that in [Digester], fixed only in the latest
release...

> >>
> >> There are also some other PRs to be discussed:
> >>
> >> RNG-81: NumberFactory to sample all rationals between 0 and 1.
> >>
> >> This one changes to the implementation in Open JDK for float and double
> >> generation. I think this is OK to merge.
> > +1
> > [No promise of reproducibility for "derived" types.]
> OK. I'll put that in.
> >
> >> It won't effect the baseline
> >> but is good to get it into the next release.
> >>
> >>
> >> RNG-78: Added a ThreadLocalRandomSource.
> >>
> >> This works as a proof of concept. But it is all new files and so I'm
> >> happy to leave that until it is decided exactly if and where to put it.
> > Would there be a better place than in package "o.a.c.rng.simple"?
> It was either in simple or examples. But I would like to see it as a
> feature in use so I would leave it in simple

Sure, it's a useful feature to help users avoid bad usage.

> >> RNG-76: Added primitive long constructor to SplitMix64
> >>
> >> This is something that will effect the benchmark so I'll merge it after.
> >> It is a trivial change with a noticeable performance gain by avoiding
> >> auto-boxing of the long seed.
> > IIRC, there is some developer doc that refer to having a single
> > constructor.  But I do not recall the reason.
>
> I did not know. I cannot find that explained in the project docs. I
> looked in site/xdoc/developers.xml and a poor regex search through the rest.

I think I meant this[1]:
The "provider" must specify one way for setting the seed. For a given
seed, the generated sequence must always be the same.

> Maybe the reason was to prevent ambiguous construction, e.g. with a
> non-parameterised constructor.
>
> An exception was made for TwoCmres since it needs more arguments. That
> is understandable.
>
> Could this rule be relaxed if the same data passed to different
> constructors results in the same generated sequence?

I guess so.
IIRC, the limitation was due to using commons code in order to
instantiate the appropriate RNG (core) class.

> E.g. the unit tests
> I added for the new XoShiRo generators test this.

Thanks for all this work,
Gilles

[1] http://commons.apache.org/proper/commons-rng/commons-rng-core/apidocs/org/apache/commons/rng/core/source32/package-summary.html

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Mime
View raw message