commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Alex Herbert <alex.d.herb...@gmail.com>
Subject Re: [rng] Split and Jump functions
Date Wed, 01 May 2019 19:57:39 GMT


> On 1 May 2019, at 16:05, Gilles Sadowski <gilleseran@gmail.com> wrote:
> 
> Hi.
> 
> Le mar. 30 avr. 2019 à 17:08, Alex Herbert <alex.d.herbert@gmail.com <mailto:alex.d.herbert@gmail.com>>
a écrit :
>> 
>> On 29/04/2019 22:14, Gilles Sadowski wrote:
>>> Hello.
>>> 
>>> Le lun. 29 avr. 2019 à 19:09, Alex Herbert <alex.d.herbert@gmail.com>
a écrit :
>>>> On 28/04/2019 19:11, Gilles Sadowski wrote:
>>>>> Le dim. 28 avr. 2019 à 17:02, Alex Herbert <alex.d.herbert@gmail.com>
a écrit :
>>>>>> 
>>>>>>> On 28 Apr 2019, at 00:59, Bernd Eckenfels <ecki@zusammenkunft.net>
wrote:
>>>>>>> 
>>>>>>> Hello,
>>>>>>> 
>>>>>>> Just a question, I am unclear on the terminology, is „jump“
(did I miss the discussion leading toot?) something invented here? It sounds to me like this
is a generator where the state can be cloned and it is „seekable“. It probably makes sense
to have those two dimensions separated anyway.
>>>>>> Hi Bernd, thanks for the input.
>>>>>> 
>>>>>> This thread started with the definition:
>>>>>> Jump:
>>>>>> 
>>>>>> To create a new instance of the generator that is deterministically
based on the state of the current instance but advanced a set number of iterations.
>>>>>> 
>>>>>> 
>>>>>> However it is not required to create a new instance at the same time
as jumping. You are correct in that this is two functionalities:
>>>>>> 
>>>>>> 1. Jump forward in the sequence
>>>>>> 2. Copy
>>>>>> 
>>>>>> However the two are coupled. Having jump on its own is useless (why
move forward in the sequence without using it?). So a copy needs to be created somewhere before/after
the jump.
>>>>>> 
>>>>>> The idea of a jump is to create a series of the generator at different
points in the state. The generators can be used for parallel computations and will be ensured
to not overlap in their output sequence for number of outputs skipped by the jump length.
>>>>>> 
>>>>>> FYI. The generators that support this have jump sizes of 2^64, 96,
128, 192, 256 and 512. So this is a lot of output sequence to jump.
>>>>>> 
>>>>>> Copy on its own works but for what purpose? If you want a second
generator at the moment you just create a new one (with a different seed). Duplicate copies
of generators is prone to potential pitfalls where simulations are not as random as you intend.
For a special use case where you wish to run multiple simulations with the same generator
you can use the Restorable interface to save the state of one and re-create it in other instances.
>>>>>> 
>>>>>> The current thread came to the choice of:
>>>>>> 
>>>>>>>>> So the options are (in all cases returning the copy):
>>>>>>>>> 
>>>>>>>>> 1. createAndJumpCopy
>>>>>>>>> 2. copyAndJumpParent
>>>>>>>>> 3. jumpParentAndCopy
>>>>>>>>> 4. jump and copy separately
>>>>>> Jump and copy separately was ruled out to discourage misuse of copy.
>>>>>> 
>>>>>> The current suggestion is 1. Create a copy and jump that ahead. The
current instance is not affected.
>>>>>> 
>>>>>> I now consider this to be weaker for a variety of use cases than
2. This copies the current state for use and then jumps the parent ahead. So this alters the
state of the parent generator.
>>>>>> 
>>>>>> Note that all other methods of a generator alter its state. So having
jump alter its state is reasonable.
>>>>>> 
>>>>>> The most flexible API is to separate jump and copy into two methods.
We can still support helper functions that take in a Jumpable generator and create a jump
series of generators for parallel work. Separating jump and copy allows the functionality
to be used in a larger number of ways than any other interface that attempts to combine jump
and copy.
>>>>>> 
>>>>>> I am fine with having separate jump and copy. If so the copy method,
being part of the Jumpable interface, will be functionally coupled with the jump method and
should be described in Javadoc with the intended purpose to use it to copy the parent state
either before or after a jump into a child generator.
>>>>>> 
>>>>>> As a precursor this API is very flexible:
>>>>>> 
>>>>>> JumpableUniformRandomProvider extends UniformRandomProvider {
>>>>>>      /** Jump and return same instance. */
>>>>>>      JumpableUniformRandomProvider jump();
>>>>>>      /** Copy the instance. */
>>>>>>      JumpableUniformRandomProvider copy();
>>>>>> }
>>>>>> 
>>>>>> Returning the same instance in jump() allows method chaining such
as either:
>>>>>> 
>>>>>> rng.jump().copy();
>>>>>> rng.copy().jump();
>>>>>> 
>>>>>> This potential pitfall is that a user may do this:
>>>>>> 
>>>>>> UniformRandomProvider rng1 = rng.copy().jump();
>>>>>> UniformRandomProvider rng2 = rng.copy().jump();
>>>>>> 
>>>>>> Where rng1 and 2 will be the same, 1 jump ahead of the parent state.
Or:
>>>>>> 
>>>>>> UniformRandomProvider rng1 = rng.jump();
>>>>>> UniformRandomProvider rng2 = rng.jump();
>>>>>> 
>>>>>> Where rng, rng1 and rng2 are the same instance all 2 jumps ahead
of the start point.
>>>>>> 
>>>>>> I think our purpose is to provide an API for the generators that
can jump and that is not too restrictive given the use cases we have so far thought up. There
may be other ideas of use cases that cannot be done with the coupled functionality of:
>>>>>> 
>>>>>> JumpableUniformRandomProvider extends UniformRandomProvider {
>>>>>>      /** Copy the instance, then jump ahead. Return the copy of the
previous state. */
>>>>>>      JumpableUniformRandomProvider jump();
>>>>>> }
>>>>>> 
>>>>>> JumpableUniformRandomProvider extends UniformRandomProvider {
>>>>>>      /** Copy the instance, then jump the copy ahead. Return the
copy. The current instance is not affected. */
>>>>>>      JumpableUniformRandomProvider jump();
>>>>>> }
>>>>>> 
>>>>>> So the split functions without allowing method chaining:
>>>>>> 
>>>>>> JumpableUniformRandomProvider extends UniformRandomProvider {
>>>>>>      /** Jump the current instance ahead. */
>>>>>>      void jump();
>>>>>>      /** Copy the instance. This is intended to be used either before
or after a call to jump()
>>>>>>       * to create a series of generators. */
>>>>>>      JumpableUniformRandomProvider copy();
>>>>>> }
>>>>> As you indicated above, there is no advantage in having separate
>>>>> "jump()" and "copy()",
>>>>> as counter-intuitive it may look at first sight.
>>>> So is the favoured approach now to make the jump method return a copy of
>>>> the current state and then increment the state of the current instance
>>>> with a jump, as in:
>>>> 
>>>> JumpableUniformRandomProvider extends UniformRandomProvider {
>>>>      /** Copy the instance, then jump ahead. Return the copy of the previous
state. */
>>>>      JumpableUniformRandomProvider jump();
>>>> }
>>>> 
>>>> This would appear to solve more use cases than returning a copy that is
>>>> further ahead. For example the dynamic generation of a jump sequence of
>>>> an unknown size:
>>>> 
>>>> List<UniformRandomProvider> rngs = ...;
>>>> JumpableUniformRandomProvider rng;
>>>> for (Data data : Iterable<Data>) {
>>>>      rngs.add(rng.jump());
>>>> }
>>>> // Do something with all the Data and rngs
>>> I confirm agreement. ;-)
>> OK. I'll modify the Jira tickets to match.
>>> 
>>>> I note that if this is the functionality then the jump() method can be
>>>> written to return a UniformRandomProvider:
>>>> 
>>>> JumpableUniformRandomProvider extends UniformRandomProvider {
>>>>      /** Copy the instance, then jump ahead. Return the copy of the previous
state. */
>>>>      UniformRandomProvider jump();
>>>> }
>>>> 
>>>> It would not have to be enforced to remove jump functionality from the
>>>> copy but should make it clear that the returned object is not to be
>>>> jumped again.
>>> Yes, but a cast could work around the artificial limitation of
>>> the return type, while "restrict" prevents misuse.
>> 
>> My point was which of the following do we want the interface to support:
>> 
>> 1. Return JumpableUniformRandomProvider (i.e. a direct copy)
>> 
>> 2. Return UniformRandomProvider (this could be a copy that could be cast
>> to Jumpable)
>> 
>> 3. Really return a UniformRandomProvider by enforcing the removal of the
>> jump function
>> 
>> For option 3 we will have to provide a helper method somewhere to do
>> this. Implementations should then use this to wrap a return copy.
>> 
>> However removal of the jump function by restriction to just
>> UniformRandomProvider will also remove support for
>> RestorableUniformRandomProvider, and any other interface the
>> implementation is providing.
>> 
>> I am thinking that option 2 is OK. It would not restrict the returned
>> copy but subtly suggests that the copy is not to be used for jumping.
> 
> OK. [Assuming that "jumping" main (only?) case is for a user to speed
> up computations, so that it is his responsiblity to not misuse (e.g. "jump"
> again) the returned instance.]

Jumping is for parallel work from a top down hierarchy of known size. Anyone jumping a RNG
should know this and the pitfalls of jumping again (and potentially jumping into the region
of another RNG output).

If you have a top down hierarchy of unknown size then this is where you need split. But as
previously discussed, although simple to implement for many of the providers, it is no better
than just creating a new instance using the RandomSource factory method other than to avoid
synchronisation overhead. When a user requires this functionality then perhaps it can be revisited.


> 
>> We can enforce actual restriction in the helper method to be added to
>> RandomSource to create a jump series of RNGs from a Jumpable one. But
>> even then it should be optional (which you previously suggested) since
>> downstream consumers of the RNG may want Restore (or Jump) functionality.
> 
> No, when getting a "restricted" instance, downstream consumers should
> have no way to override it, especially in cases where they could mess
> other parts of the code.

I meant that the method for creating a series can optionally restrict. I agree that a restricted
RNG is exactly that; no other functionality is allowed or exposed. However the method to create
a series is probably redundant as you point out below.

> 
>> Consider that this could be solved by a smart restriction method that
>> takes an EnumSet of the interfaces that should be exposed. We then
>> support combinations of the various interfaces. So far we have:
>> 
>> UniformRandomProvider (always)
>> 
>> RestorableUniformRandomProvider (2 methods)
>> 
>> JumpableUniformRandomProvider (1 method)
>> 
>> LongJumpableUniformRandomProvider (1 method)
>> 
>> The total number of combinations of the 3 optional interfaces is 2^3 ==
>> 8 (the 3rd row from Pascal's triangle). Although since you can't be
>> LongJumpable and not Jumpable this drops to 6. Anyway it will get out of
>> hand if more interfaces are added. The point being choosing what to
>> restrict to has many options.
>> 
>> My suggestion would be to not enforce restriction in the interface for
>> jumps but make the return value the next level up the hierarchy:
>> 
>> longJump returns JumpableUniformRandomProvider
>> 
>> jump returns UniformRandomProvider
> 
> Yes, that seems interesting from a usage POV.
> 
>> Then in the helper method to create a jump series the returned array can
>> be the same type (UniformRandomProvider or
>> JumpableUniformRandomProvider) and the method has an option to enforce
>> this interface by wrapping.
> 
> But IIUC, the now preferred implementation make the convenience
> method not necessary (?).

It seems so. The need to precompute an array of RNGs using jump is satisfied by using the
master RNG as a supplier of RNGs with the jump method. You can then create on demand a series
of any length using the return value from jump.

It makes it easier to leave out extra methods from RandomSource to create a series. The only
method perhaps desired in RandomSource is:

UniformRandomProvider restrict(JumpableUniformRandomProvider);

Akin to the current UniformRandomProvider unrestorable(RestorableUniformRandomProvider);

So do we do:

UniformRandomProvider restrict(JumpableUniformRandomProvider);
JumpableUniformRandomProvider restrict(LongJumpableUniformRandomProvider);
UniformRandomProvider restrict(RestorableUniformRandomProvider);

Or:

UniformRandomProvider unjumpable(JumpableUniformRandomProvider);
JumpableUniformRandomProvider unlongJumpable(LongJumpableUniformRandomProvider);
UniformRandomProvider unrestorable(RestorableUniformRandomProvider);

The later option only adds two new methods. The first has 3 new methods (deprecating unrestorable
with restrict) but suffers from having to cast instances of multiple interfaces to ensure
the correct restrict is called. So this makes me favour the verbosely named option.

Alex


> 
> Best,
> Gilles
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org <mailto:dev-unsubscribe@commons.apache.org>
> For additional commands, e-mail: dev-help@commons.apache.org <mailto:dev-help@commons.apache.org>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message