commons-dev mailing list archives

From Alex Herbert <alex.d.herb...@gmail.com>
Subject Re: [rng] Copying samplers
Date Thu, 09 May 2019 15:00:34 GMT

On 09/05/2019 15:39, Gilles Sadowski wrote:
> On Thu, 9 May 2019 at 15:41, Alex Herbert <alex.d.herbert@gmail.com> wrote:
>> On Sat, 4 May 2019 at 23:52, Alex Herbert <alex.d.herbert@gmail.com> wrote:
>>
>>>
>>>> On 4 May 2019, at 22:34, Gilles Sadowski <gilleseran@gmail.com> wrote:
>>>>
>>>> Hi.
>>>>
>>>> On Sat, 4 May 2019 at 21:31, Alex Herbert <alex.d.herbert@gmail.com> wrote:
>>>>>
>>>>>
>>>>>> On 4 May 2019, at 14:46, Gilles Sadowski <gilleseran@gmail.com> wrote:
>>>>>>
>>>>>> Hello.
>>>>>>
>>>>>> On Fri, 3 May 2019 at 16:57, Alex Herbert <alex.d.herbert@gmail.com> wrote:
>>>>>>> Most of the samplers in the library have very small states that are
>>>>>>> easy to compute. Some have computations that are more expensive, such
>>>>>>> as the LargeMeanPoissonSampler or the DiscreteProbabilityCollectionSampler.
>>>>>>>
>>>>>>> However, once the state is computed, the only part of the state that
>>>>>>> changes is the RNG. I would like to suggest a way to copy samplers
>>>>>>> using a method such as:
>>>>>>>
>>>>>>> DiscreteSampler newInstance(UniformRandomProvider)
>>>>>>>
>>>>>>> The new instance would share all the private state of the first sampler
>>>>>>> except the RNG. This can be used for multi-threaded applications that
>>>>>>> require a new sampler per thread but sample from the same distribution.
>>>>>>> A particular case in point is the as-yet unintegrated
>>>>>>> MarsagliaTsangWangSmallMeanPoissonSampler (see RNG-91 [1]), which has a
>>>>>>> "large" state [2] that takes a "long" time [3] to compute but is
>>>>>>> effectively immutable. This could be shared across instances, saving
>>>>>>> memory in parallel applications.
>>>>>>>
>>>>>>> A copy would have almost zero set-up time and would provide an
>>>>>>> opportunity to cache commonly used samplers.
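The `newInstance` idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the Commons RNG API: `UniformRandomProvider` is a stand-in declaring only `nextDouble()`, `DiscreteSampler` is a stand-in with only `sample()`, and `TabulatedPoissonSampler` is a hypothetical sampler that draws by inverse-CDF table lookup (a simpler technique than the Marsaglia-Tsang-Wang method in RNG-91). It only shows that the table is built once and each copy swaps in its own RNG.

```java
import java.util.Arrays;

// Stand-in for org.apache.commons.rng.UniformRandomProvider (assumption:
// only nextDouble(), returning a value in [0, 1), is declared here).
interface UniformRandomProvider {
    double nextDouble();
}

// Stand-in for the library's DiscreteSampler interface.
interface DiscreteSampler {
    int sample();
}

// Hypothetical Poisson sampler using a precomputed cumulative table (inverse CDF).
final class TabulatedPoissonSampler implements DiscreteSampler {
    private final UniformRandomProvider rng; // the only per-instance state
    private final double[] cdf;              // immutable after construction, so safe to share

    TabulatedPoissonSampler(UniformRandomProvider rng, double mean) {
        if (!(mean > 0 && mean <= 100)) {
            throw new IllegalArgumentException("mean must be in (0, 100]");
        }
        this.rng = rng;
        this.cdf = buildCdf(mean);
    }

    // Used by newInstance: reuses the table, so there is no set-up cost.
    private TabulatedPoissonSampler(UniformRandomProvider rng, TabulatedPoissonSampler source) {
        this.rng = rng;
        this.cdf = source.cdf;
    }

    TabulatedPoissonSampler newInstance(UniformRandomProvider rng) {
        return new TabulatedPoissonSampler(rng, this);
    }

    boolean sharesTableWith(TabulatedPoissonSampler other) {
        return cdf == other.cdf;
    }

    @Override
    public int sample() {
        final double u = rng.nextDouble();
        for (int k = 0; k < cdf.length; k++) {
            if (u < cdf[k]) {
                return k;
            }
        }
        // u fell in the truncated upper tail, which has negligible probability.
        return cdf.length - 1;
    }

    // Builds P(X <= k) for k = 0, 1, ... using P(X = k) = P(X = k - 1) * mean / k,
    // stopping once past the mode and the terms have become negligible.
    private static double[] buildCdf(double mean) {
        double[] table = new double[16];
        double p = Math.exp(-mean); // P(X = 0)
        double sum = 0;
        int k = 0;
        while (true) {
            sum += p;
            if (k == table.length) {
                table = Arrays.copyOf(table, 2 * k);
            }
            table[k] = sum;
            if (k > mean && p < 1e-17) {
                return Arrays.copyOf(table, k + 1);
            }
            k++;
            p *= mean / k;
        }
    }
}
```

Because the copy holds a reference to the same `cdf` array, creating it costs a single small allocation however large the table is.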
>>>>>> The goal is sharing (immutable) state, so it seems that the semantics
>>>>>> are not "copy".
>>>>>>
>>>>>> Isn't it a "factory" that we are after?  E.g. something like:
>>>>>> public final class CachedSamplingFactory {
>>>>>>    private static PoissonSamplerCache poisson = new PoissonSamplerCache();
>>>>>>    public PoissonSampler createPoissonSampler(UniformRandomProvider rng, double mean) {
>>>>>>        if (!poisson.isCached(mean)) {
>>>>>>            poisson.createCache(mean); // Initialize (requires synchronization) ...
>>>>>>        }
>>>>>>        return new PoissonSampler(poisson.getCache(mean), rng); // Construct using pre-built state.
>>>>>>    }
>>>>>> }
>>>>>> [It may be overkill, more work, and less performant…]
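The factory idea can be made concrete with a generic cache. This sketch is hypothetical: `SharedStateCache` and `stateFor` are illustrative names, not Commons RNG API. It relies on `ConcurrentHashMap.computeIfAbsent`, which performs the "build if missing" step atomically, so the separate `isCached`/`createCache` check in the sketch above, and its explicit synchronization, would not be needed.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical cache of immutable sampler state, keyed by a distribution parameter
// (e.g. the Poisson mean). Names are illustrative, not Commons RNG API.
final class SharedStateCache<K, S> {
    private final ConcurrentHashMap<K, S> states = new ConcurrentHashMap<>();
    private final Function<K, S> builder;

    SharedStateCache(Function<K, S> builder) {
        this.builder = builder;
    }

    // Returns the state for the key, building it on first use. computeIfAbsent runs
    // atomically, so the builder is applied at most once per key even under contention.
    S stateFor(K key) {
        return states.computeIfAbsent(key, builder);
    }

    // Number of distinct states built so far.
    int size() {
        return states.size();
    }
}
```

A Poisson factory along the lines of the sketch above could then return `new PoissonSampler(cache.stateFor(mean), rng)`, assuming a (hypothetical) constructor that accepts pre-built state. As the reply points out, one such cache would be needed per sampler type.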
>>>>> But you need a factory for every class you want to share state for, and
>>>>> the factory has to look in a cache. If you operate on an instance, then
>>>>> you get what you want: another working version of the same sampler. It
>>>>> would also be thread-safe without synchronisation as long as the state is
>>>>> immutable. The only mutable state is the passed-in RNG.
>>>> Agreed.  It was what I meant by the last sentence.
>>>>
>>>>>> IIUC, you suggest adding "newInstance" to the "DiscreteSampler"
>>>>>> interface (?).
>>>>> I did think of extending DiscreteSampler with this functionality, but not
>>>>> by adding to the interface, which is currently ‘functional’ because it has
>>>>> only one method. I think that should not change. Having thought about it a
>>>>> bit more, I like the idea of a new functional interface. Perhaps:
>>>>> interface DiscreteSamplerProvider {
>>>>>     DiscreteSampler create(UniformRandomProvider rng);
>>>>> }
>>>>>
>>>>> Alternatives to ‘Provider’ include:
>>>>>
>>>>> - Generator
>>>>> - Supplier (possible clash or alignment with Java 8, depending on how it
>>>>>   is done)
>>>>> - Factory (though the method is not static, so I do not like this)
>>>>> - etc.
>>>>>
>>>>> So this then becomes a functional interface that can be used by
>>>>> anything. However, instances of a sampler would be expected to return a
>>>>> sampler matching their own functionality.
>>>>> Note there are some samplers not implementing an interface that also
>>> could benefit from this. Namely CollectionSampler and
>>> DiscreteProbabilityCollectionSampler. So does this need a generic interface:
>>>>> Sampler<T> {
>>>>>     T sample();
>>>>> }
>>>>>
>>>>> To be complemented with:
>>>>>
>>>>> interface SamplerProvider<T> {
>>>>>     Sampler<T> create(UniformRandomProvider rng);
>>>>> }
>>>>>
>>>>> So the library would require:
>>>>>
>>>>> SamplerProvider<T>
>>>>> DiscreteSamplerProvider
>>>>> ContinuousSamplerProvider
>>>>>
>>>>> Any sampler can choose to be a Provider. There are some cases where it
>>>>> is moot. For example, a ZigguratNormalizedGaussianSampler just stores
>>>>> the rng in the constructor. However, it could still be a Provider; the
>>>>> method would just call the constructor. It would allow writing a generic
>>>>> multi-threaded application that just uses e.g. a DiscreteSamplerProvider
>>>>> to create samplers for each thread. You can then drop in the actual
>>>>> implementation you require. For example, you could swap the type of
>>>>> PoissonSampler in your simulation by swapping the provider for the
>>>>> Poisson distribution.
>>>>> How does that sound?
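A sketch of the generic multi-threaded use described above, assuming stand-in `UniformRandomProvider` and `DiscreteSampler` interfaces plus the proposed `DiscreteSamplerProvider`; `Simulation` is hypothetical. Each task receives its own generator, obtained with `java.util.SplittableRandom.split()`, and its own sampler, while the simulation code never names a concrete sampler class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SplittableRandom;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Stand-ins for the library types (assumption: only the methods used here).
interface UniformRandomProvider {
    double nextDouble();
}

interface DiscreteSampler {
    int sample();
}

// The functional interface proposed in the thread.
interface DiscreteSamplerProvider {
    DiscreteSampler create(UniformRandomProvider rng);
}

// Hypothetical simulation that depends only on the provider, not on a sampler class.
final class Simulation {
    // Returns, for each thread, the sum of the samples that thread drew.
    static long[] run(DiscreteSamplerProvider provider, int threads, int samplesPerThread,
                      long seed) throws Exception {
        final SplittableRandom root = new SplittableRandom(seed);
        final ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            final List<Future<Long>> futures = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                // One independent generator (split from the root) and one sampler per task.
                final SplittableRandom rng = root.split();
                final DiscreteSampler sampler = provider.create(rng::nextDouble);
                futures.add(pool.submit(() -> {
                    long sum = 0;
                    for (int i = 0; i < samplesPerThread; i++) {
                        sum += sampler.sample();
                    }
                    return sum;
                }));
            }
            final long[] sums = new long[threads];
            for (int t = 0; t < threads; t++) {
                sums[t] = futures.get(t).get();
            }
            return sums;
        } finally {
            pool.shutdown();
        }
    }
}
```

Swapping the Poisson implementation would then only change the provider passed to `run`, for example a lambda or constructor reference for a different sampler.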
>>>> Fine to have
>>>>   DiscreteSamplerProvider
>>>>   ContinuousSamplerProvider
>>>> [Perhaps the "Supplier" suffix would be better to avoid confusion with
>>>> "UniformRandomProvider".]
>>>>
>>>> At first sight, I don't think that the generic interface would have
>>>> any actual use since, ultimately, the return value of "sample()"
>>>> will be either "int" or "double" (no polymorphism).
>>>>
>>> The generic interface is for the samplers that are typed for collections
>>> and currently return a sample of type T, or those that return arrays. It
>>> would not be for Integer or Double from the probability distribution
>>> samplers. Here are the samplers that could use it:
>>>
>>> CombinationSampler implements Sampler<int[]>
>>> PermutationSampler implements Sampler<int[]>
>>> CollectionSampler implements Sampler<T>
>>> DiscreteProbabilityCollectionSampler implements Sampler<T>
>>>
>>> All are in the package org.apache.commons.rng.sampling.
>>>
>>> Each could also implement SamplerSupplier<T>.
>>>
>>> The set-up cost for the CombinationSampler/PermutationSampler would not be
>>> much different from the constructor, and no state can be shared, so there
>>> is no real benefit here other than convenience. But the two
>>> CollectionSamplers could share the final collection that is created and
>>> stored by the constructor from the input data. For a large discrete
>>> probability collection sampler the memory saving could be noticeable, as
>>> it also stores the cumulative distribution table. This would also save on
>>> the construction cost by not having to recompute it.
>>>
>>> Alex
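The shared-state benefit for the collection samplers can be sketched with the generic interfaces proposed earlier in the thread. `ProbabilityCollectionSampler` is a hypothetical, simplified analogue of DiscreteProbabilityCollectionSampler (not the library source), and `UniformRandomProvider` is a stand-in declaring only `nextDouble()`. Copies created through `create` reuse both the defensive copy of the items and the cumulative table.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Stand-in for org.apache.commons.rng.UniformRandomProvider (only nextDouble() is declared).
interface UniformRandomProvider {
    double nextDouble(); // uniform in [0, 1)
}

// The generic interfaces proposed in the thread.
interface Sampler<T> {
    T sample();
}

interface SamplerProvider<T> {
    Sampler<T> create(UniformRandomProvider rng);
}

// Hypothetical, simplified analogue of DiscreteProbabilityCollectionSampler.
final class ProbabilityCollectionSampler<T> implements Sampler<T>, SamplerProvider<T> {
    private final UniformRandomProvider rng;
    private final List<T> items;       // unmodifiable copy of the input, shared by copies
    private final double[] cumulative; // normalised cumulative probabilities, shared by copies

    ProbabilityCollectionSampler(UniformRandomProvider rng, List<T> items, double[] probabilities) {
        if (items.isEmpty() || items.size() != probabilities.length) {
            throw new IllegalArgumentException("items and probabilities must have the same non-zero size");
        }
        this.rng = rng;
        this.items = Collections.unmodifiableList(new ArrayList<>(items));
        this.cumulative = new double[probabilities.length];
        double sum = 0;
        for (int i = 0; i < probabilities.length; i++) {
            sum += probabilities[i]; // assumes non-negative weights with a positive total
            cumulative[i] = sum;
        }
        for (int i = 0; i < cumulative.length; i++) {
            cumulative[i] /= sum;    // the last entry becomes exactly 1.0
        }
    }

    // Used by create: no copying of the items and no recomputation of the table.
    private ProbabilityCollectionSampler(UniformRandomProvider rng, ProbabilityCollectionSampler<T> source) {
        this.rng = rng;
        this.items = source.items;
        this.cumulative = source.cumulative;
    }

    @Override
    public Sampler<T> create(UniformRandomProvider rng) {
        return new ProbabilityCollectionSampler<>(rng, this);
    }

    @Override
    public T sample() {
        final double u = rng.nextDouble();
        // Binary search for the first index with u < cumulative[index]; it exists
        // because u < 1.0 == cumulative[cumulative.length - 1].
        int lo = 0;
        int hi = cumulative.length - 1;
        while (lo < hi) {
            final int mid = (lo + hi) >>> 1;
            if (u < cumulative[mid]) {
                hi = mid;
            } else {
                lo = mid + 1;
            }
        }
        return items.get(lo);
    }
}
```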
>>>
>> Any further thoughts on this? I think that Supplier is perhaps the wrong
>> term. A Java 8 Supplier has a get() functional method with no parameters.
>> These interfaces would require a UniformRandomProvider as the argument.
>> The Java 8 Function<T, R> is the applicable analogue here, but its apply
>> method is a poorer name. So:
>>
>> DiscreteSampler
>> ContinuousSampler
>> Sampler<T>
>>
>> and trying a few options out:
>>
>> DiscreteSamplerFactory createDiscreteSampler(UniformRandomProvider)
>> ContinuousSamplerFactory createContinuousSampler(UniformRandomProvider)
>> SamplerFactory<T> createSampler(UniformRandomProvider)
>>
>> vs.
>>
>> DiscreteSamplerFactory newDiscreteSampler(UniformRandomProvider)
>> ContinuousSamplerFactory newContinuousSampler(UniformRandomProvider)
>> SamplerFactory<T> newSampler(UniformRandomProvider)
>>
>> vs.
>>
>> DiscreteSamplerSupplier getDiscreteSampler(UniformRandomProvider)
>> ContinuousSamplerSupplier getContinuousSampler(UniformRandomProvider)
>> SamplerSupplier<T> getSampler(UniformRandomProvider)
>>
>> vs.
>>
>> DiscreteSamplerGenerator newDiscreteSampler(UniformRandomProvider)
>> ContinuousSamplerGenerator newContinuousSampler(UniformRandomProvider)
>> SamplerGenerator<T> newSampler(UniformRandomProvider)
>>
>> The 'create/new' nomenclature does convey that a new instance is expected,
>> so I prefer that over 'get'. I'm undecided on which is the most appropriate
>> noun for the interface name.
> How about making it clearer that the purpose is to share state, and
> using the "fluent API":
>
> interface SharedStateSampler<R> {
>      R withUniformRandomProvider(UniformRandomProvider rng);
> }
>
> E.g.
>
> public class CollectionSampler<T>
>      implements SharedStateSampler<CollectionSampler<T>> {
>      // ...
>      public CollectionSampler<T> withUniformRandomProvider(UniformRandomProvider rng) {
>          return /* new instance that shares the immutable state */;
>      }
> }
>
> Gilles
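The fluent `SharedStateSampler` interface can be exercised with a simplified `CollectionSampler`. This is a sketch, not the library source: `UniformRandomProvider` is a stand-in declaring only `nextInt(int)`, and the class keeps only what is needed to show that `withUniformRandomProvider` reuses the stored collection.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Stand-in for org.apache.commons.rng.UniformRandomProvider (only nextInt(int) is declared).
interface UniformRandomProvider {
    int nextInt(int n); // uniform in [0, n)
}

// The interface proposed in the message above.
interface SharedStateSampler<R> {
    R withUniformRandomProvider(UniformRandomProvider rng);
}

// Hypothetical, simplified CollectionSampler (not the Commons RNG source).
final class CollectionSampler<T> implements SharedStateSampler<CollectionSampler<T>> {
    private final UniformRandomProvider rng;
    private final List<T> items; // unmodifiable copy, shared between instances

    CollectionSampler(UniformRandomProvider rng, Collection<T> collection) {
        if (collection.isEmpty()) {
            throw new IllegalArgumentException("Empty collection");
        }
        this.rng = rng;
        this.items = Collections.unmodifiableList(new ArrayList<>(collection));
    }

    private CollectionSampler(UniformRandomProvider rng, CollectionSampler<T> source) {
        this.rng = rng;
        this.items = source.items;
    }

    @Override
    public CollectionSampler<T> withUniformRandomProvider(UniformRandomProvider rng) {
        // Only the RNG differs; the copied collection is reused as-is.
        return new CollectionSampler<>(rng, this);
    }

    T sample() {
        return items.get(rng.nextInt(items.size()));
    }
}
```

Because the type parameter `R` is the concrete sampler type, `withUniformRandomProvider` returns a `CollectionSampler<T>` directly, so the caller keeps access to `sample()` without a cast.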

Well, that is much nicer. I am fine with that.

I note that this idea can be applied to any sampler, even one with a very 
small state. Should we aim for that, or only pick the low-hanging fruit 
of those samplers that have a relatively large construction cost or 
internal state?

I would favour doing it for all samplers that have a state, just to be 
consistent. It just needs a bit more work to put into the library.
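To illustrate applying the interface even to a sampler with a very small state, here is a hypothetical inversion-based exponential sampler (not one of the library's samplers) whose only state besides the RNG is the mean. The stand-in `UniformRandomProvider` declares only `nextDouble()`. Implementing `withUniformRandomProvider` costs no more than calling the constructor, but it lets callers treat every sampler the same way.

```java
// Stand-in for org.apache.commons.rng.UniformRandomProvider (only nextDouble() is declared).
interface UniformRandomProvider {
    double nextDouble(); // uniform in [0, 1)
}

// The interface proposed in the thread.
interface SharedStateSampler<R> {
    R withUniformRandomProvider(UniformRandomProvider rng);
}

// Hypothetical exponential sampler using inversion: X = -mean * ln(1 - U).
// Apart from the RNG, its only state is the mean.
final class InverseExponentialSampler implements SharedStateSampler<InverseExponentialSampler> {
    private final UniformRandomProvider rng;
    private final double mean;

    InverseExponentialSampler(UniformRandomProvider rng, double mean) {
        if (!(mean > 0)) {
            throw new IllegalArgumentException("mean must be positive");
        }
        this.rng = rng;
        this.mean = mean;
    }

    @Override
    public InverseExponentialSampler withUniformRandomProvider(UniformRandomProvider rng) {
        // Nothing expensive to share: this simply calls the constructor, but callers
        // can now treat every sampler the same way.
        return new InverseExponentialSampler(rng, mean);
    }

    double sample() {
        // 1 - u lies in (0, 1], so the logarithm is finite and the result is not negative.
        return -mean * Math.log(1 - rng.nextDouble());
    }
}
```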


>
>>>>>
>>>>>
>>>>>> I'm a bit wary that this would mix two different functionalities:
>>>>>> * data generator (method "sample"),
>>>>>> * generator generator (method "newInstance").
>>>>>> [But I currently don't have an example where this would be a problem.]
>>>>>>
>>>>>> Regards,
>>>>>> Gilles
>>>>>>
>>>>>>> Alex
>>>>>>>
>>>>>>> [1] https://issues.apache.org/jira/browse/RNG-91
>>>>>>> [2] kB, or possibly MB, of tabulated data
>>>>>>>
>>>>>>> [3] Set-up cost for a Poisson sampler is on the order of 30 to 165
>>>>>>> times as long as for a SmallMeanPoissonSampler, for a mean of 2 and 32
>>>>>>> respectively. Note, however, that construction still takes only 1.1
>>>>>>> and 4.5 microseconds for this "long" time.
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
