* The cache will return a sampler equivalent to
* {@link org.apache.commons.rng.sampling.distribution.PoissonSampler#PoissonSampler(UniformRandomProvider, double)}.
*
* The cache allows the {@link PoissonSampler} construction cost to be minimised
* for low-size Poisson samples. The cache is advantageous under the following
* conditions:
*
* If the sample size to be made with the same sampler is large
* then the construction cost is minimal compared to the sampling time.
*
* Performance improvement is dependent on the speed of the
* {@link UniformRandomProvider}. A fast provider can obtain a two-fold speed
* improvement for a single-use Poisson sampler.
*
* The cache is thread safe.
*/
public class PoissonSamplerCache {
/**
* The minimum N covered by the cache where
* {@code N = (int) Math.floor(mean)}.
*/
private final int minN;
/**
* The maximum N covered by the cache where
* {@code N = (int) Math.floor(mean)}.
*/
private final int maxN;
/** The cache of states between {@code minN} and {@code maxN}. */
private final AtomicReferenceArray<LargeMeanPoissonSamplerState> values;
/**
 * Creates a new Poisson sampler for the given {@code mean}.
 *
 * A value of {@code mean} outside the range of the cache is valid.
 *
 * @param rng Generator of uniformly distributed random numbers.
 * @param mean Mean.
 * @return A Poisson sampler.
 * @throws IllegalArgumentException if {@code mean <= 0}.
 */
public DiscreteSampler getPoissonSampler(UniformRandomProvider rng,
                                         double mean) {
    // Ensure the same functionality as the PoissonSampler by
    // using a SmallMeanPoissonSampler under the switch point.
    if (mean < WrapperPoissonSampler.PIVOT) {
        return new SmallMeanPoissonSampler(rng, mean);
    }
    // Convert the mean into an integer.
    final int n = (int) Math.floor(mean);
    // Outside the range of the cache delegate to the full sampler.
    if (n < minN || n > maxN) {
        return new LargeMeanPoissonSampler(rng, mean);
    }
    // Look in the cache for a state that can be reused.
    // Note: The cache is offset by minN.
    final int index = n - minN;
    LargeMeanPoissonSamplerState state = values.get(index);
    if (state == null) {
        // Compute and store for reuse.
        state = LargeMeanPoissonSamplerState.create(n);
        // Only set this once as it will be the same.
        // Allows concurrent threads to compute the state without
        // excess synchronisation once the state is stored.
        values.compareAndSet(index, null, state);
    }
    // Compute the remaining fraction of the mean.
    final double lambdaFractional = mean - n;
    return new LargeMeanPoissonSampler(rng, state, lambdaFractional);
}
}
{code}
I've left out the implementation of the {{LargeMeanPoissonSamplerState}}. Suffice to say that the {{LargeMeanPoissonSampler}} computes it and it is a nested class. This keeps all the functionality of the {{LargeMeanPoissonSampler}} in one place.

Q. Should I pursue this addition?

was (Author: alexherbert):
I've created a benchmark for a cache for the state of the {{LargeMeanPoissonSampler}}. The implementing classes are all package-level scope and so the details are hidden from the user.

The API is a slight change from above. First create the cache with an expected range of the mean.
Then use the cache to get a {{DiscreteSampler}}:
{code:java}
double minMean = 20;
double maxMean = 256;
PoissonSamplerCache cache = new PoissonSamplerCache(minMean, maxMean);
UniformRandomProvider rng = RandomSource.create(RandomSource.SPLIT_MIX_64);
for (int i = 0; i < 1000; i++) {
    double mean = minMean + rng.nextDouble() * (maxMean - minMean);
    DiscreteSampler s1 = cache.getPoissonSampler(rng, mean);
    // This is equivalent to:
    DiscreteSampler s2 = new PoissonSampler(rng, mean);
}
{code}
Basically the cache should provide a sampler such that it matches the constructor for the {{PoissonSampler}}, i.e. what would be produced without using a cache.

I've tested this with a small and medium range using 10 benchmark iterations. I've used an {{AtomicReferenceArray}} for a thread-safe version and a plain array for comparison. Here are the relative performance figures:
||Source||Range||Name||Relative Score||
|SPLIT_MIX_64|256|NoCache|1|
|SPLIT_MIX_64|256|SyncCache|0.492428363949534|
|SPLIT_MIX_64|256|Cache|0.488172881250931|
|SPLIT_MIX_64|1024|NoCache|1|
|SPLIT_MIX_64|1024|SyncCache|0.588722614889271|
|SPLIT_MIX_64|1024|Cache|0.541363439469733|
|WELL_19937_C|256|NoCache|1|
|WELL_19937_C|256|Cache|0.582896153005741|
|WELL_19937_C|256|SyncCache|0.57266842237589|
|WELL_19937_C|1024|NoCache|1|
|WELL_19937_C|1024|Cache|0.63785286486636|
|WELL_19937_C|1024|SyncCache|0.585046716125342|
|WELL_44497_B|256|NoCache|1|
|WELL_44497_B|256|Cache|0.732779052801947|
|WELL_44497_B|256|SyncCache|0.631362596037034|
|WELL_44497_B|1024|NoCache|1|
|WELL_44497_B|1024|Cache|0.658821332466701|
|WELL_44497_B|1024|SyncCache|0.605039844869757|

Observations:
* The cache is faster; twice as fast for the SPLIT_MIX RNG.
* When the RNG is slower (SPLIT_MIX < WELL_19937_C < WELL_44497_B) the speed-up is reduced. This is expected given the initialisation is a smaller part of the algorithm run time.
* There is not much difference between the synchronised cache and the plain version.
I'll need to repeat this with more iterations, different ranges and with a totally idle computer, but it shows the cache may be worth adding to the library.

Since the plain version doesn't always beat the sync version there's not much point in having a non-synchronised version; keeping only the synchronised version makes the cache more flexible.

Here is the current implementation:
{code:java}
package org.apache.commons.rng.sampling.distribution;

import java.util.concurrent.atomic.AtomicReferenceArray;

import org.apache.commons.rng.UniformRandomProvider;
import org.apache.commons.rng.sampling.distribution.LargeMeanPoissonSampler.LargeMeanPoissonSamplerState;

/**
 * Create a sampler for the Poisson
 * distribution using a cache to minimise construction cost.
 *
* The cache will return a sampler equivalent to
* {@link org.apache.commons.rng.sampling.distribution.PoissonSampler#PoissonSampler(UniformRandomProvider, double)}.
*
* The cache allows the {@link PoissonSampler} construction cost to be minimised
* for low-size Poisson samples. The cache is advantageous under the following
* conditions:
*
* If the sample size to be made with the same sampler is large
* then the construction cost is minimal compared to the sampling time.
*
* Performance improvement is dependent on the speed of the
* {@link UniformRandomProvider}. A fast provider can obtain a two-fold speed
* improvement for a single-use Poisson sampler.
*
* The cache is thread safe.
*/
public class PoissonSamplerCache {
/**
* The minimum N covered by the cache where
* {@code N = (int) Math.floor(mean)}.
*/
private final int minN;
/**
* The maximum N covered by the cache where
* {@code N = (int) Math.floor(mean)}.
*/
private final int maxN;
/** The cache of states between {@code minN} and {@code maxN}. */
private final AtomicReferenceArray<LargeMeanPoissonSamplerState> values;
/**
 * Creates a new Poisson sampler for the given {@code mean}.
 *
 * A value of {@code mean} outside the range of the cache is valid.
 *
 * @param rng Generator of uniformly distributed random numbers.
 * @param mean Mean.
 * @return A Poisson sampler.
 * @throws IllegalArgumentException if {@code mean <= 0}.
 */
public DiscreteSampler getPoissonSampler(UniformRandomProvider rng,
                                         double mean) {
    // Ensure the same functionality as the PoissonSampler by
    // using a SmallMeanPoissonSampler under the switch point.
    if (mean < WrapperPoissonSampler.PIVOT) {
        return new SmallMeanPoissonSampler(rng, mean);
    }
    // Convert the mean into an integer.
    final int n = (int) Math.floor(mean);
    // Outside the range of the cache delegate to the full sampler.
    if (n < minN || n > maxN) {
        return new LargeMeanPoissonSampler(rng, mean);
    }
    // Look in the cache for a state that can be reused.
    // Note: The cache is offset by minN.
    final int index = n - minN;
    LargeMeanPoissonSamplerState state = values.get(index);
    if (state == null) {
        // Compute and store for reuse.
        state = LargeMeanPoissonSamplerState.create(n);
        // Only set this once as it will be the same.
        // Allows concurrent threads to compute the state without
        // excess synchronisation once the state is stored.
        values.compareAndSet(index, null, state);
    }
    // Compute the remaining fraction of the mean.
    final double lambdaFractional = mean - n;
    return new LargeMeanPoissonSampler(rng, state, lambdaFractional);
}
}
{code}
I've left out the implementation of the {{LargeMeanPoissonSamplerState}}. Suffice to say that the {{LargeMeanPoissonSampler}} computes it and it is a nested class. This keeps all the functionality of the {{LargeMeanPoissonSampler}} in one place.

Q. Should I pursue this addition?
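The lock-free caching in {{getPoissonSampler}} works because the state for a given {{n}} is deterministic: if two threads race, {{compareAndSet}} publishes only the first result and the loser discards its duplicate. A standalone sketch of that pattern using only JDK classes follows; the {{double[]}} "state" and the class names here are hypothetical stand-ins, not the library's API.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Minimal sketch of the compareAndSet publication pattern used by the
// cache. The double[] is a stand-in for LargeMeanPoissonSamplerState.
class CasCacheSketch {
    private final int minN;
    private final AtomicReferenceArray<double[]> values;

    CasCacheSketch(int minN, int maxN) {
        this.minN = minN;
        this.values = new AtomicReferenceArray<>(maxN - minN + 1);
    }

    // Deterministic computation: every thread would produce an
    // equivalent result, so a lost race is harmless.
    private static double[] computeState(int n) {
        return new double[] {Math.log(n), Math.sqrt(n)};
    }

    double[] get(int n) {
        final int index = n - minN;
        double[] state = values.get(index);
        if (state == null) {
            state = computeState(n);
            // Publish only if the slot is still empty; otherwise another
            // thread won the race and its instance is kept.
            if (!values.compareAndSet(index, null, state)) {
                state = values.get(index);
            }
        }
        return state;
    }
}
```

After the first successful publication every caller observes the same instance, so the read path needs no locking at all.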
 

> PoissonSampler single use speed improvements
> --------------------------------------------
>
>                 Key: RNG-50
>                 URL: https://issues.apache.org/jira/browse/RNG-50
>             Project: Commons RNG
>          Issue Type: Improvement
>    Affects Versions: 1.0
>            Reporter: Alex D Herbert
>            Priority: Minor
>         Attachments: PoissonSamplerTest.java, jmh-result.csv
>
> The Sampler architecture of {{org.apache.commons.rng.sampling.distribution}} is nicely written for fast sampling of small dataset sizes. The constructors for the samplers do not check the input parameters are valid for the respective distributions (in contrast to the old {{org.apache.commons.math3.random.distribution}} classes). I assume this is a design choice for speed. Thus most of the samplers can be used within a loop to sample just one value with very little overhead.
> The {{PoissonSampler}} precomputes log factorial numbers upon construction if the mean is above 40. This is done using the {{InternalUtils.FactorialLog}} class. As of version 1.0 this internal class is currently only used in the {{PoissonSampler}}.
> The cache size is limited to 2*PIVOT (where PIVOT=40). But it creates and precomputes the cache every time a PoissonSampler is constructed if the mean is above the PIVOT value.
> Why not create this once in a static block for the PoissonSampler?
> {code:java}
> /** {@code log(n!)}. */
> private static final FactorialLog factorialLog;
>
> static {
>     factorialLog = FactorialLog.create().withCache((int) (2 * PoissonSampler.PIVOT));
> }
> {code}
> This will make the construction cost of a new {{PoissonSampler}} negligible. If the table is computed dynamically as a static construction method then the overhead will be in the first use.
> Thus the following call will be much faster:
> {code:java}
> UniformRandomProvider rng = ...;
> int value = new PoissonSampler(rng, 50).sample();
> {code}
> I have tested this modification (see attached file) and the results are:
> {noformat}
> Mean 40  Single construction ( 7330792) vs Loop construction (24334724)  (3.319522x faster)
> Mean 40  Single construction ( 7330792) vs Loop construction with static FactorialLog ( 7990656)  (1.090013x faster)
> Mean 50  Single construction ( 6390303) vs Loop construction (19389026)  (3.034132x faster)
> Mean 50  Single construction ( 6390303) vs Loop construction with static FactorialLog ( 6146556)  (0.961857x faster)
> Mean 60  Single construction ( 6041165) vs Loop construction (21337678)  (3.532047x faster)
> Mean 60  Single construction ( 6041165) vs Loop construction with static FactorialLog ( 5329129)  (0.882136x faster)
> Mean 70  Single construction ( 6064003) vs Loop construction (23963516)  (3.951765x faster)
> Mean 70  Single construction ( 6064003) vs Loop construction with static FactorialLog ( 5306081)  (0.875013x faster)
> Mean 80  Single construction ( 6064772) vs Loop construction (26381365)  (4.349935x faster)
> Mean 80  Single construction ( 6064772) vs Loop construction with static FactorialLog ( 6341274)  (1.045591x faster)
> {noformat}
> Thus the speed improvements would be approximately 3-4 fold for single-use Poisson sampling.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
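For reference, the one-off static table the quoted issue proposes amounts to the following pattern, sketched here with plain JDK code. The class and method names are illustrative only, not the library's {{InternalUtils.FactorialLog}} API; only {{PIVOT=40}} and the {{2 * PIVOT}} table size come from the issue text.

```java
// Sketch of the "build the log-factorial table once in a static block"
// idea: the work happens a single time at class load, so constructing
// many samplers does not repeat it.
class StaticLogFactorial {
    private static final int PIVOT = 40;
    /** log(n!) for n in [0, 2 * PIVOT); computed once at class load. */
    private static final double[] LOG_FACTORIAL = new double[2 * PIVOT];

    static {
        // log(0!) = 0 is the array default; build up using the
        // recurrence log(n!) = log((n-1)!) + log(n).
        for (int n = 1; n < LOG_FACTORIAL.length; n++) {
            LOG_FACTORIAL[n] = LOG_FACTORIAL[n - 1] + Math.log(n);
        }
    }

    static double value(int n) {
        return LOG_FACTORIAL[n];
    }
}
```

A lazy-holder variant would move the cost to first use instead of class load, which is the trade-off the issue mentions.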