commons-issues mailing list archives

From "Alex D Herbert (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (RNG-50) PoissonSampler single use speed improvements
Date Fri, 03 Aug 2018 13:58:00 GMT

    [ https://issues.apache.org/jira/browse/RNG-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16568231#comment-16568231 ]

Alex D Herbert commented on RNG-50:
-----------------------------------

I've created a benchmark for a cache for the state of the {{LargeMeanPoissonSampler}}. The
implementing classes are all package-level scope, so the details are hidden from the user.

The API is a slight change from the one above. First create the cache with the expected range
of the mean, then use the cache to get a {{DiscreteSampler}}:
{code:java}
double minMean = 20;
double maxMean = 256;
PoissonSamplerCache cache = new PoissonSamplerCache(minMean, maxMean);
UniformRandomProvider rng = RandomSource.create(RandomSource.SPLIT_MIX_64);
for (int i=0; i<1000; i++) {
    double mean = minMean + rng.nextDouble() * (maxMean - minMean);
    DiscreteSampler s1 = cache.getPoissonSampler(rng, mean);
    // This is equivalent to:
    DiscreteSampler s2 = new PoissonSampler(rng, mean);
}
{code}
Basically the cache should provide a sampler that matches what the {{PoissonSampler}}
constructor would produce without using a cache.
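
For example, the intended equivalence could be checked with two identically seeded generators
(illustrative only; this assumes the cached and directly constructed samplers consume the
generator in exactly the same way):
{code:java}
// Illustrative check of the intended equivalence (assumption: the state-based
// constructor reproduces the same internal values as the direct constructor).
long seed = 12345L;
UniformRandomProvider rng1 = RandomSource.create(RandomSource.SPLIT_MIX_64, seed);
UniformRandomProvider rng2 = RandomSource.create(RandomSource.SPLIT_MIX_64, seed);
PoissonSamplerCache cache = new PoissonSamplerCache(20, 256);
DiscreteSampler cached = cache.getPoissonSampler(rng1, 45.5);
DiscreteSampler direct = new PoissonSampler(rng2, 45.5);
for (int i = 0; i < 10; i++) {
    // Both should produce the same sequence of samples.
    assert cached.sample() == direct.sample();
}
{code}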

I've tested this with a small and a medium range using 10 benchmark iterations. I've used an
{{AtomicReferenceArray}} for a thread-safe version and a plain array for comparison (a sketch
of the benchmark setup follows the table). Here are the relative performance figures:
||Source||Range||Name||Relative Score||
|SPLIT_MIX_64|256|NoCache|1|
|SPLIT_MIX_64|256|SyncCache|0.492428363949534|
|SPLIT_MIX_64|256|Cache|0.488172881250931|
|SPLIT_MIX_64|1024|NoCache|1|
|SPLIT_MIX_64|1024|SyncCache|0.588722614889271|
|SPLIT_MIX_64|1024|Cache|0.541363439469733|
|WELL_19937_C|256|NoCache|1|
|WELL_19937_C|256|Cache|0.582896153005741|
|WELL_19937_C|256|SyncCache|0.57266842237589|
|WELL_19937_C|1024|NoCache|1|
|WELL_19937_C|1024|Cache|0.63785286486636|
|WELL_19937_C|1024|SyncCache|0.585046716125342|
|WELL_44497_B|256|NoCache|1|
|WELL_44497_B|256|Cache|0.732779052801947|
|WELL_44497_B|256|SyncCache|0.631362596037034|
|WELL_44497_B|1024|NoCache|1|
|WELL_44497_B|1024|Cache|0.658821332466701|
|WELL_44497_B|1024|SyncCache|0.605039844869757|
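
The {{jmh-result.csv}} attachment is from JMH; as a rough sketch of the setup, the benchmark
amounts to something like the following (class, method and parameter names are illustrative,
not the actual benchmark used to produce the figures above):
{code:java}
import java.util.concurrent.TimeUnit;

import org.apache.commons.rng.UniformRandomProvider;
import org.apache.commons.rng.sampling.distribution.PoissonSampler;
import org.apache.commons.rng.sampling.distribution.PoissonSamplerCache;
import org.apache.commons.rng.simple.RandomSource;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

/** Illustrative comparison of Poisson sampler construction with and without the cache. */
@State(Scope.Benchmark)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public class PoissonSamplerCacheBenchmark {
    /** The lower bound of the mean range. */
    private static final double MIN_MEAN = 20;

    /** The RNG implementation under test. */
    @Param({"SPLIT_MIX_64", "WELL_19937_C", "WELL_44497_B"})
    private String randomSourceName;

    /** The size of the mean range. */
    @Param({"256", "1024"})
    private double range;

    /** Generator used by the benchmark methods. */
    private UniformRandomProvider rng;
    /** Cache covering the benchmark mean range. */
    private PoissonSamplerCache cache;

    /** Create the generator and the cache for the configured range. */
    @Setup
    public void setup() {
        rng = RandomSource.create(RandomSource.valueOf(randomSourceName));
        cache = new PoissonSamplerCache(MIN_MEAN, MIN_MEAN + range);
    }

    /** Baseline: construct a new PoissonSampler for each sample. */
    @Benchmark
    public int noCache() {
        final double mean = MIN_MEAN + rng.nextDouble() * range;
        return new PoissonSampler(rng, mean).sample();
    }

    /** Obtain a sampler from the cache for each sample. */
    @Benchmark
    public int withCache() {
        final double mean = MIN_MEAN + rng.nextDouble() * range;
        return cache.getPoissonSampler(rng, mean).sample();
    }
}
{code}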

Observations:
 * The cache is faster; roughly twice as fast for the SPLIT_MIX_64 RNG.
 * When the RNG is slower (SPLIT_MIX_64 < WELL_19937_C < WELL_44497_B) the speed-up is
reduced. This is expected given that the initialisation is a smaller part of the algorithm run
time.
 * There is not much difference between the synchronised cache and the plain version.

I'll need to repeat this with more iterations, different ranges and a totally idle computer,
but it shows the cache may be worth adding to the library.

Since the plain version doesn't always beat the synchronised version there's not much point in
having a non-synchronised variant; keeping only the thread-safe cache makes it more flexible.
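
For example (illustrative only, not part of the proposed API), a single thread-safe cache
could be shared across tasks that each use their own generator:
{code:java}
// Illustrative sketch: one shared cache, one RNG per task.
final PoissonSamplerCache cache = new PoissonSamplerCache(20, 256);
final ExecutorService executor = Executors.newFixedThreadPool(4);
for (int t = 0; t < 4; t++) {
    executor.submit(() -> {
        // Each task has its own generator; the cache is shared safely.
        final UniformRandomProvider rng = RandomSource.create(RandomSource.SPLIT_MIX_64);
        long sum = 0;
        for (int i = 0; i < 1000; i++) {
            sum += cache.getPoissonSampler(rng, 128.5).sample();
        }
        return sum;
    });
}
executor.shutdown();
{code}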

Here is the current implementation:
{code:java}
package org.apache.commons.rng.sampling.distribution;

import java.util.concurrent.atomic.AtomicReferenceArray;

import org.apache.commons.rng.UniformRandomProvider;
import org.apache.commons.rng.sampling.distribution.PoissonSampler;
import org.apache.commons.rng.sampling.distribution.LargeMeanPoissonSampler.LargeMeanPoissonSamplerState;

/**
 * Create a sampler for the
 * <a href="http://mathworld.wolfram.com/PoissonDistribution.html">Poisson
 * distribution</a> using a cache to minimise construction cost.
 * <p>
 * The cache will return a sampler equivalent to
 * {@link PoissonSampler#PoissonSampler(UniformRandomProvider, double)}.
 * <p>
 * The cache allows the {@link PoissonSampler} construction cost to be minimised
 * for small Poisson sample sizes. The cache is advantageous under the following
 * conditions:
 * <ul>
 * <li>The mean of the Poisson distribution falls within a known range.</li>
 * <li>The sample size to be made with the <strong>same</strong> sampler is
 * small.</li>
 * </ul>
 * <p>
 * If the sample size to be made with the <strong>same</strong> sampler is large
 * then the construction cost is minimal compared to the sampling time.
 * <p>
 * Performance improvement is dependent on the speed of the
 * {@link UniformRandomProvider}. A fast provider can obtain a two-fold speed
 * improvement for a single-use Poisson sampler.
 * <p>
 * The cache is thread safe.
 */
public class PoissonSamplerCache {

    /**
     * The minimum N covered by the cache where
     * {@code N = (int)Math.floor(mean)}.
     */
    private final int minN;
    /**
     * The maximum N covered by the cache where
     * {@code N = (int)Math.floor(mean)}.
     */
    private final int maxN;
    /** The cache of states between {@code minN} and {@code maxN}. */
    private final AtomicReferenceArray<LargeMeanPoissonSamplerState> values;

    /**
     * @param minMean The minimum mean covered by the cache.
     * @param maxMean The maximum mean covered by the cache.
     * @throws IllegalArgumentException if {@code maxMean < minMean}
     */
    public PoissonSamplerCache(double minMean, double maxMean) {

        // Although a mean of 0 is invalid for a Poisson sampler this case
        // is handled to make the cache user friendly. Any low means will
        // be handled by the SmallMeanPoissonSampler and not cached.
        if (minMean < 0) {
            minMean = 0;
        }
        // Allow minMean == maxMean so that the cache can be used across
        // concurrent threads to create samplers with distinct RNGs and the
        // same mean.
        if (maxMean < minMean) {
            throw new IllegalArgumentException(
                    "Max mean: " + maxMean + " < " + minMean);
        }

        // The cache can only be used for the LargeMeanPoissonSampler.
        if (maxMean < WrapperPoissonSampler.PIVOT) {
            // The upper limit is too small so no cache will be used.
            // This class will just construct new samplers.
            minN = 0;
            maxN = 0;
            values = null;
        } else {
            // Convert the mean into integers.
            // Note the minimum is clipped to the algorithm switch point.
            this.minN = (int) Math
                    .floor(Math.max(minMean, WrapperPoissonSampler.PIVOT));
            this.maxN = (int) Math.floor(maxMean);
            values = new AtomicReferenceArray<>(maxN - minN + 1);
        }
    }

    /**
     * Creates a Poisson sampler. The returned sampler will function exactly the
     * same as
     * {@link PoissonSampler#PoissonSampler(UniformRandomProvider, double)}.
     * <p>
     * A value of {@code mean} outside the range of the cache is valid.
     * 
     * @param rng  Generator of uniformly distributed random numbers.
     * @param mean Mean.
     * @return A Poisson sampler
     * @throws IllegalArgumentException if {@code mean <= 0}.
     */
    public DiscreteSampler getPoissonSampler(UniformRandomProvider rng,
            double mean) {
        // Ensure the same functionality as the PoissonSampler by
        // using a SmallMeanPoissonSampler under the switch point.
        if (mean < WrapperPoissonSampler.PIVOT) {
            return new SmallMeanPoissonSampler(rng, mean);
        }

        // Convert the mean into an integer.
        final int n = (int) Math.floor(mean);
        if (n > maxN) {
            // Outside the range of the cache.
            return new LargeMeanPoissonSampler(rng, mean);
        }

        // Look in the cache for a state that can be reused.
        // Note: The cache is offset by minN.
        final int index = n - minN;
        LargeMeanPoissonSamplerState state = values.get(index);
        if (state == null) {
            // Compute and store for reuse
            state = LargeMeanPoissonSamplerState.create(n);
            // Only set this once as it will be the same.
            // Allows concurrent threads to compute the state without
            // excess synchronisation once the state is stored.
            values.compareAndSet(index, null, state);
        }
        // Compute the remaining fraction of the mean
        final double lambdaFractional = mean - n;
        return new LargeMeanPoissonSampler(rng, state, lambdaFractional);
    }
}
{code}
I've left out the implementation of the {{LargeMeanPoissonSamplerState}}. Suffice it to say that
the {{LargeMeanPoissonSampler}} computes it and it is a nested class. This keeps all the
functionality of the {{LargeMeanPoissonSampler}} in one place.

 

Q. Should I pursue this addition?

 

 

> PoissonSampler single use speed improvements
> --------------------------------------------
>
>                 Key: RNG-50
>                 URL: https://issues.apache.org/jira/browse/RNG-50
>             Project: Commons RNG
>          Issue Type: Improvement
>    Affects Versions: 1.0
>            Reporter: Alex D Herbert
>            Priority: Minor
>         Attachments: PoissonSamplerTest.java, jmh-result.csv
>
>
> The Sampler architecture of {{org.apache.commons.rng.sampling.distribution}} is nicely
> written for fast sampling of small dataset sizes. The constructors for the samplers do not
> check the input parameters are valid for the respective distributions (in contrast to the
> old {{org.apache.commons.math3.random.distribution}} classes). I assume this is a design choice
> for speed. Thus most of the samplers can be used within a loop to sample just one value with
> very little overhead.
> The {{PoissonSampler}} precomputes log factorial numbers upon construction if the mean
> is above 40. This is done using the {{InternalUtils.FactorialLog}} class. As of version 1.0
> this internal class is currently only used in the {{PoissonSampler}}.
> The cache size is limited to 2*PIVOT (where PIVOT=40). But it creates and precomputes
> the cache every time a PoissonSampler is constructed if the mean is above the PIVOT value.
> Why not create this once in a static block for the PoissonSampler?
> {code:java}
> /** {@code log(n!)}. */
> private static final FactorialLog factorialLog;
>      
> static 
> {
>     factorialLog = FactorialLog.create().withCache((int) (2 * PoissonSampler.PIVOT));
> }
> {code}
> This will make the construction cost of a new {{PoissonSampler}} negligible. If the table
> is computed dynamically as a static construction method then the overhead will be in the first
> use. Thus the following call will be much faster:
> {code:java}
> UniformRandomProvider rng = ...;
> int value = new PoissonSampler(rng, 50).sample();
> {code}
> I have tested this modification (see attached file) and the results are:
> {noformat}
> Mean 40  Single construction ( 7330792) vs Loop construction                          (24334724)   (3.319522x faster)
> Mean 40  Single construction ( 7330792) vs Loop construction with static FactorialLog  ( 7990656)   (1.090013x faster)
> Mean 50  Single construction ( 6390303) vs Loop construction                          (19389026)   (3.034132x faster)
> Mean 50  Single construction ( 6390303) vs Loop construction with static FactorialLog  ( 6146556)   (0.961857x faster)
> Mean 60  Single construction ( 6041165) vs Loop construction                          (21337678)   (3.532047x faster)
> Mean 60  Single construction ( 6041165) vs Loop construction with static FactorialLog  ( 5329129)   (0.882136x faster)
> Mean 70  Single construction ( 6064003) vs Loop construction                          (23963516)   (3.951765x faster)
> Mean 70  Single construction ( 6064003) vs Loop construction with static FactorialLog  ( 5306081)   (0.875013x faster)
> Mean 80  Single construction ( 6064772) vs Loop construction                          (26381365)   (4.349935x faster)
> Mean 80  Single construction ( 6064772) vs Loop construction with static FactorialLog  ( 6341274)   (1.045591x faster)
> {noformat}
> Thus the speed improvements would be approximately 3-4 fold for single-use Poisson sampling.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
