commons-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alex D Herbert (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (RNG-50) PoissonSampler single use speed improvements
Date Wed, 01 Aug 2018 12:36:00 GMT

    [ https://issues.apache.org/jira/browse/RNG-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16565206#comment-16565206
] 

Alex D Herbert edited comment on RNG-50 at 8/1/18 12:35 PM:
------------------------------------------------------------

{quote}However what to do with your use-case (large mean single use)?
{quote}
We know that when the mean is between 40 and 80 then the {{log(n!)}} is used approximately
0.7 to 0.3% of the time within the algorithm loop. When the mean is higher this usage within
the algorithm drops off.

The log gamma function used to compute {{log(n!)}} uses some double computations in a loop
of size 15 and then 2 {{Math.log}} calls. A cache of just the {{log(n!)}} values is unlikely
to make much of a speed difference inside the loop as there a fair few other things happening
including Gaussian and Exponential sampling using the RNG. (But of course a JMH benchmark
would avoid presumptions.)

However it will also be used once per sample at the value of {{(int)mean}} at the initialisation
of the algorithm.

The other pre-computed values for the Poisson sample use 2 {{Math.exp}}, 2 {{Math.log}} calls
and 2 {{Math.sqrt}} calls. All of these values are based purely on the integer value of the
mean.

So there may be some value in passing in a cache of not only {{log(n!)}} but also the entire
initialisation state of the LargeMeanPoissonSampler if the range of the mean is known.

The cache could perform lazy initialisation something like:
{code:java}
package org.apache.commons.rng.sampling.distribution;

import org.apache.commons.rng.sampling.distribution.InternalUtils.FactorialLog;

/**
 * Compute initialisation state for the LargeMeanPoissonSampler caching values
 * in a set range.
 */
public class LargeMeanPoissonSamplerCache {

    /** Class to compute {@code log(n!)}. This has no cached values. */
    private static final InternalUtils.FactorialLog NO_CACHE_FACTORIAL_LOG;

    static {
        NO_CACHE_FACTORIAL_LOG = FactorialLog.create();
    }

    /** The minimum N covered by the cache. */
    private final int minN;
    /** The maximum N covered by the cache. */
    private final int maxN;
    /** The cache of initialisation states between {@link minN} and {@link maxN}. */
    private final double[][] values;

    /**
     * @param minN The minimum N covered by the cache.
     * @param maxN The maximum N covered by the cache.
     * @throws IllegalArgumentException if {@code mean <= 0}.
     */
    public LargeMeanPoissonSamplerCache(int minN, int maxN) {
        if (minN < 0) {
            throw new IllegalArgumentException("MinN: " + minN + " <= " + 0);
        }
        if (maxN <= minN) {
            throw new IllegalArgumentException("MaxN: " + maxN + " <= " + minN);
        }
        this.minN = minN;
        this.maxN = maxN;
        values = new double[maxN - minN + 1][];
    }

    /**
     * Get the initialisation state of the LargeMeanPoissonSampler.
     *
     * @param lambda The integer value of the Poisson mean 
     *               ({@code Math.floor(mean)}).
     * @return the initialisation state
     */
    double[] getState(int lambda) {
        if (lambda < minN || lambda > maxN)
            return computeState(lambda);
        final int index = lambda - minN;
        double[] value = values[index];
        if (value == null) {
            // Compute and store for reuse
            value = computeState(lambda);
            values[index] = value;
        }
        return value;
    }
    
    /**
     * Compute the initialisation state of the LargeMeanPoissonSampler.
     *
     * @param lambda The integer value of the Poisson mean ({@code Math.floor(mean)}).
     * @return the initialisation state
     */
    private final static double[] computeState(int lambda) {
        final double logLambda = Math.log(lambda);
        final double logLambdaFactorial = NO_CACHE_FACTORIAL_LOG.value(lambda);
        final double delta = Math.sqrt(lambda * Math.log(32 * lambda / Math.PI + 1));
        final double halfDelta = delta / 2;
        final double twolpd = 2 * lambda + delta;
        final double c1 = 1 / (8 * lambda);
        final double a1 = Math.sqrt(Math.PI * twolpd) * Math.exp(c1);
        final double a2 = (twolpd / delta) * Math.exp(-delta * (1 + delta) / twolpd);
        final double aSum = a1 + a2 + 1;
        final double p1 = a1 / aSum;
        final double p2 = a2 / aSum;
        return new double[] { logLambda, logLambdaFactorial, delta, halfDelta, twolpd, p1,
p2, c1 };
    }
}
{code}
This was a simple example using {{double[]}} but the state could be wrapped as an immutable
object.

The LargeMeanPoissonSampler would then have to do this for initialisation:
{code:java}
lambda = Math.floor(mean);
initialise(cache.getState((int) lambda);
{code}


was (Author: alexherbert):
{quote}However what to do with your use-case (large mean single use)?
{quote}
We know that when the mean is between 40 and 80 then the {{log(n!)}} is used approximately
0.7 to 0.3% of the time within the algorithm loop. When the mean is higher this usage within
the algorithm drops off.

The log gamma function used to compute {{log(n!)}} uses some double computations in a loop
of size 15 and then 2 {{Math.log}} calls. A cache of just the {{log(n!)}} values is unlikely
to make much of a speed difference inside the loop as there a fair few other things happening
including Gaussian and Exponential sampling using the RNG. (But of course a JMH benchmark
would avoid presumptions.)

However it will also be used once per sample at the value of {{(int)mean}} at the initialisation
of the algorithm.

The other pre-computed values for the Poisson sample use 2 {{Math.exp}}, 2 {{Math.log}} calls
and 2 {{Math.sqrt}} calls. All of these values are based purely on the integer value of the
mean.

So there may be some value in passing in a cache of not only {{log(n!)}} but also the entire
initialisation state of the LargeMeanPoissonSampler if the range of the mean is known.

The cache could perform lazy initialisation something like:
{code:java}
package org.apache.commons.rng.sampling.distribution;

import org.apache.commons.rng.sampling.distribution.InternalUtils.FactorialLog;

/**
 * Compute initialisation state for the LargeMeanPoissonSampler caching values
 * in a set range.
 */
public class LargeMeanPoissonSamplerCache {

    /** Class to compute {@code log(n!)}. This has no cached values. */
    private static final InternalUtils.FactorialLog NO_CACHE_FACTORIAL_LOG;

    static {
        NO_CACHE_FACTORIAL_LOG = FactorialLog.create();
    }

    /** The minimum N covered by the cache. */
    private final int minN;
    /** The maximum N covered by the cache. */
    private final int maxN;
    /** The cache of initialisation states between {@link minN} and {@link maxN}. */
    private final double[][] values;

    /**
     * @param minN The minimum N covered by the cache.
     * @param maxN The maximum N covered by the cache.
     * @throws IllegalArgumentException if {@code mean <= 0}.
     */
    public LargeMeanPoissonSamplerCache(int minN, int maxN) {
        if (minN < 0) {
            throw new IllegalArgumentException("MinN: " + minN + " <= " + 0);
        }
        if (maxN <= minN) {
            throw new IllegalArgumentException("MaxN: " + maxN + " <= " + minN);
        }
        this.minN = minN;
        this.maxN = maxN;
        values = new double[maxN - minN + 1][];
    }

    /**
     * Get the initialisation state of the LargeMeanPoissonSampler.
     *
     * @param lambda The integer value of the Poisson mean ({@code Math.floor(mean)}).
     * @return the initialisation state
     */
    double[] getState(int lambda) {
        if (lambda < minN || lambda > maxN)
            return computeState(lambda);
        final int index = lambda - minN;
        double[] value = values[index];
        if (value == null) {
            // Compute and store for reuse
            value = computeState(lambda);
            values[index] = value;
        }
        return value;
    }
    
    /**
     * Compute the initialisation state of the LargeMeanPoissonSampler.
     *
     * @param lambda The integer value of the Poisson mean ({@code Math.floor(mean)}).
     * @return the initialisation state
     */
    private final static double[] computeState(int lambda) {
        final double logLambda = Math.log(lambda);
        final double logLambdaFactorial = NO_CACHE_FACTORIAL_LOG.value(lambda);
        final double delta = Math.sqrt(lambda * Math.log(32 * lambda / Math.PI + 1));
        final double halfDelta = delta / 2;
        final double twolpd = 2 * lambda + delta;
        final double c1 = 1 / (8 * lambda);
        final double a1 = Math.sqrt(Math.PI * twolpd) * Math.exp(c1);
        final double a2 = (twolpd / delta) * Math.exp(-delta * (1 + delta) / twolpd);
        final double aSum = a1 + a2 + 1;
        final double p1 = a1 / aSum;
        final double p2 = a2 / aSum;
        return new double[] { logLambda, logLambdaFactorial, delta, halfDelta, twolpd, p1,
p2, c1 };
    }
}
{code}

This was a simple example using {{double[]}} but the state could be wrapped as an immutable
object.

The LargeMeanPoissonSampler would then have to do this for initialisation:

{code:java}
lambda = Math.floor(mean);
initialise(cache.getState((int) lambda);
{code}

> PoissonSampler single use speed improvements
> --------------------------------------------
>
>                 Key: RNG-50
>                 URL: https://issues.apache.org/jira/browse/RNG-50
>             Project: Commons RNG
>          Issue Type: Improvement
>    Affects Versions: 1.0
>            Reporter: Alex D Herbert
>            Priority: Minor
>         Attachments: PoissonSamplerTest.java, jmh-result.csv
>
>
> The Sampler architecture of {{org.apache.commons.rng.sampling.distribution}} is nicely
written for fast sampling of small dataset sizes. The constructors for the samplers do not
check the input parameters are valid for the respective distributions (in contrast to the
old {{org.apache.commons.math3.random.distribution}} classes). I assume this is a design choice
for speed. Thus most of the samplers can be used within a loop to sample just one value with
very little overhead.
> The {{PoissonSampler}} precomputes log factorial numbers upon construction if the mean
is above 40. This is done using the {{InternalUtils.FactorialLog}} class. As of version 1.0
this internal class is currently only used in the {{PoissonSampler}}.
> The cache size is limited to 2*PIVOT (where PIVOT=40). But it creates and precomputes
the cache every time a PoissonSampler is constructed if the mean is above the PIVOT value.
> Why not create this once in a static block for the PoissonSampler?
> {code:java}
> /** {@code log(n!)}. */
> private static final FactorialLog factorialLog;
>      
> static 
> {
>     factorialLog = FactorialLog.create().withCache((int) (2 * PoissonSampler.PIVOT));
> }
> {code}
> This will make the construction cost of a new {{PoissonSampler}} negligible. If the table
is computed dynamically as a static construction method then the overhead will be in the first
use. Thus the following call will be much faster:
> {code:java}
> UniformRandomProvider rng = ...;
> int value = new PoissonSampler(rng, 50).sample();
> {code}
> I have tested this modification (see attached file) and the results are:
> {noformat}
> Mean 40  Single construction ( 7330792) vs Loop construction                        
 (24334724)   (3.319522.2x faster)
> Mean 40  Single construction ( 7330792) vs Loop construction with static FactorialLog
( 7990656)   (1.090013.2x faster)
> Mean 50  Single construction ( 6390303) vs Loop construction                        
 (19389026)   (3.034132.2x faster)
> Mean 50  Single construction ( 6390303) vs Loop construction with static FactorialLog
( 6146556)   (0.961857.2x faster)
> Mean 60  Single construction ( 6041165) vs Loop construction                        
 (21337678)   (3.532047.2x faster)
> Mean 60  Single construction ( 6041165) vs Loop construction with static FactorialLog
( 5329129)   (0.882136.2x faster)
> Mean 70  Single construction ( 6064003) vs Loop construction                        
 (23963516)   (3.951765.2x faster)
> Mean 70  Single construction ( 6064003) vs Loop construction with static FactorialLog
( 5306081)   (0.875013.2x faster)
> Mean 80  Single construction ( 6064772) vs Loop construction                        
 (26381365)   (4.349935.2x faster)
> Mean 80  Single construction ( 6064772) vs Loop construction with static FactorialLog
( 6341274)   (1.045591.2x faster)
> {noformat}
> Thus the speed improvements would be approximately 3-4 fold for single use Poisson sampling.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Mime
View raw message