commons-issues mailing list archives

From "Alex D Herbert (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (RNG-91) Kemp small mean poisson sampler
Date Fri, 12 Apr 2019 16:21:00 GMT

    [ https://issues.apache.org/jira/browse/RNG-91?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16816415#comment-16816415 ]

Alex D Herbert commented on RNG-91:
-----------------------------------

Here are some JMH timing results:
|mean|randomSourceName|samplerType|Method|Score|Error|Median|Runtime|
| | | |baseline|2.045|0.004|2.045| |
|1|SPLIT_MIX_64|KempSmallMeanPoissonSampler|sample|17.155|0.244|17.143|15.098|
|1|SPLIT_MIX_64|SmallMeanPoissonSampler|sample|19.586|0.169|19.580|17.535|
|2|SPLIT_MIX_64|KempSmallMeanPoissonSampler|sample|22.167|0.148|22.158|20.113|
|2|SPLIT_MIX_64|SmallMeanPoissonSampler|sample|24.256|0.138|24.264|22.219|
|4|SPLIT_MIX_64|KempSmallMeanPoissonSampler|sample|32.197|0.242|32.178|30.132|
|4|SPLIT_MIX_64|SmallMeanPoissonSampler|sample|28.165|0.229|28.171|26.126|
|8|SPLIT_MIX_64|KempSmallMeanPoissonSampler|sample|56.959|0.109|56.962|54.917|
|8|SPLIT_MIX_64|SmallMeanPoissonSampler|sample|39.533|17.959|37.429|35.384|
|16|SPLIT_MIX_64|KempSmallMeanPoissonSampler|sample|107.836|0.157|107.832|105.787|
|16|SPLIT_MIX_64|SmallMeanPoissonSampler|sample|58.023|26.622|54.882|52.837|
|32|SPLIT_MIX_64|KempSmallMeanPoissonSampler|sample|211.423|0.365|211.453|209.408|
|32|SPLIT_MIX_64|SmallMeanPoissonSampler|sample|89.935|0.379|89.900|87.855|
|1|WELL_44497_B|KempSmallMeanPoissonSampler|sample|31.937|0.222|31.916|29.871|
|1|WELL_44497_B|SmallMeanPoissonSampler|sample|49.306|0.613|49.256|47.211|
|2|WELL_44497_B|KempSmallMeanPoissonSampler|sample|36.735|0.212|36.735|34.690|
|2|WELL_44497_B|SmallMeanPoissonSampler|sample|67.669|0.553|67.615|65.570|
|4|WELL_44497_B|KempSmallMeanPoissonSampler|sample|47.596|1.532|47.414|45.369|
|4|WELL_44497_B|SmallMeanPoissonSampler|sample|101.487|0.586|101.515|99.470|
|8|WELL_44497_B|KempSmallMeanPoissonSampler|sample|72.094|0.321|72.124|70.079|
|8|WELL_44497_B|SmallMeanPoissonSampler|sample|167.569|0.761|167.561|165.515|
|16|WELL_44497_B|KempSmallMeanPoissonSampler|sample|123.320|0.615|123.283|121.238|
|16|WELL_44497_B|SmallMeanPoissonSampler|sample|298.698|1.452|298.780|296.734|
|32|WELL_44497_B|KempSmallMeanPoissonSampler|sample|226.799|0.678|226.761|224.716|
|32|WELL_44497_B|SmallMeanPoissonSampler|sample|562.260|1.813|562.136|560.091|

!kemp.jpg!

The plot shows the Runtime column, which is the median score minus the median score of the baseline. For example, the first row gives 17.143 - 2.045 = 15.098.

For reference here are the two algorithms:

{code:java}
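    // Kemp inversion. rng, mean and p0 = Math.exp(-mean) are fields of the sampler.
    public int sampleKemp() {
        // The only uniform deviate used for this sample.
        double u = rng.nextDouble();
        int x = 0;
        double p = p0;
        // Subtract successive probabilities p(x) from u until u falls within p(x).
        while (u > p) {
            u -= p;
            // Recurrence: p(x+1) = p(x) * mean / (x+1).
            p = p * mean / ++x;
            // Stop if p underflows to zero; otherwise a residual u left by
            // floating-point error would keep the loop running.
            if (p == 0) {
                break;
            }
        }
        return x;
    }

    // Current SmallMean method. limit is a field that caps the loop count.
    public int sampleSmallMean() {
        int n = 0;
        double r = 1;
        while (n < limit) {
            // One uniform deviate per iteration.
            r *= rng.nextDouble();
            // Count iterations until the running product drops below p0.
            if (r >= p0) {
                n++;
            } else {
                break;
            }
        }
        return n;
    }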
{code}
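
As a rough way to check how many uniform deviates each method consumes, the two listings above can be wrapped in a self-contained sketch that counts calls to the random source. This is not the Commons RNG code: {{PoissonSketch}}, {{CountingSource}} and {{deviatesPerSample}} are hypothetical names, {{java.util.SplittableRandom}} stands in for the library's generators, and the loop cap of 1000 * mean is an assumption.

```java
import java.util.SplittableRandom;
import java.util.function.DoubleSupplier;

/** Illustrative sketch only; not the Commons RNG implementation. */
final class PoissonSketch {
    /** Wraps a generator and counts how many uniform deviates have been drawn. */
    static final class CountingSource implements DoubleSupplier {
        private final SplittableRandom rng;
        long calls;

        CountingSource(long seed) {
            rng = new SplittableRandom(seed);
        }

        @Override
        public double getAsDouble() {
            calls++;
            return rng.nextDouble();
        }
    }

    /** Kemp inversion: one uniform deviate, then a walk along the cumulative probability. */
    static int sampleKemp(DoubleSupplier rng, double mean, double p0) {
        double u = rng.getAsDouble();
        int x = 0;
        double p = p0;
        while (u > p) {
            u -= p;
            p = p * mean / ++x;
            if (p == 0) {
                break;
            }
        }
        return x;
    }

    /** Product-of-uniforms method: one uniform deviate per loop iteration. */
    static int sampleSmallMean(DoubleSupplier rng, double p0, int limit) {
        int n = 0;
        double r = 1;
        while (n < limit) {
            r *= rng.getAsDouble();
            if (r >= p0) {
                n++;
            } else {
                break;
            }
        }
        return n;
    }

    /** Returns {deviates per sample for Kemp, deviates per sample for SmallMean}. */
    static double[] deviatesPerSample(double mean, int samples, long seed) {
        final double p0 = Math.exp(-mean);
        // Assumed safety cap on the SmallMean loop; not taken from the message above.
        final int limit = (int) Math.ceil(1000 * mean);
        CountingSource kemp = new CountingSource(seed);
        CountingSource small = new CountingSource(seed);
        for (int i = 0; i < samples; i++) {
            sampleKemp(kemp, mean, p0);
            sampleSmallMean(small, p0, limit);
        }
        return new double[] {(double) kemp.calls / samples, (double) small.calls / samples};
    }
}
```

Under these assumptions, {{PoissonSketch.deviatesPerSample(8, 100000, 42L)}} should report exactly 1 deviate per sample for Kemp and roughly mean + 1 for SmallMean, because the SmallMean loop draws one extra deviate to terminate.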
 
Conclusions:
 * The speed of the RNG is very important for the {{SmallMean}} method. This is expected, as it
draws a uniform deviate on every iteration of the sample loop. What is less obvious is that
with a fast RNG (SPLIT_MIX_64) it can outperform the Kemp method.
 * The Kemp method runtime is largely independent of the RNG. This is expected, as it uses only
a single uniform deviate per sample.
 * The Kemp method outperforms the SmallMean method at every tested mean when the RNG is slow
(WELL_44497_B).
 * The Kemp method outperforms the SmallMean method when the RNG is fast and the mean
is <= 2.
 * The SmallMean method outperforms the Kemp method when the RNG is fast and the mean is 4 or
above. No mean between 2 and 4 was benchmarked, so the exact crossover is not known.
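
One way to see the scaling behind these conclusions is to take the average increase in Runtime per unit of mean between the mean = 1 and mean = 32 rows of the table above. The sketch below does only that arithmetic; {{RuntimeSlopes}} and {{slope}} are hypothetical names and the numbers are copied from the Runtime column.

```java
/** Illustrative arithmetic on the Runtime column; not part of the benchmark code. */
final class RuntimeSlopes {
    /** Average increase in Runtime per unit of mean between mean = 1 and mean = 32. */
    static double slope(double runtimeAtMean1, double runtimeAtMean32) {
        return (runtimeAtMean32 - runtimeAtMean1) / (32 - 1);
    }

    public static void main(String[] args) {
        // Runtime values for mean = 1 and mean = 32, copied from the table.
        System.out.println("Kemp, SPLIT_MIX_64:      " + slope(15.098, 209.408));
        System.out.println("Kemp, WELL_44497_B:      " + slope(29.871, 224.716));
        System.out.println("SmallMean, SPLIT_MIX_64: " + slope(17.535, 87.855));
        System.out.println("SmallMean, WELL_44497_B: " + slope(47.211, 560.091));
    }
}
```

The Kemp slope comes out at about 6.3 under both generators, while the SmallMean slope is about 2.3 with SPLIT_MIX_64 and about 16.5 with WELL_44497_B, in the table's score units per unit of mean.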

This creates a dilemma about what to do with the Kemp method. It is a viable generator
of Poisson samples, and its speed scales the same way irrespective of the generator. But it
does not provide maximum performance over the expected use case range of means from 0 to 40
(note: above a mean of 40 the LargeMeanPoissonSampler is recommended).

It may be worth using in the library for means below 1. For example, a Poisson sample with
a fractional mean is used in the LargeMeanPoissonSampler.


> Kemp small mean poisson sampler
> -------------------------------
>
>                 Key: RNG-91
>                 URL: https://issues.apache.org/jira/browse/RNG-91
>             Project: Commons RNG
>          Issue Type: New Feature
>          Components: sampling
>    Affects Versions: 1.3
>            Reporter: Alex D Herbert
>            Assignee: Alex D Herbert
>            Priority: Minor
>             Fix For: 1.3
>
>         Attachments: kemp.jpg
>
>
> The current algorithm for the {{SmallMeanPoissonSampler}} is used to generate Poisson
> samples for any mean up to 40. The sampler requires approximately n samples from an RNG to
> generate 1 Poisson deviate for a mean of n.
> The [Kemp (1981)|https://www.jstor.org/stable/2346348] algorithm requires 1 sample from
> the RNG and then accrues a cumulative probability using a recurrence relation to compute each
> successive Poisson probability:
> {noformat}
> p(n+1) = p(n) * mean / (n+1)
> {noformat}
> The full algorithm is here:
> {code:java}
>     mean = ...;
>     final double p0 = Math.exp(-mean);
>     @Override
>     public int sample() {
>         double u = rng.nextDouble();
>         int x = 0;
>         double p = p0;
>         // The algorithm listed in Kemp (1981) does not check that the rolling probability
>         // is positive. This check is added to ensure no errors when the limit of the
>         // summation 1 - sum(p(x)) is above 0 due to cumulative error in floating point
>         // arithmetic.
>         while (u > p && p != 0) {
>             u -= p;
>             x++;
>             p = p * mean / x;
>         }
>         return x;
>     }
> {code}
> The limit for the sampler is set by the ability to compute p0. The limit is the mean at
> which Math.exp(-mean) returns 0, which is approximately 744.440.
> A conservative limit of 700 sets an initial probability p0 of 9.85967654375977E-305.
> When run through the summation series for the limit (u initialised to 1) the result when the
> summation ends (p is zero) leaves u = 3.335439283623915E-15. This is close to the expected
> tolerance for floating point error (Note: 1 - Math.nextDown(1.0) = 1.1102230246251565E-16).
> Using a mean of 10 leaves u = 4.988586742717954E-17. So smaller means have a similar
> error. The error in the cumulative sum is expected to result in truncation of the long tail
> of the Poisson distribution (which in theory extends to infinity).
> This sampler should outperform the current {{SmallMeanPoissonSampler}} as it requires
> 1 uniform deviate per sample.
> Note that the {{SmallMeanPoissonSampler}} uses a limit for the mean of Integer.MAX_VALUE
> / 2. This should be updated since it also relies on p0 being above zero.
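
The residual check described in the quoted issue can be reproduced with a short sketch along these lines. {{KempResidual}} and {{residual}} are hypothetical names, and the loop here runs purely until p underflows to zero, dropping the {{u > p}} test of the sampler so that the whole series is summed. No output values are claimed; the figures quoted in the issue are the reference.

```java
/** Illustrative check of the cumulative floating-point error; not library code. */
final class KempResidual {
    /**
     * Starts with u = 1 and subtracts every Poisson probability generated by the
     * recurrence p(n+1) = p(n) * mean / (n+1) until p underflows to zero.
     * The return value is what is left of u.
     */
    static double residual(double mean) {
        double u = 1;
        int x = 0;
        double p = Math.exp(-mean);
        while (p != 0) {
            u -= p;
            x++;
            p = p * mean / x;
        }
        return u;
    }

    public static void main(String[] args) {
        // p0 is still representable at the conservative limit of 700.
        System.out.println("p0 for mean 700 = " + Math.exp(-700));
        System.out.println("residual(700)   = " + residual(700));
        System.out.println("residual(10)    = " + residual(10));
        // Spacing of doubles just below 1, for comparison with the residuals.
        System.out.println("1 - nextDown(1) = " + (1 - Math.nextDown(1.0)));
    }
}
```

The literal {{1.0}} matters in the last line: with an int argument, {{Math.nextDown(1)}} resolves to the float overload and gives a much larger difference.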



