commons-issues mailing list archives

From "Alex D Herbert (JIRA)" <>
Subject [jira] [Created] (RNG-52) PoissonSampler allows mean above Integer.MAX_VALUE
Date Tue, 07 Aug 2018 20:32:00 GMT
Alex D Herbert created RNG-52:

             Summary: PoissonSampler allows mean above Integer.MAX_VALUE
                 Key: RNG-52
             Project: Commons RNG
          Issue Type: Bug
    Affects Versions: 1.1
            Reporter: Alex D Herbert

The {{DiscreteSampler}} interface limits the {{PoissonSampler}} to returning an {{int}}.
As it stands, an input mean above {{Integer.MAX_VALUE}} is accepted, although this makes no sense
because the sampled Poisson distribution would be significantly truncated.

The algorithm of the {{SmallMeanPoissonSampler}} caps the returned sample at {{Integer.MAX_VALUE}}.
It remains valid at such a mean, although the run-time would be impractical due to the nature of the algorithm.
At high mean (>40) the end user is in any case expected to use either the {{LargeMeanPoissonSampler}}
directly or the {{PoissonSampler}}, which chooses the appropriate large mean algorithm.

However the current {{LargeMeanPoissonSampler}} uses {{(int)Math.floor(mean)}} during initialisation
and any mean above {{Integer.MAX_VALUE}} would therefore be unsupported.
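
The effect of that cast can be seen in isolation. The following is a minimal sketch, not code from Commons RNG: the class and method names are hypothetical, and only the expression {{(int)Math.floor(mean)}} is taken from the description above. It relies on the Java rule that a narrowing conversion from {{double}} to {{int}} saturates at {{Integer.MAX_VALUE}} instead of throwing.

```java
/** Hypothetical demo class; not part of Commons RNG. */
final class FloorCastDemo {
    /** Mirrors the expression (int) Math.floor(mean) quoted in the issue. */
    static int floorToInt(double mean) {
        return (int) Math.floor(mean);
    }

    public static void main(String[] args) {
        double mean = 3.0e9; // above Integer.MAX_VALUE (2147483647)
        // The narrowing cast saturates silently: no exception, no wrap-around.
        System.out.println(floorToInt(mean) == Integer.MAX_VALUE); // prints "true"
        // A mean inside the int range is floored as expected.
        System.out.println(floorToInt(45.7)); // prints "45"
    }
}
```

Any state derived from the floored value would therefore be computed for {{Integer.MAX_VALUE}} rather than for the requested mean.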

I propose to add this to the constructor of each Poisson sampler:
if (mean > Integer.MAX_VALUE) {
    throw new IllegalArgumentException(mean + " > " + Integer.MAX_VALUE);
}
with documentation
 * @throws IllegalArgumentException if {@code mean <= 0} or {@code mean > }{@link Integer#MAX_VALUE}.
Note that a limit of {{Integer.MAX_VALUE}} allows the samples to reflect the
Poisson distribution below that level, while everything above that level is truncated so that the remaining
cumulative histogram is represented at the single point of {{Integer.MAX_VALUE}}. This maintains the functionality
of the sampler within the contract of the integer value returned by {{DiscreteSampler}}.
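
As a rough illustration of the proposed precondition, the sketch below puts both bounds into one constructor. The class {{MeanCheckedSampler}} and the helper {{rejects}} are hypothetical stand-ins, not the Commons RNG samplers; the message format for the lower bound is also an assumption. Only the upper-bound check is taken from the proposal.

```java
/** Hypothetical stand-in for a Poisson sampler constructor; not a Commons RNG class. */
final class MeanCheckedSampler {
    private final double mean;

    /**
     * @param mean Mean of the distribution.
     * @throws IllegalArgumentException if {@code mean <= 0} or
     * {@code mean > Integer.MAX_VALUE}.
     */
    MeanCheckedSampler(double mean) {
        if (mean <= 0) {
            // Assumed message format for the existing lower-bound check.
            throw new IllegalArgumentException(mean + " <= " + 0);
        }
        if (mean > Integer.MAX_VALUE) {
            // Upper-bound check as proposed in the issue.
            throw new IllegalArgumentException(mean + " > " + Integer.MAX_VALUE);
        }
        // Note: NaN passes both comparisons; handling it is outside this sketch.
        this.mean = mean;
    }

    double getMean() {
        return mean;
    }

    /** Returns true if constructing a sampler with the given mean throws. */
    static boolean rejects(double mean) {
        try {
            new MeanCheckedSampler(mean);
            return false;
        } catch (IllegalArgumentException expected) {
            return true;
        }
    }
}
```

With this check a mean of exactly {{Integer.MAX_VALUE}} is still accepted, which matches the {{mean > }} wording of the proposed documentation.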

In practice the Poisson distribution is unlikely to be used at such a high mean; in this case
it is appropriate to use a Gaussian approximation to the Poisson.
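
For reference, a Gaussian approximation of this kind could look like the sketch below. It is an assumption-laden illustration, not part of Commons RNG: the class name is hypothetical, it uses {{java.util.Random}} rather than the library's generators, and it returns a {{long}} so that values above {{Integer.MAX_VALUE}} are not truncated. It uses the standard approximation of a Poisson distribution with mean m by a normal distribution with mean m and variance m.

```java
import java.util.Random;

/** Hypothetical helper; not part of Commons RNG. */
final class GaussianPoissonApprox {
    /**
     * Approximates a Poisson sample at large mean by rounding a
     * normal deviate with mean {@code mean} and variance {@code mean}.
     */
    static long sample(Random rng, double mean) {
        double x = mean + Math.sqrt(mean) * rng.nextGaussian();
        // Round to the nearest integer and clip at zero, as counts cannot be negative.
        return Math.max(0L, Math.round(x));
    }

    public static void main(String[] args) {
        Random rng = new Random(12345L);
        double mean = 1.0e10; // far above Integer.MAX_VALUE
        for (int i = 0; i < 3; i++) {
            System.out.println(sample(rng, mean));
        }
    }
}
```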

Note: Currently there is no code coverage from tests for the {{LargeMeanPoissonSampler}}
checking if the mean is <= 0. Tests should be added to check the constructor does throw
when a bad mean is used.

