From issuesreturn73547apmailcommonsissuesarchive=commons.apache.org@commons.apache.org Wed Apr 17 09:15:03 2019
ReturnPath:
XOriginalTo: apmailcommonsissuesarchive@minotaur.apache.org
DeliveredTo: apmailcommonsissuesarchive@minotaur.apache.org
Received: from mail.apache.org (hermes.apache.org [207.244.88.153])
by minotaur.apache.org (Postfix) with SMTP id 133071901D
for ; Wed, 17 Apr 2019 09:15:02 +0000 (UTC)
Received: (qmail 76273 invoked by uid 500); 17 Apr 2019 09:15:02 0000
DeliveredTo: apmailcommonsissuesarchive@commons.apache.org
Received: (qmail 76138 invoked by uid 500); 17 Apr 2019 09:15:02 0000
MailingList: contact issueshelp@commons.apache.org; run by ezmlm
Precedence: bulk
ListHelp:
ListUnsubscribe:
ListPost:
ListId:
ReplyTo: issues@commons.apache.org
DeliveredTo: mailing list issues@commons.apache.org
Received: (qmail 76090 invoked by uid 99); 17 Apr 2019 09:15:02 0000
Received: from mailrelay1uswest.apache.org (HELO mailrelay1uswest.apache.org) (209.188.14.139)
by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 17 Apr 2019 09:15:02 +0000
Received: from jiralwus.apache.org (unknown [207.244.88.139])
by mailrelay1uswest.apache.org (ASF Mail Server at mailrelay1uswest.apache.org) with ESMTP id EB4D7E2AAD
for ; Wed, 17 Apr 2019 09:15:00 +0000 (UTC)
Received: from jiralwus.apache.org (localhost [127.0.0.1])
by jiralwus.apache.org (ASF Mail Server at jiralwus.apache.org) with ESMTP id 5B1502459E
for ; Wed, 17 Apr 2019 09:15:00 +0000 (UTC)
Date: Wed, 17 Apr 2019 09:15:00 +0000 (UTC)
From: "Alex D Herbert (JIRA)"
To: issues@commons.apache.org
MessageID:
InReplyTo:
References:
Subject: [jira] [Commented] (RNG91) Kemp small mean poisson sampler
MIMEVersion: 1.0
ContentType: text/plain; charset=utf8
ContentTransferEncoding: quotedprintable
XJIRAFingerPrint: 30527f35849b9dde25b450d4833f0394
[ https://issues.apache.org/jira/browse/RNG91?page=3Dcom.atlassian.jir=
a.plugin.system.issuetabpanels:commenttabpanel&focusedCommentId=3D16819878=
#comment16819878 ]=20
Alex D Herbert commented on RNG91:

Original timings above were using the Kemp method that effectively computes=
the cumulative probability of the Poisson distribution until the cumulativ=
e probability is above the value of the random deviate.
The cumulative probability is computed *for each sample*. This is a lot of =
repeat computation.
However it should be noted that the value of the random deviate will be abo=
ve 0.5 half of the time. Therefore for repeat sampling it is worth computin=
g and storing the cumulative probability and the sample value that correspo=
nds to approximately 0.5 in the cumulative probability function. This will =
be known as the 50% hedge value.
This will *save half of the computation*=C2=A0for *half of the samples*. Th=
is is expected to produce a speed increase of 50% and so *reduce runtime to=
2/3*=C2=A0of that listed above.
I have tested an implementation that uses this 50% hedge.
Here is an output of the raw data for various means using the original Kemp=
algorithm and the modified Kemp algorithm. The algorithm is compared to th=
e current Small and Large mean Poisson sampler.
Note that the XO_RO_SHI_RO_128_PLUS generator is the fastest currently in t=
he library for double floating point generation. The Small mean sampler is =
very dependent on the speed of the generator so a slow algorithm is once ag=
ain included to demonstrate this (WELL_44497B).
=C2=A0Small=C2=A0=C2=A0OriginalKemp=C2=A0=C2=A0Kemp=C2=A0=
=C2=A0Large=C2=A0=C2=A0
meanXoSplitWellXoSplitWellXoSplitWellXoSplitWell
0.258.6310.0330.147.879.0722.168.139.0422.04=C2=A0=C2=A0=C2=A0=

0.512.3114.6037.7211.3413.0725.4211.0612.3925.09=C2=A0=C2=A0=
=C2=A0
116.7119.9249.5315.2017.2529.5913.5815.4128.0491.78100.64203.5=
7
221.3723.1467.8220.6421.0036.4016.7318.7031.3888.5197.72193.22=

425.9428.17101.9326.9127.2243.4621.5223.5139.2481.6788.89172.7=
4
835.4937.44167.9444.2644.5860.7233.0436.9149.2172.9679.27145.8=
4
1652.6155.00300.2378.5579.2894.8153.5855.0368.8664.8969.99134.=
86
3287.1790.55562.73147.19147.15162.9290.8692.52107.1059.9462.67=
146.87
Sorry for the large table width but the raw data allows a lot of analysis t=
o be done. Algorithms are:
* Split =3D SPLIT_MIX_64
* Xo =3D=C2=A0XO_RO_SHI_RO_128_PLUS
* Well =3D=C2=A0WELL_44497_B
The LargeMean sampler is invalid at a mean below 1.
*Analysis*
The new hedge computation is very similar to the original Kemp sampler at v=
ery low mean. The added complexity of the algorithm cancels the speed incre=
ase of the 50% hedge. It should also be noted that at a mean of 0.25 the cu=
mulative probability of a Poisson distribution is 0.779 for n=3D0. This hig=
h skew in the distribution effectively means the hedge is rarely used and t=
he complexity of the algorithm is a waste.
When the mean is large the new 50% hedge algorithm does show the approximat=
e runtime at 2/3 of the original Kemp algorithm. So the hedge algorithm is=
working as expected,
When the mean is in the range 0.5 to 8 the new hedged Kemp sampler is the f=
astest.
Only at mean 16 and 32 is the current Small mean sampler faster and this is=
by less then 5% and requires a very fast generator. When the generator is =
slow the new Kemp sampler is faster by 5fold due to the small sampler requ=
iring multiple double values from the RNG per poisson sample.
The Large mean sampler is currently used in the library at mean 40. This ta=
ble shows that with a fast RNG the sampler starts outperforming the small s=
amplers around a mean of 32. When the sampler is slow it outperforms the sm=
all mean sampler at mean 8 but is still slower than the new Kemp sampler wh=
en the mean is 32.
Note: The bound of 40 on the large mean poisson sampler may not be based on=
performance and instead on the validity of the algorithm. It is based arou=
nd approximating the poisson distribution using a normal distribution with =
handling of the skew when the mean is low. This sampler may not even be val=
id for very low means and the current units tests=C2=A0use a=C2=A0mean of 6=
7.89 to test the distribution.=C2=A0
This is perhaps best illustrated with a chart (this omits the original Kemp=
sampler and only includes the new Kemp sampler with the 50% hedge):
!poissonsamplers.jpg!
The new code for the Kemp sampler and the performance test is submitted in =
a PR.
A recommendation would be to:
* Switch the code to use the Kemp sampler instead of the Small mean poisso=
n sampler. Speed is approximately equivalent with the fastest generators in=
the library and is much better when the generator is slow as only 1 double=
from the RNG is used per sample.
* Maintain the switch point=C2=A0of 40 for the change to the large mean po=
isson sampler. This is a valid point on the basis of performance and may be=
imposed anyway by the details of the algorithm
=C2=A0
> Kemp small mean poisson sampler
> 
>
> Key: RNG91
> URL: https://issues.apache.org/jira/browse/RNG91
> Project: Commons RNG
> Issue Type: New Feature
> Components: sampling
> Affects Versions: 1.3
> Reporter: Alex D Herbert
> Assignee: Alex D Herbert
> Priority: Minor
> Fix For: 1.3
>
> Attachments: kemp.jpg, poissonsamplers.jpg
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> The current algorithm for the {{SmallMeanPoissonSampler}} is used to gene=
rate Poisson samples for any mean up to 40. The sampler requires approximat=
ely n samples from a RNG to generate 1 Poisson deviate for a mean of n.
> The [Kemp (1981)https://www.jstor.org/stable/2346348]=C2=A0algorithm req=
uires 1 sample from the RNG and then accrues a cumulative probability using=
a recurrence relation to compute each successive Poisson probability:
> {noformat}
> p(n+1) =3D p(n) * mean / (n+1)
> {noformat}
> The full algorithm is here:
> {code:java}
> mean =3D ...;
> final double p0 =3D Math.exp(mean);
> @Override
> public int sample() {
> double u =3D rng.nextDouble();
> int x =3D 0;
> double p =3D p0;
> // The algorithm listed in Kemp (1981) does not check that the ro=
lling probability
> // is positive. This check is added to ensure no errors when the =
limit of the summation
> // 1  sum(p(x)) is above 0 due to cumulative error in floating p=
oint arithmetic.
> while (u > p && p !=3D 0) {
> u =3D p;
> x++;
> p =3D p * mean / x;
> }
> return x;
> }
> {code}
> The limit for the sampler is set by the ability to compute p0. This is ap=
proximately 744.440 when Math.exp(mean) returns 0.
> A conservative limit of 700 sets an initial probability p0 of 9.859676543=
75977E305. When run through the summation series for the limit (u initiali=
sed to 1) the result when the summation ends (p is zero) leaves u =3D 3.335=
439283623915E15. This is close to the expected tolerance for floating poin=
t error (Note: 1=C2=A0 Math.nextDown(1) =3D 1.1102230246251565E16).
> Using a mean of 10 leaves u =3D 4.988586742717954E17. So smaller means h=
ave a similar error. The error in the cumulative sum is expected to result =
in truncation of the long tail of the Poisson distribution (which should be=
bounded at infinity).
> This sampler should outperform the current {{SmallMeanPoissonSampler}} as=
it requires 1 uniform deviate per sample.
> Note that the \{[SmallMeanPoissonSampler}} uses a limit for the mean of I=
nteger.MAX_VALUE / 2. This should be updated since it also relies on p0 bei=
ng above zero.

This message was sent by Atlassian JIRA
(v7.6.3#76005)