[ https://issues.apache.org/jira/browse/RNG-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16562006#comment-16562006 ]

Alex D Herbert edited comment on RNG-50 at 7/31/18 11:54 AM:

I've made a simple Maven project that tests the use of the {{log(n!)}} function:
[Poisson Sampler Test Code|https://github.com/aherbert/poissonsamplertestcode]
Since the call to {{log(n!)}} outside the algorithm loop can be precomputed, I have excluded it from the
statistics. The table below shows the number of calls made to {{log(n!)}} and summary
statistics on the distribution of {{n}}. For reference, the expected standard deviation of
all samples from the Poisson distribution is shown.
||Mean||Samples||log(n!) calls||Calls/sample||min n||max n||Av n||SD n||Poisson SD||
|40|2000000|15336|0.00767|11|77|42.7|14.5|6.3|
|45|2000000|13343|0.00667|13|85|47.7|15.3|6.7|
|50|2000000|11930|0.00597|17|92|53.0|16.0|7.1|
|55|2000000|10748|0.00537|18|100|57.6|16.9|7.4|
|60|2000000|9723|0.00486|24|108|62.8|17.5|7.7|
|65|2000000|9005|0.00450|31|110|67.8|18.3|8.1|
|70|2000000|8166|0.00408|34|118|72.4|19.1|8.4|
|75|2000000|7694|0.00385|38|129|77.5|19.8|8.7|
|80|2000000|6989|0.00349|38|130|82.8|20.3|8.9|
|85|2000000|6651|0.00333|47|134|88.0|20.8|9.2|
|90|2000000|6314|0.00316|47|144|93.0|21.4|9.5|
|95|2000000|5948|0.00297|49|145|97.5|22.1|9.7|
|100|2000000|5574|0.00279|56|151|102.8|22.7|10.0|
|200|2000000|2654|0.00133|139|267|203.7|31.8|14.1|
|400|2000000|1316|0.000658|315|504|403.4|45.0|20.0|
|800|2000000|657|0.000329|691|947|803.1|63.8|28.3|
|1600|2000000|336|0.000168|1442|1776|1604.7|86.9|40.0|
|3200|2000000|155|7.75e-05|2977|3429|3210.3|124.3|56.6|
|6400|2000000|74|3.70e-05|6088|6699|6378.5|174.9|80.0|
|12800|2000000|58|2.90e-05|12335|13135|12803.8|256.0|113.1|
|25600|2000000|19|9.50e-06|25102|26111|25542.1|358.7|160.0|
|51200|2000000|10|5.00e-06|50301|51893|51056.6|605.1|226.3|
|102400|2000000|3|1.50e-06|101510|103388|102729.7|1057.4|320.0|
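A minimal sketch of the kind of instrumentation behind this table (a hypothetical helper class, not the actual test code): wrap the {{log(n!)}} computation so each call is counted and the distribution of {{n}} is summarised.

```java
// Hypothetical sketch, not the actual test project code: count calls to
// log(n!) and track summary statistics on the n values passed in.
public class LogFactorialStats {
    private long calls;
    private int min = Integer.MAX_VALUE;
    private int max = Integer.MIN_VALUE;
    private double sum;
    private double sumSq;

    /** Compute log(n!) directly, recording statistics on n. */
    public double logFactorial(int n) {
        calls++;
        min = Math.min(min, n);
        max = Math.max(max, n);
        sum += n;
        sumSq += (double) n * n;
        // Direct sum of log(i); fine for instrumentation purposes.
        double s = 0;
        for (int i = 2; i <= n; i++) {
            s += Math.log(i);
        }
        return s;
    }

    public long getCalls() { return calls; }
    public int getMin() { return min; }
    public int getMax() { return max; }
    public double getMean() { return sum / calls; }
    public double getSd() {
        final double mean = getMean();
        return Math.sqrt(sumSq / calls - mean * mean);
    }

    public static void main(String[] args) {
        LogFactorialStats stats = new LogFactorialStats();
        for (int n : new int[] {40, 45, 50}) {
            stats.logFactorial(n);
        }
        System.out.println("calls=" + stats.getCalls() + " min=" + stats.getMin()
            + " max=" + stats.getMax() + " mean=" + stats.getMean());
    }
}
```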
Interestingly, the section of code that uses {{log(n!)}} is called only about 0.7% down to 0.3%
of the time. So the value of a cache of size 80 is debatable.
Reading the algorithm it is clear that {{log(n!)}} is called with a value that is approximately
the same as the sample that will be returned (it is just adjusted by a Poisson sample using
a mean in the range 0-1). Thus {{log(n!)}} is called with a value from the Poisson distribution.
With a high mean (>40) this is approximately Gaussian and so the value of n will be
approximately in the range {{mean +/- 3*sqrt(mean)}}, with {{sqrt(mean)}} the expected standard
deviation of the Poisson distribution. In the results the mean is slightly higher than the
Poisson mean but the standard deviation is more than double. This is due to the sub-sample of
the Poisson distribution that is used when calling {{log(n!)}}, i.e. this code path executes
for only part of the samples.
If a cache is to be used then it could be made smarter: only the range of n expected given
the mean should be cached. This will have some value when the user wants to compute thousands
of samples, but the exact crossover point at which a cache is beneficial would require use-case
testing.
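Such a mean-specific cache might look like the following sketch. The class name, the chosen bounds and the fallback computation are illustrative assumptions, not the Commons RNG implementation:

```java
// Hypothetical sketch of a mean-specific cache: only the values of n likely
// to be requested for a given mean are precomputed, and out-of-range values
// fall back to direct computation. Names are illustrative.
public class RangedFactorialLog {
    private final int min;
    private final double[] cache;

    public RangedFactorialLog(double mean) {
        // Cache the Gaussian range mean +/- 3*sqrt(mean). The observed spread
        // of n is wider than sqrt(mean), so misses fall through to
        // logFactorial(n) rather than failing.
        min = Math.max(0, (int) Math.floor(mean - 3 * Math.sqrt(mean)));
        final int max = (int) Math.ceil(mean + 3 * Math.sqrt(mean));
        cache = new double[max - min + 1];
        for (int n = min; n <= max; n++) {
            cache[n - min] = logFactorial(n);
        }
    }

    /** Return log(n!), using the cache when n is in range. */
    public double value(int n) {
        final int i = n - min;
        return (i >= 0 && i < cache.length) ? cache[i] : logFactorial(n);
    }

    /**
     * Direct computation by summing log(i). Fine for a sketch; a real
     * implementation would switch to a Stirling series for large n.
     */
    static double logFactorial(int n) {
        double s = 0;
        for (int i = 2; i <= n; i++) {
            s += Math.log(i);
        }
        return s;
    }

    public static void main(String[] args) {
        RangedFactorialLog f = new RangedFactorialLog(50);
        System.out.println(f.value(50));  // inside the cached range
        System.out.println(f.value(5));   // outside the range, computed directly
    }
}
```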
In light of these results perhaps the cache should be removed or made optional.
Given that a lot of the initial computation at the start of the main algorithm loop is the
same I have created a new {{NoCachePoissonSampler}} that precomputes this. It actually delegates
the work of sampling to an internal class that is specialised for small mean or large mean.
This is less readable than the original but runs about 2x faster for big mean and is comparable
for small mean to the existing version. It has little penalty when run inside a loop for a
single use.
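The delegation idea can be sketched as follows. The class names, the placeholder large-mean algorithm (a Gaussian approximation standing in for the real rejection algorithm) and the interface are all illustrative, not the actual {{NoCachePoissonSampler}} code:

```java
import java.util.Random;

// Illustrative sketch of the delegation described above: the algorithm is
// chosen once at construction, so sample() contains no branching on the mean.
public class DelegatingPoissonSampler {
    /** Functional interface for the internal delegates (hypothetical). */
    interface DiscreteSampler { int sample(); }

    private static final double PIVOT = 40;
    private final DiscreteSampler delegate;

    public DelegatingPoissonSampler(Random rng, double mean) {
        delegate = mean < PIVOT ? smallMean(rng, mean) : largeMean(rng, mean);
    }

    public int sample() {
        return delegate.sample();
    }

    // Knuth's product-of-uniforms method: exact but O(mean) per sample,
    // so only suitable for small means.
    private static DiscreteSampler smallMean(Random rng, double mean) {
        final double limit = Math.exp(-mean);
        return () -> {
            int n = 0;
            double p = rng.nextDouble();
            while (p > limit) {
                p *= rng.nextDouble();
                n++;
            }
            return n;
        };
    }

    // Placeholder large-mean algorithm: a Gaussian approximation, standing in
    // for the rejection algorithm the real sampler uses.
    private static DiscreteSampler largeMean(Random rng, double mean) {
        final double sd = Math.sqrt(mean);
        return () -> (int) Math.max(0, Math.round(mean + rng.nextGaussian() * sd));
    }

    public static void main(String[] args) {
        DelegatingPoissonSampler s = new DelegatingPoissonSampler(new Random(42), 100);
        long sum = 0;
        final int samples = 100000;
        for (int i = 0; i < samples; i++) {
            sum += s.sample();
        }
        System.out.println("sample mean ~ " + (double) sum / samples);
    }
}
```

Choosing the delegate once moves all mean-dependent set-up out of the sampling loop, which is the same trade-off the comment describes: slightly less readable construction in exchange for a leaner per-sample path.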
Here is a table showing relative performance to the current {{PoissonSampler}}:
||Mean||Samples||PoissonSampler||PoissonSampler single-use||Relative||NoCachePoissonSampler||Relative||NoCachePoissonSampler single-use||Relative||
|10|100000|32240642|44303479|1.4|29738984|0.9|40392068|1.3|
|15|100000|45116715|48012698|1.1|41423279|0.9|48702245|1.1|
|20|100000|54947972|59693780|1.1|53429196|1.0|57225159|1.0|
|25|100000|66033864|70327937|1.1|56455154|0.9|65389016|1.0|
|30|100000|66540285|70080907|1.1|66898643|1.0|70444433|1.1|
|35|100000|76536579|79334015|1.0|76721659|1.0|82033673|1.1|
|40|100000|47065716|152780589|3.2|20502220|0.4|61573987|1.3|
|45|100000|44234406|138732715|3.1|20488391|0.5|59802104|1.4|
|50|100000|46406759|152052002|3.3|20684158|0.4|56778032|1.2|
|55|100000|47471286|163806488|3.5|20393476|0.4|54880714|1.2|
|60|100000|44902999|168387959|3.8|19803873|0.4|53001416|1.2|
|65|100000|44378698|177925881|4.0|19334188|0.4|52687338|1.2|
|70|100000|43907477|188685822|4.3|19275661|0.4|52963771|1.2|
|75|100000|43815077|194223216|4.4|19206728|0.4|55672998|1.3|
|80|100000|43719812|208586008|4.8|19125990|0.4|53746417|1.2|
> PoissonSampler single use speed improvements
> 
>
> Key: RNG-50
> URL: https://issues.apache.org/jira/browse/RNG-50
> Project: Commons RNG
> Issue Type: Improvement
> Affects Versions: 1.0
> Reporter: Alex D Herbert
> Priority: Minor
> Attachments: PoissonSamplerTest.java
>
>
> The Sampler architecture of {{org.apache.commons.rng.sampling.distribution}} is nicely
written for fast sampling of small dataset sizes. The constructors for the samplers do not
check that the input parameters are valid for the respective distributions (in contrast to the
old {{org.apache.commons.math3.random.distribution}} classes). I assume this is a design choice
for speed. Thus most of the samplers can be used within a loop to sample just one value with
very little overhead.
> The {{PoissonSampler}} precomputes log factorial numbers upon construction if the mean
is above 40. This is done using the {{InternalUtils.FactorialLog}} class. As of version 1.0
this internal class is currently only used in the {{PoissonSampler}}.
> The cache size is limited to 2*PIVOT (where PIVOT=40). But it creates and precomputes
the cache every time a PoissonSampler is constructed if the mean is above the PIVOT value.
> Why not create this once in a static block for the PoissonSampler?
> {code:java}
> /** {@code log(n!)}. */
> private static final FactorialLog factorialLog;
>
> static
> {
> factorialLog = FactorialLog.create().withCache((int) (2 * PoissonSampler.PIVOT));
> }
> {code}
> This will make the construction cost of a new {{PoissonSampler}} negligible. If the table
is computed dynamically as a static construction method then the overhead will be in the first
use. Thus the following call will be much faster:
> {code:java}
> UniformRandomProvider rng = ...;
> int value = new PoissonSampler(rng, 50).sample();
> {code}
> I have tested this modification (see attached file) and the results are:
> {noformat}
> Mean 40 Single construction ( 7330792) vs Loop construction                          (24334724) (3.319522x faster)
> Mean 40 Single construction ( 7330792) vs Loop construction with static FactorialLog ( 7990656) (1.090013x faster)
> Mean 50 Single construction ( 6390303) vs Loop construction                          (19389026) (3.034132x faster)
> Mean 50 Single construction ( 6390303) vs Loop construction with static FactorialLog ( 6146556) (0.961857x faster)
> Mean 60 Single construction ( 6041165) vs Loop construction                          (21337678) (3.532047x faster)
> Mean 60 Single construction ( 6041165) vs Loop construction with static FactorialLog ( 5329129) (0.882136x faster)
> Mean 70 Single construction ( 6064003) vs Loop construction                          (23963516) (3.951765x faster)
> Mean 70 Single construction ( 6064003) vs Loop construction with static FactorialLog ( 5306081) (0.875013x faster)
> Mean 80 Single construction ( 6064772) vs Loop construction                          (26381365) (4.349935x faster)
> Mean 80 Single construction ( 6064772) vs Loop construction with static FactorialLog ( 6341274) (1.045591x faster)
> {noformat}
> Thus the speed improvement would be approximately 3-4 fold for single-use Poisson sampling.

This message was sent by Atlassian JIRA
(v7.6.3#76005)
