commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Alex Herbert <alex.d.herb...@gmail.com>
Subject Re: [rng] RNG-101 new MarsagliaTsangWang discrete probability sampler
Date Fri, 10 May 2019 15:10:27 GMT

On 10/05/2019 15:07, Gilles Sadowski wrote:
> Hi.
>
> Le ven. 10 mai 2019 à 15:53, Alex Herbert <alex.d.herbert@gmail.com> a écrit
:
>>
>> On 10/05/2019 14:27, Gilles Sadowski wrote:
>>> Hi Alex.
>>>
>>> Le ven. 10 mai 2019 à 13:57, Alex Herbert <alex.d.herbert@gmail.com> a
écrit :
>>>> Can I get a review of the PR for RNG-101 please.
>>> Thanks for this work!
>>>
>>> I didn't go into the details; however, I see many fields and methods like
>>>     table1 ... table5
>>>     fillTable1 ... fillTable5
>>>     getTable1 ... getTable5
>>> Wouldn't it be possible to use a 2D table:
>>>     table[5][];
>>> so that e.g. only one "fillTable(int tableIndex, /* other args */)" method
>>> is necessary (where "tableIndex" runs from 0 to 4)?
>> Yes. The design is based around using 5 tables as per the example code.
>>
>> The sample() method knows which table it needs so it can directly jump
>> to the table in question. I'd have to look at the difference in speed
>> when using a 2D table as you are adding another array access but
>> reducing the number of possible method calls (although you still need a
>> method call). Maybe this will be optimised out by the JVM.
>>
>> If the speed is not a factor then I'll rewrite it. Otherwise it's
>> probably better done for speed as this is the entire point of the
>> sampler given it disregards any probability under 2^-31 (i.e. it's not a
>> perfectly fair sampler).
>>
>> Note that 5 tables are needed for 5 hex digits (base 2^6). The paper
>> states using 3 tables of base 2^10 then you get a speed increase
>> (roughly 1.16x) at the cost of storage (roughly 9x). Changing to 2
>> tables of base 2^15 does not make it much faster again.
>>
>> I'll have a rethink to see if I can make the design work for different
>> base sizes.
> That could be an extension made easier with the 2D table, but
> I quite agree that given the relatively minor speed improvement
> to be expected, it is not the main reason; the rationale was just to
> make the code a more compact and a little easier to grasp (IMHO).
>
> Gilles

Benchmark                             (randomSourceName)                              
(samplerType)  Mode  Cnt  Score   Error  Units
DiscreteSamplersPerformance.baseline                 N/A                                        
N/A  avgt    5  2.043 ± 0.015  ns/op
DiscreteSamplersPerformance.nextInt         SPLIT_MIX_64                                        
N/A  avgt    5  3.577 ± 0.028  ns/op
DiscreteSamplersPerformance.sample          SPLIT_MIX_64           MarsagliaTsangWangDiscreteSampler 
avgt    5  5.550 ± 0.019  ns/op
DiscreteSamplersPerformance.sample          SPLIT_MIX_64          MarsagliaTsangWangDiscreteSampler2 
avgt    5  5.974 ± 0.073  ns/op
DiscreteSamplersPerformance.sample          SPLIT_MIX_64   MarsagliaTsangWangSmallMeanPoissonSampler 
avgt    5  8.104 ± 0.048  ns/op
DiscreteSamplersPerformance.sample          SPLIT_MIX_64  MarsagliaTsangWangSmallMeanPoissonSampler2 
avgt    5  8.217 ± 0.015  ns/op
DiscreteSamplersPerformance.sample          SPLIT_MIX_64           MarsagliaTsangWangBinomialSampler 
avgt    5  8.321 ± 0.028  ns/op
DiscreteSamplersPerformance.sample          SPLIT_MIX_64          MarsagliaTsangWangBinomialSampler2 
avgt    5  9.277 ± 0.167  ns/op

The Poisson(mean = 22.9) sampler has a small difference:

8.104 - 3.577 = 4.527
8.217 - 3.577 = 4.64

About 2.4% slower.

But the Binomial (n=67,p=0.33) sampler is a fair bit slower:

8.321 - 3.577 = 4.744
9.277 - 3.577 = 5.7

About 20% slower.

So it seems that the JVM cannot optimise the 2D table look-up. It may 
well be due to the use of an interface to support different table 
storage types:

table.get(0, n)

If working direct with the array then:

array[0][n]

may be optimised away to just the second index access.

As it is the following:

table.get0(n)
table.get1(n)
...

is faster than

table.get(0, n)
table.get(1, n)
...

I have to admit that the code with the 2D table is nice and compact. But 
thinking about the implementation it can still support 2, 3, or 5 tables 
with the current approach. The later tables would just be empty.

Here are some results for a quick hack of a 3 table version (the suffix 
10 is for base 10):

Benchmark                             (randomSourceName)                        
(samplerType)  Mode  Cnt  Score   Error  Units
DiscreteSamplersPerformance.baseline                 N/A                                  
N/A  avgt    5  2.042 ± 0.008  ns/op
DiscreteSamplersPerformance.nextInt         SPLIT_MIX_64                                  
N/A  avgt    5  3.577 ± 0.025  ns/op
DiscreteSamplersPerformance.sample          SPLIT_MIX_64     MarsagliaTsangWangDiscreteSampler 
avgt    5  6.087 ± 2.804  ns/op
DiscreteSamplersPerformance.sample          SPLIT_MIX_64    MarsagliaTsangWangDiscreteSampler2 
avgt    5  6.002 ± 0.141  ns/op
DiscreteSamplersPerformance.sample          SPLIT_MIX_64   MarsagliaTsangWangDiscreteSampler10 
avgt    5  5.301 ± 0.015  ns/op
DiscreteSamplersPerformance.sample          SPLIT_MIX_64  MarsagliaTsangWangDiscreteSampler102 
avgt    5  5.715 ± 0.007  ns/op

There's a timing anomaly for the original 
MarsagliaTsangWangDiscreteSampler. The median is 5.76. Subtracting the 
baseline and computing the relative performance makes the original 
version 26% slower when using 5 tables rather than 3, the 2D version is 
13% slower as the cost of the 2D look-up is present in both and the 
reduced number of tables has less of an effect.

I will create a version that can use either 3 or 5 tables and avoid the 
2D table. Then investigate the actual table storage requirements. It may 
be better to use 5 tables by default and allow an override in the 
constructor for 3 tables.

Alex


>
>>> The diff for "DiscreteSamplersList.java" refers to
>>>      MarsagliaTsangWangBinomialSampler
>>> but
>>>     MarsagliaTsangWangSmallMeanPoissonSampler
>>> seems to be missing.
>> Oops, I missed adding that back. I built the PR from code where I was
>> testing lots of implementations.
>>
>> I've just added it back and it is still passing locally. Travis should
>> see that too as I pushed the change to the PR.
>>
>>> Regards,
>>> Gilles
>>>
>>>> This is a new sampler based on the source code from the paper:
>>>>
>>>> George Marsaglia, Wai Wan Tsang, Jingbo Wang (2004)
>>>> Fast Generation of Discrete Random Variables.
>>>> Journal of Statistical Software. Vol. 11, Issue. 3, pp. 1-11.
>>>>
>>>> https://www.jstatsoft.org/article/view/v011i03
>>>>
>>>> The code has no explicit licence.
>>>>
>>>> The paper states:
>>>>
>>>> "We have provided C versions of the two methods described here, for
>>>> inclusion in the “Browse
>>>> files”section of the journal. ... You may then want to examine the
>>>> components of the two files, for illumination
>>>> or for extracting portions that might be usefully applied to your
>>>> discrete distributions."
>>>>
>>>> So I assuming that it can be incorporated with little modification.
>>>>
>>>> The Java implementation has been rewritten to allow the storage to be
>>>> optimised for the required size. The generation of the tables has been
>>>> adapted appropriately and checks have been added on the input parameters
>>>> to ensure the sampler does not generate exceptions once constructed (I
>>>> found out the hard way that the original code was not entirely correct).
>>>>
>>>> Thanks.
>>>>
>>>> Alex
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message