commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Gilles Sadowski <gillese...@gmail.com>
Subject Re: [rng] RNG-101 new MarsagliaTsangWang discrete probability sampler
Date Fri, 10 May 2019 14:07:21 GMT
Hi.

Le ven. 10 mai 2019 à 15:53, Alex Herbert <alex.d.herbert@gmail.com> a écrit :
>
>
> On 10/05/2019 14:27, Gilles Sadowski wrote:
> > Hi Alex.
> >
> > Le ven. 10 mai 2019 à 13:57, Alex Herbert <alex.d.herbert@gmail.com> a écrit
:
> >> Can I get a review of the PR for RNG-101 please.
> > Thanks for this work!
> >
> > I didn't go into the details; however, I see many fields and methods like
> >    table1 ... table5
> >    fillTable1 ... fillTable5
> >    getTable1 ... getTable5
> > Wouldn't it be possible to use a 2D table:
> >    table[5][];
> > so that e.g. only one "fillTable(int tableIndex, /* other args */)" method
> > is necessary (where "tableIndex" runs from 0 to 4)?
>
> Yes. The design is based around using 5 tables as per the example code.
>
> The sample() method knows which table it needs so it can directly jump
> to the table in question. I'd have to look at the difference in speed
> when using a 2D table as you are adding another array access but
> reducing the number of possible method calls (although you still need a
> method call). Maybe this will be optimised out by the JVM.
>
> If the speed is not a factor then I'll rewrite it. Otherwise it's
> probably better done for speed as this is the entire point of the
> sampler given it disregards any probability under 2^-31 (i.e. it's not a
> perfectly fair sampler).
>
> Note that 5 tables are needed for 5 hex digits (base 2^6). The paper
> states using 3 tables of base 2^10 then you get a speed increase
> (roughly 1.16x) at the cost of storage (roughly 9x). Changing to 2
> tables of base 2^15 does not make it much faster again.
>
> I'll have a rethink to see if I can make the design work for different
> base sizes.

That could be an extension made easier with the 2D table, but
I quite agree that given the relatively minor speed improvement
to be expected, it is not the main reason; the rationale was just to
make the code a more compact and a little easier to grasp (IMHO).

Gilles

>
> >
> > The diff for "DiscreteSamplersList.java" refers to
> >     MarsagliaTsangWangBinomialSampler
> > but
> >    MarsagliaTsangWangSmallMeanPoissonSampler
> > seems to be missing.
>
> Oops, I missed adding that back. I built the PR from code where I was
> testing lots of implementations.
>
> I've just added it back and it is still passing locally. Travis should
> see that too as I pushed the change to the PR.
>
> >
> > Regards,
> > Gilles
> >
> >> This is a new sampler based on the source code from the paper:
> >>
> >> George Marsaglia, Wai Wan Tsang, Jingbo Wang (2004)
> >> Fast Generation of Discrete Random Variables.
> >> Journal of Statistical Software. Vol. 11, Issue. 3, pp. 1-11.
> >>
> >> https://www.jstatsoft.org/article/view/v011i03
> >>
> >> The code has no explicit licence.
> >>
> >> The paper states:
> >>
> >> "We have provided C versions of the two methods described here, for
> >> inclusion in the “Browse
> >> files”section of the journal. ... You may then want to examine the
> >> components of the two files, for illumination
> >> or for extracting portions that might be usefully applied to your
> >> discrete distributions."
> >>
> >> So I assuming that it can be incorporated with little modification.
> >>
> >> The Java implementation has been rewritten to allow the storage to be
> >> optimised for the required size. The generation of the tables has been
> >> adapted appropriately and checks have been added on the input parameters
> >> to ensure the sampler does not generate exceptions once constructed (I
> >> found out the hard way that the original code was not entirely correct).
> >>
> >> Thanks.
> >>
> >> Alex

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Mime
View raw message