commons-dev mailing list archives

From Ted Dunning
Subject Re: [math] Generate random data using the Inverse CDF Method?
Date Mon, 02 Nov 2009 18:53:10 GMT
We should probably say which parts of the problem are important to us.  It
begins to sound like we each care about slightly different aspects of the
problem.

The only points that I really care about are:

- the user should have available some obvious way to sample from a
distribution as a method on the distribution itself.  This need is not met
by having a completely separate class in a different package that the user
must somehow intuit the existence of.

- the user should have the widest possible number of distributions that have
*some* kind of sampling procedure that produces accurate samples.  Moreover,
this wide availability should happen very soon.

Note that neither of these points really implies much about implementation
other than where the user of commons-math can find access to
implementations and that we implement something across many distributions
very soon.
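
As an illustration of the first point, here is a minimal sketch of sampling
exposed as a method on the distribution object itself.  The names
SamplableDistribution and ExponentialSketch are hypothetical and are not
part of commons-math; the exponential case is used only because its sampler
fits in one line.

```java
import java.util.Random;

// Hypothetical interface for illustration only; not the commons-math API.
interface SamplableDistribution {
    double cumulativeProbability(double x);

    // The sampling entry point lives on the distribution itself.
    double sample(Random rng);
}

// Hypothetical exponential distribution parameterized by its mean.
final class ExponentialSketch implements SamplableDistribution {
    private final double mean;

    ExponentialSketch(double mean) {
        if (!(mean > 0.0)) {
            throw new IllegalArgumentException("mean must be positive");
        }
        this.mean = mean;
    }

    @Override
    public double cumulativeProbability(double x) {
        return x <= 0.0 ? 0.0 : 1.0 - Math.exp(-x / mean);
    }

    @Override
    public double sample(Random rng) {
        // nextDouble() is in [0, 1), so 1 - nextDouble() is in (0, 1]
        // and the logarithm is always finite.
        return -mean * Math.log(1.0 - rng.nextDouble());
    }
}

final class SampleOnDistributionDemo {
    public static void main(String[] args) {
        SamplableDistribution d = new ExponentialSketch(2.0);
        Random rng = new Random(42L);
        int n = 100_000;
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += d.sample(rng);
        }
        System.out.println("sample mean = " + (sum / n));
    }
}
```

The caller only needs the distribution object and a source of randomness;
nothing about the sampling algorithm leaks into the calling code.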

These are points that I explicitly don't care about:

- should the implementation be based on inverse cumulative distributions if
available?  If there is another way to get lots of sampling algorithms
implemented, I am all for it.  Marsaglia's table method for discrete
distributions is an interesting option for some cases.  There may be other
algorithms that could have wide applicability.  Multiple approaches might be
a good idea: special purpose samplers for some cases (like normal or
exponential distributions), and fairly general methods like Marsaglia's
method where they can be applied.  If all of the common cases have special purpose, high
quality generators, I don't see a problem with letting all of the other
distributions that we haven't considered yet fall back to inverse cumulative
methods.  But all of these considerations are not what I really care about.
I only care about very wide availability of *some* sampling method.

- should there be random number generators that provide more
generality/flexibility/alternative implementations for sampling for various
distributions?  This is an implementation question that can be answered many
ways.  I think that lots of alternatives are good.  I even think that having
pure implementations of one method or another might be an excellent way to
allow us to stitch together the sampling available by default from the
distribution.  All of these considerations, however, are not what I really
care about.  What I care about is that all of these implementations should
be ignorable by a less than devoted user of commons math.
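
One way the "special purpose where available, inverse cumulative otherwise"
arrangement could look is sketched below.  All names (RandomVariate,
InverseCdfFallback, LogisticSketch, NormalSketch) are hypothetical and not
taken from commons-math.  The logistic distribution stands in for "a
distribution nobody has written a dedicated sampler for" because its inverse
CDF has a closed form; the normal case delegates to java.util.Random's
nextGaussian() as its special purpose generator.

```java
import java.util.Random;

// Hypothetical common interface; not the commons-math API.
interface RandomVariate {
    double sample(Random rng);
}

// Fallback: any distribution that can invert its CDF gets a sampler for free.
abstract class InverseCdfFallback implements RandomVariate {
    abstract double inverseCumulativeProbability(double p);

    @Override
    public double sample(Random rng) {
        double u;
        do {
            u = rng.nextDouble();   // in [0, 1)
        } while (u == 0.0);         // reject 0 so that u is in (0, 1)
        return inverseCumulativeProbability(u);
    }
}

// No dedicated sampler here, so it inherits the inverse CDF fallback.
final class LogisticSketch extends InverseCdfFallback {
    private final double location;
    private final double scale;

    LogisticSketch(double location, double scale) {
        this.location = location;
        this.scale = scale;
    }

    @Override
    double inverseCumulativeProbability(double p) {
        return location + scale * Math.log(p / (1.0 - p));
    }
}

// Special purpose sampler: bypasses the inverse CDF entirely.
final class NormalSketch implements RandomVariate {
    private final double mean;
    private final double sd;

    NormalSketch(double mean, double sd) {
        this.mean = mean;
        this.sd = sd;
    }

    @Override
    public double sample(Random rng) {
        return mean + sd * rng.nextGaussian();
    }
}

final class FallbackDemo {
    public static void main(String[] args) {
        Random rng = new Random(1L);
        RandomVariate[] variates = {
            new LogisticSketch(1.0, 0.5), new NormalSketch(3.0, 2.0)
        };
        for (RandomVariate v : variates) {
            System.out.println(v.getClass().getSimpleName() + ": " + v.sample(rng));
        }
    }
}
```

Either way the caller sees the same sample(...) call, which is the only
part the casual user has to know about.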

Now, it seems to me that the points that Phil cares most about fall mostly
into the set of things that I care less about.  Moreover, some of the
opinions that Phil has expressed have been stated in ways that I may have
misinterpreted.  For instance, it sounded to me like Phil was saying that we
shouldn't even implement the inverse cumulative sampler.  On reflection, I
think that his real point is that we should not use the inverse cumulative
method where there are better methods, especially if we already have
implementations of the better methods.

Likewise, it sounded to me like Phil was saying that we absolutely shouldn't
allow easy access to a community consensus sampling algorithm from the
distribution.  On further reflection, I think that his real point is that we
simply should not be doing most implementation in the distribution function
class, but should have a separate package that keeps all that work out of
view of the users.  That sounds like a really good idea, if only to
decrease the noise for the casual user of the distribution classes.
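
A rough sketch of that split, assuming hypothetical names throughout
(ContinuousSamplerSketch, UniformSamplerSketch, UniformDistributionSketch;
none of these are commons-math classes): the sampling algorithm sits in its
own type, which would live in a separate package, and the distribution class
only forwards to it.

```java
import java.util.Random;

// Would live in a separate, hypothetical sampling package.
interface ContinuousSamplerSketch {
    double next(Random rng);
}

// The actual sampling work, kept out of the distribution class.
final class UniformSamplerSketch implements ContinuousSamplerSketch {
    private final double lower;
    private final double upper;

    UniformSamplerSketch(double lower, double upper) {
        this.lower = lower;
        this.upper = upper;
    }

    @Override
    public double next(Random rng) {
        // nextDouble() is in [0, 1), so the result is in [lower, upper).
        return lower + (upper - lower) * rng.nextDouble();
    }
}

// Would live in the distribution package; this is what the casual user sees.
final class UniformDistributionSketch {
    private final double lower;
    private final double upper;
    private final ContinuousSamplerSketch sampler;

    UniformDistributionSketch(double lower, double upper) {
        if (!(lower < upper)) {
            throw new IllegalArgumentException("lower must be less than upper");
        }
        this.lower = lower;
        this.upper = upper;
        // Default sampler chosen by the library; the user never has to name it.
        this.sampler = new UniformSamplerSketch(lower, upper);
    }

    double cumulativeProbability(double x) {
        if (x <= lower) {
            return 0.0;
        }
        if (x >= upper) {
            return 1.0;
        }
        return (x - lower) / (upper - lower);
    }

    // One-line delegation: easy access without implementation noise.
    double sample(Random rng) {
        return sampler.next(rng);
    }
}

final class DelegationDemo {
    public static void main(String[] args) {
        UniformDistributionSketch d = new UniformDistributionSketch(2.0, 6.0);
        System.out.println(d.sample(new Random(3L)));
    }
}
```

The distribution class gains a single forwarding method, and swapping in a
better sampler later would not change what the user calls.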

This sounds like the germ of compromise.

On Mon, Nov 2, 2009 at 3:03 AM, Phil Steitz wrote:

>  I just don't like your suggested implementation and package
> placement.  I proposed an alternative (a generic method added
> somewhere in the random package), which you did not like. There are
> no doubt other better ways to do this.  Perhaps others have ideas?

Ted Dunning, CTO
