commons-dev mailing list archives

From "Phil Steitz" <>
Subject RE: [math] Questions regarding probability distributions
Date Tue, 19 Oct 2004 15:50:19 GMT
After reading carefully again and thinking about some practical examples, I agree that the
current framework has a fundamental and unnecessary limitation.  The "point mass at 0, continuous
beyond 0" example below does occur in practical applications (e.g. component lifetimes, 0
= defective).  As I said in a previous post, the distributions package was designed to house
commonly used "parametric" distributions like the ones that are implemented now; but there
is no reason that the framework could not be used to support any kind of distribution.  Therefore,
since the change to add a base interface is small and does not really complicate the structure
or client code, I am +0 for adding it.  Any other opinions on this?
More specific comments below.

>> > Well, the problem is this: If I need to create some custom discrete
>> > distribution that doesn't take on integer values, what interface should I
>> > implement? With your model I have no choice but use the
>> > ContinuousDistribution interface even though the distribution *isn't*
>> > continuous. Does that make sense?
>> Can you provide a practical example of this?  IIUC, what you are really
>> arguing for is changing the int's in the DiscreteDistribution interface to
>> doubles. This has the advantage of greater generality but makes it
>> slightly less convenient for implementors of the most common discrete
>> distributions, where the values are integers.

>Well, changing the int's in the DiscreteDistribution interface to doubles is
>kind of a workaround, but I don't think it will settle the issue for good, 
>see below.

>As for examples, you can take *any* mixed distribution as an example of what I
>mean. Consider a random variable X with domain D that can be partitioned
>into subsets A and B such that

>1. A is a countable set and 0 < P(X is in A) < 1
>2. P(X = x) = 0 for all x in B
> How would the distribution for such a random variable be represented in
>your framework?
Not possible. 

>As a simple example of this, consider a random variable with the density

>f(x) = 0.5 for x=0
>f(x) = 0.5 for 1<x<2

>How does this distribution fit into your framework? Sure, you could have
>it implement the ContinuousDistribution interface but it *isn't* a
>continuous distribution (in the sense that it doesn't conform to the
>definition of a continuous distribution in probability theory) - and
>then it shouldn't implement an interface called ContinuousDistribution.

>Recall: A random variable is continuous if its distribution function P(X <= x)
>can be expressed as the Riemann-integral of some integrable function
>f: R -> [0, infinity)

>The basic problem is that you have an implicit assumption in your
>framework that each and every probability distribution can be classified
>as being either discrete or continuous. That is simply not true.
>Discrete and continuous distributions are really only special cases of
>a broader concept. Aside from that you also have the problem of how to
>handle the case of a discrete distribution that doesn't take on integer
>values.

>Note: There are also distributions that are neither discrete, continuous, nor a
>mixture of the two. For example, there are numerous distributions based upon
>the Cantor ternary set.
Practical counter-examples like what you have above are more compelling ;-)

>The bottom line is that you *cannot* do without a generic
>ProbabilityDistribution interface.
>This interface should expose a method that exists for all and completely
>determines a particular probability distribution, such as the
>distribution function P(X <= x).

>As an easy solution, you could define it as
>public interface ProbabilityDistribution {
>        public double distributionFunction(double x);
>}

>and have ContinuousDistribution and DiscreteDistribution extend it.

>This should work ok (though the name DiscreteDistribution is misleading)
If we extend the base interface in DiscreteDistribution, this will make that fully generic,
no?  Why is the name misleading?  I am thinking that this interface would include both int
*and* double argument versions, with the int versions for convenience and ease of use for
the most common case in which the distribution corresponds to an integer-valued random variable.
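
As a minimal sketch of this idea: the names and signatures below are hypothetical and are not the actual commons-math API. It assumes the base interface proposed above, plus a DiscreteDistribution that offers both double and int argument versions. The int overload is written as a Java 8 default method purely for brevity. TwoPointDistribution is an invented example of a discrete distribution whose values are not integers.

```java
// Hypothetical sketch; none of these types are the real commons-math interfaces.
interface ProbabilityDistribution {
    /** Returns P(X <= x). */
    double distributionFunction(double x);
}

interface DiscreteDistribution extends ProbabilityDistribution {
    /** Returns P(X = x) for an arbitrary real point x. */
    double probability(double x);

    /** Convenience overload for integer-valued random variables. */
    default double probability(int x) {
        return probability((double) x);
    }
}

/** Invented example: mass 0.5 at x = 0.5 and mass 0.5 at x = 1.5. */
final class TwoPointDistribution implements DiscreteDistribution {
    @Override
    public double probability(double x) {
        return (x == 0.5 || x == 1.5) ? 0.5 : 0.0;
    }

    @Override
    public double distributionFunction(double x) {
        if (x < 0.5) {
            return 0.0;
        }
        if (x < 1.5) {
            return 0.5;
        }
        return 1.0;
    }
}
```

With this shape, implementors of integer-valued distributions can still call and think in terms of int arguments, while the double version covers discrete distributions on arbitrary countable sets.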

>but if you want a completely generic and typesafe definition you should
>go for something like

>public interface ProbabilityDistribution {
>        public Probability distributionFunction(Number x);
>}
I think we can make it work with doubles and don't see a big loss there.  I guess this is
where I get off the bus ;-) -- though I see your point.
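
To illustrate the doubles-based version, here is a rough sketch of the mixed example quoted above (mass 0.5 at x = 0, density 0.5 on 1 < x < 2) expressed through a generic base interface. The class name PointMassPlusUniform is made up for illustration, and the interface is the hypothetical one proposed in this thread, not an existing commons-math type.

```java
// Hypothetical sketch; the interface is the one proposed in this thread.
interface ProbabilityDistribution {
    /** Returns P(X <= x). */
    double distributionFunction(double x);
}

/** Invented example: point mass 0.5 at 0, plus density 0.5 on the interval (1, 2). */
final class PointMassPlusUniform implements ProbabilityDistribution {
    @Override
    public double distributionFunction(double x) {
        if (x < 0.0) {
            return 0.0;                      // no mass below the atom
        }
        if (x <= 1.0) {
            return 0.5;                      // only the atom at 0 has been reached
        }
        if (x < 2.0) {
            return 0.5 + 0.5 * (x - 1.0);    // atom plus the continuous part so far
        }
        return 1.0;
    }
}
```

The distribution function jumps at 0 and increases continuously between 1 and 2, so the class fits neither a discrete nor a continuous interface, yet the base interface describes it completely.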
