commons-dev mailing list archives

From "Mark R. Diggory" <>
Subject Re: [math] Re: "Straw man" release plan
Date Mon, 02 Feb 2004 20:16:19 GMT

Piotr Kochański wrote:
> Hello
> Phil Steitz wrote:
>>Thinking about how this will eventually work, it has occurred to me that 
>>EmpiricalDistribution could be used to digest / represent bootstrap 
>>distributions.  Since we want the interface for EmpiricalDistribution to 
>>be complete for 1.0, we need to make sure that bootstrap data can be 
>>loaded into EmpiricalDistribution conveniently (if this makes sense), so I 
>>have been thinking about adding load() methods to EmpiricalDistribution 
>>that take double[] arrays and streams as values, as well as an addValue() 
>>method.  Does this make sense?  I would also appreciate any comments / 
>>patches on how to improve the EmpiricalDistribution interface or 
>>EmpiricalDistributionImpl.  If refactoring or even holding this from the 
>>release are in order, I want to make sure that we do it.
> As I understand it, load(double[][]) would compute the Empirical Distribution
> Function for every bootstrapped sample (provided from some other source).
> Then, instead of having 
> SummaryStatistics sampleStats
> we should provide 
> SummaryStatistics[] sampleStats
> where this array would contain SummaryStatistics calculated
> for every sample.  SummaryStatistics getSampleStats() would
> be changed as well.

I think maybe this should be returning the more generic 
StatisticalSummary interface. If you are returning precalculated 
results, you do not exactly want to expose the underlying implementation 
to modification by the user.

StatisticalSummary[] sampleStats ...
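
A minimal sketch of that idea, for illustration only. The two types below are trimmed-down, hypothetical stand-ins for StatisticalSummary and SummaryStatistics (the real ones have more methods), and SampleStatsHolder is an invented name, not a proposed class:

```java
// Hypothetical, reduced stand-in for the read-only interface.
interface StatisticalSummary {
    double getMean();
    long getN();
}

// Hypothetical, reduced stand-in for the mutable implementation.
class SummaryStatistics implements StatisticalSummary {
    private long n;
    private double sum;

    public void addValue(double v) { n++; sum += v; }
    @Override public double getMean() { return n == 0 ? Double.NaN : sum / n; }
    @Override public long getN() { return n; }
}

// Invented holder: one summary per sample (row) of the input matrix.
class SampleStatsHolder {
    private final SummaryStatistics[] sampleStats;

    SampleStatsHolder(double[][] samples) {
        sampleStats = new SummaryStatistics[samples.length];
        for (int i = 0; i < samples.length; i++) {
            sampleStats[i] = new SummaryStatistics();
            for (double v : samples[i]) {
                sampleStats[i].addValue(v);
            }
        }
    }

    // Callers only see the read-only interface. The array itself is copied,
    // so replacing an element of the result does not touch internal state.
    StatisticalSummary[] getSampleStats() {
        StatisticalSummary[] view = new StatisticalSummary[sampleStats.length];
        System.arraycopy(sampleStats, 0, view, 0, sampleStats.length);
        return view;
    }
}
```

A determined caller could still cast an element back to the implementation type, so this only discourages modification; a fully safe version would return immutable copies.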

> Similarly other methods/objects in EmpiricalDistribution  
> would have to be modified (e.g. binStats would have to be 
> an array of ArrayLists, etc.).
> Do I get your intentions right?
> The zeroth row of every matrix could be reserved for the original
> sample and the rest for bootstrapped results (if they can be
> calculated, i.e. samples are given). This can be achieved, but
> some effort has to be made to make it simple to use for those
> who do not care about bootstrap and want to get results
> based only on the original sample. 
> The other thing is that such an extension would be very
> useful as long as we play with bootstrap algorithms
> that use statistics which are members of SummaryStatistics.
> Often this is not the case (a classic example is the Median or Trimmed Mean,
> which is not among SummaryStatistics). Sometimes it is also
> necessary (or more comfortable) to operate on the raw bootstrap
> samples, not the EDF calculated from those samples. In these two
> cases bootstrap embedded into EmpiricalDistribution would not
> be that useful.

If you're going to be preserving the original/bootstrap values in a 
double[][], then the standard DescriptiveStatisticsImpl could be used.

public interface FullStatisticalSummary {
	public abstract double getMean();
	public abstract double getVariance();
	public abstract double getStandardDeviation();
	public abstract double getMax();
	public abstract double getMin();
	public abstract long getN();
	public abstract double getSum();
	public abstract double getPercentile(double p);
}

or more simply,

public interface FullStatisticalSummary extends StatisticalSummary {
	public abstract double getPercentile(double p);
}

which would then be implemented by DescriptiveStatistics.

If we return an interface that exposes the statistical analysis of those 
values, then an expanded interface that includes other available 
statistics could easily be added to the API.
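
As a rough sketch of how the extended interface might sit on top of a stored double[][], with row 0 holding the original sample and the remaining rows the bootstrap samples. Everything here is illustrative: StatisticalSummary is reduced to two methods, StoredValuesSummary and BootstrapSummaries are invented names, and the percentile convention (p in (0, 100], linear interpolation at position p * (n + 1) / 100) is an assumption, not necessarily what DescriptiveStatistics does:

```java
import java.util.Arrays;

// Hypothetical, reduced stand-in for the StatisticalSummary interface.
interface StatisticalSummary {
    double getMean();
    long getN();
}

// The extension discussed above: one additional order statistic.
interface FullStatisticalSummary extends StatisticalSummary {
    double getPercentile(double p);
}

// Invented implementation that keeps a sorted copy of its values.
class StoredValuesSummary implements FullStatisticalSummary {
    private final double[] sorted;

    StoredValuesSummary(double[] values) {
        sorted = values.clone();
        Arrays.sort(sorted);
    }

    @Override public long getN() { return sorted.length; }

    @Override public double getMean() {
        if (sorted.length == 0) return Double.NaN;
        double sum = 0.0;
        for (double v : sorted) sum += v;
        return sum / sorted.length;
    }

    // Assumed convention: p in (0, 100], interpolating between neighbours.
    @Override public double getPercentile(double p) {
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        int n = sorted.length;
        if (n == 0) return Double.NaN;
        double pos = p * (n + 1) / 100.0;   // 1-based position in the sorted data
        if (pos < 1) return sorted[0];
        if (pos >= n) return sorted[n - 1];
        int lower = (int) Math.floor(pos);
        double frac = pos - lower;
        return sorted[lower - 1] + frac * (sorted[lower] - sorted[lower - 1]);
    }
}

// Invented helper: one summary per row of the sample matrix.
class BootstrapSummaries {
    static FullStatisticalSummary[] summarize(double[][] samples) {
        FullStatisticalSummary[] out = new FullStatisticalSummary[samples.length];
        for (int i = 0; i < samples.length; i++) {
            out[i] = new StoredValuesSummary(samples[i]);
        }
        return out;
    }
}
```

With something like this, the median of each bootstrap sample is just getPercentile(50) on the corresponding element, which covers the Median case raised above without exposing the raw arrays.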

> Two comments concerning EmpiricalDistribution 
> 1. Probably it would be nice to have load(double[]) method
> 2. Instead of
>    ArrayList getBinStats();
> there could be 
>    List getBinStats();
> although I can't imagine a practical situation where a List other than
> ArrayList would be better.
> Piotr

Mark Diggory
Software Developer
Harvard MIT Data Center
