commons-dev mailing list archives

From Alex Herbert <alex.d.herb...@gmail.com>
Subject Re: [Commons][Descriptive][STATISTICS-7][GSoC] SummaryStatistics class design & Whether to use DoubleSummaryStatistics class from java.util package?
Date Sun, 02 Jun 2019 15:09:16 GMT


> On 2 Jun 2019, at 13:45, Virendra singh Rajpurohit <virendrasinghrp@gmail.com> wrote:
> 
> I've been trying to write a summary statistics class and I have some doubts. There is a class
> DoubleSummaryStatistics in the java.util package (there are two more, for Int and Long). I'll attach
> this file here.
> Do I have to design SummaryStatistics in this way only? I mean, the description of DoubleSummaryStatistics
> says: "This class is designed to work with (though does not require) streams
> <https://docs.oracle.com/javase/8/docs/api/java/util/stream/package-summary.html>.
> For example, you can compute summary statistics on a stream of doubles with:
>  
>  DoubleSummaryStatistics stats = doubleStream.collect(DoubleSummaryStatistics::new,
>                                                       DoubleSummaryStatistics::accept,
>                                                       DoubleSummaryStatistics::combine);"
> Earlier, my understanding of the project was that the user just has to call the function
> "getSummary()" and all the calculations will be done automatically in streams.

If you put all the work with streams inside the getSummary() function then the user cannot
decide how to build the stream (e.g. serial or parallel). So a design like the JDK class,
which works with streams the caller supplies, would be better.
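
To illustrate the point about the caller choosing the stream, here is a small sketch. SummaryDemo and summarize are hypothetical names used only for this example; the collect call itself is the one from the JDK documentation quoted above.

```java
import java.util.DoubleSummaryStatistics;
import java.util.stream.DoubleStream;

// Hypothetical helper for illustration only; not part of any Commons API.
final class SummaryDemo {
    private SummaryDemo() {}

    /**
     * Collects whatever stream the caller built. Whether the stream is
     * serial or parallel is decided by the caller, not by this method.
     */
    static DoubleSummaryStatistics summarize(DoubleStream values) {
        return values.collect(DoubleSummaryStatistics::new,
                              DoubleSummaryStatistics::accept,
                              DoubleSummaryStatistics::combine);
    }
}
```

A caller could then pass DoubleStream.of(1, 2, 3, 4) for serial work or DoubleStream.of(1, 2, 3, 4).parallel() for parallel work, and the same method handles both.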

> But as we can see, with DoubleSummaryStatistics we have to call the collect() method.
> There are some functions like max, min, sum, count and average which are already defined
> in this class. So should I extend this class in my class or not? Also, I'll have to add more
> statistics other than max, min and sum; for that I have to override the accept() function,
> which will be used for streams.

You could extend this JDK class to add functionality. In the accept and combine methods, just
call super.accept and super.combine, then do the extra work you require.

One useful stat that is missing from the class is variance. A first addition would be to extend
DoubleSummaryStatistics and add a variance (plus standard deviation) function with a variant
for the population variance (or population standard deviation).
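
A rough sketch of such an extension follows. The class name VarianceSummaryStatistics and the getter names are hypothetical, and the running update used here is Welford's online algorithm, which is an assumption made for illustration rather than something this thread prescribes. Only serial accumulation through accept() is covered; combine() is deliberately left out of this sketch, so it should not be used as the combiner of a parallel stream as written.

```java
import java.util.DoubleSummaryStatistics;

// Hypothetical class name; a sketch of serial accumulation only.
// combine() is NOT overridden here, so merging two instances would
// update count/sum/min/max but leave the variance state wrong.
class VarianceSummaryStatistics extends DoubleSummaryStatistics {
    private double mean; // running mean (Welford's algorithm)
    private double m2;   // running sum of squared deviations from the mean

    @Override
    public void accept(double value) {
        super.accept(value);         // keeps count, sum, min and max up to date
        long n = getCount();         // count including the new value
        double delta = value - mean;
        mean += delta / n;
        m2 += delta * (value - mean);
    }

    /** Sample variance (divides by n - 1); NaN for fewer than two values. */
    public double getVariance() {
        long n = getCount();
        return n < 2 ? Double.NaN : m2 / (n - 1);
    }

    /** Population variance (divides by n); NaN when no values were added. */
    public double getPopulationVariance() {
        long n = getCount();
        return n < 1 ? Double.NaN : m2 / n;
    }

    public double getStandardDeviation() {
        return Math.sqrt(getVariance());
    }

    public double getPopulationStandardDeviation() {
        return Math.sqrt(getPopulationVariance());
    }
}
```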

Note that a method to add a second moment to another second moment is required. This is not
present in math4 AFAIK. There is this parallel variance algorithm [1] that would allow you
to implement the combine() method to join two instances of your summary statistics class.
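
The merge step from [1] can be sketched on its own. SecondMoment and its method names are hypothetical and used only for illustration. The state is the count n, the mean, and M2 (the sum of squared deviations from the mean); two partial results A and B are joined with delta = mean_B - mean_A and M2_AB = M2_A + M2_B + delta^2 * n_A * n_B / (n_A + n_B).

```java
// Hypothetical standalone holder; a sketch of the parallel (pairwise)
// variance merge described in [1], not an existing Commons class.
final class SecondMoment {
    private long n;      // number of values seen
    private double mean; // running mean
    private double m2;   // sum of squared deviations from the mean

    /** Adds one value using Welford's online update. */
    void add(double value) {
        n++;
        double delta = value - mean;
        mean += delta / n;
        m2 += delta * (value - mean);
    }

    /** Merges the state of another instance into this one. */
    void combine(SecondMoment other) {
        if (other.n == 0) {
            return;                  // nothing to merge
        }
        if (n == 0) {                // this side is empty: copy the other side
            n = other.n;
            mean = other.mean;
            m2 = other.m2;
            return;
        }
        long total = n + other.n;
        double delta = other.mean - mean;
        // M2_AB = M2_A + M2_B + delta^2 * n_A * n_B / (n_A + n_B)
        m2 += other.m2 + delta * delta * ((double) n * other.n / total);
        mean += delta * other.n / total;
        n = total;
    }

    /** Sample variance (divides by n - 1); NaN for fewer than two values. */
    double sampleVariance() {
        return n < 2 ? Double.NaN : m2 / (n - 1);
    }
}
```

With add and combine in this shape, the class fits the three-argument collect: doubleStream.collect(SecondMoment::new, SecondMoment::add, SecondMoment::combine). The same update could be placed inside an overridden combine() of a DoubleSummaryStatistics subclass after the call to super.combine.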

Alex


[1] https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Parallel_algorithm


> 
> Warm Regards,
> -- 
> Virendra Singh Rajpurohit
> 
> University of Petroleum and Energy Studies, Dehradun
> LinkedIn: https://www.linkedin.com/in/virendra-singh-rajpurohit

