commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Luc Maisonobe <>
Subject Re: [math] Multivariate (vector) stats WAS: Re: [math] sum of logs in summary statistics
Date Sun, 10 Feb 2008 16:11:58 GMT
Phil Steitz a écrit :
> On Feb 9, 2008 5:49 PM, Phil Steitz <> wrote:
>> On Feb 9, 2008 11:14 AM, Luc Maisonobe <> wrote:
>>> Phil Steitz a écrit :
>>>> On Feb 8, 2008 7:00 AM,  <> wrote:
>>>>> wrote:
>>>>>> In addition to the statistics required by the StatisticalSummary
interface it
>>>>>> implements, the SummaryStatistics class computes the sum of squares
and the
>>>>>> sum
>>>>>> of logs. It also has setters and getters for the underlying statistics
>>>>>> implementations. However, it does not provide a getSumlg method.
>>>>> The sum of logs is also not used in the equals, hash and toString methods.
>>>>> Luc
>>>>>> Should the sum of logs computation be deprecated or a getSumlg method
added ?
>>>> Interesting.  This is likely a result of refactoring several years
>>>> back when the geometric mean computation used the sum of logs
>>>> instance.  Now it does not, so it is either wasted computation or
>>>> something of value not exposed to the user.  Makes sense to me to add
>>>> getSumLog to SummaryStatistics.  It doesn't need to be included in
>>>> equals or hashcode since geo mean + N equivalence implies log sum
>>>> equivalence.
>>>> Looking again at the code, I now see it as stupid that geometricMean
>>>> in SummaryStatistics does not use the sumOfLogs instance.  If
>>>> geometricMean exposed a setter for its internally wrapped sumOfLogs
>>>> instance, we could just set that in SummaryStatistics and only
>>>> increment the sumOfLogs instance.  It would probably also be an
>>>> improvment for geometricMean to expose a setter for this.
>>>> If there are no objections, I will go ahead and make these changes.
>>>> Thanks for pointing this out, luc.
>>> Go ahead with that.
>>> If you could also have a glimpse at the multivariate summary statistics
>>> I added yesterday, I would be happy. During my paid work day (in the
>>> space industry), I am in the process of switching several projects from
>>> Mantissa to [math] and needed this feature.
>>> I am aware this was really done in a hurry. I have tried to be as
>>> compliant with univariate statistics as possible, but may have
>>> completely missed something. I have reused the VectorialCovariance class
>>> I commited one year ago, but it does not follow the general architecture
>>> of other statistics. I would really like to have your thoughts about this.
>> The code looks fine to me.  I have a couple of comments on the API.
>> 1) I edited the javadoc for MultivariateSummaryStatistics to make it
>> clearer what was actually being computed / reported.  If you are OK
>> with these changes, we can replicate to the interface definition.

Yes, these changes are fine. I will copy the new javadoc to interface.

>> 2) It might be more convenient for users to have the implementation
>> setters just take a single instance, rather than an array.  I can't
>> think of a use case where one would want to use different
>> implementations of the same statistic here. (so, e.g.
>> setMeanImpl(StorelessUnivariateStatistic[] meanImpl) becomes just
>> setMeanImpl(StorelessUnivariateStatistic meanImpl) and we take care of
>> the cloning).
> Ugh.  Just realized that there is no reason to believe we can clone arbitrary
> StorelessUnivariateStatistics, so this may not be possible.

I agree. So we should rather keep the arrays.

>> 3) I am not sure StatisticalMultivariateSummaryValues is necessary /
>> valuable enough to be included.  The analog for univariate was added
>> to be input into tests, etc, but I am not sure this one is that
>> useful.  Do you need / use this or can you see use cases for it?  If
>> not, I would say omit it and we can always add later if someone wants
>> it.

I included it only for consistency. I don't need it and it does a lot of 
copying, which bother me. I'll remove it.

>> Might be good also to add a reference in the user guide just to let
>> people know its there.

Yes. I forgot to do this. I'll write something.


>> Phil
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message