commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Luc Maisonobe <Luc.Maison...@free.fr>
Subject Re: [math] Multivariate (vector) stats WAS: Re: [math] sum of logs in summary statistics
Date Sun, 10 Feb 2008 16:11:58 GMT
Phil Steitz a écrit :
> On Feb 9, 2008 5:49 PM, Phil Steitz <phil.steitz@gmail.com> wrote:
>> On Feb 9, 2008 11:14 AM, Luc Maisonobe <Luc.Maisonobe@free.fr> wrote:
>>> Phil Steitz a écrit :
>>>
>>>> On Feb 8, 2008 7:00 AM,  <luc.maisonobe@free.fr> wrote:
>>>>> luc.maisonobe@free.fr wrote:
>>>>>
>>>>>> In addition to the statistics required by the StatisticalSummary
interface it
>>>>>> implements, the SummaryStatistics class computes the sum of squares
and the
>>>>>> sum
>>>>>> of logs. It also has setters and getters for the underlying statistics
>>>>>> implementations. However, it does not provide a getSumlg method.
>>>>> The sum of logs is also not used in the equals, hash and toString methods.
>>>>>
>>>>> Luc
>>>>>
>>>>>
>>>>>> Should the sum of logs computation be deprecated or a getSumlg method
added ?
>>>>>>
>>>> Interesting.  This is likely a result of refactoring several years
>>>> back when the geometric mean computation used the sum of logs
>>>> instance.  Now it does not, so it is either wasted computation or
>>>> something of value not exposed to the user.  Makes sense to me to add
>>>> getSumLog to SummaryStatistics.  It doesn't need to be included in
>>>> equals or hashcode since geo mean + N equivalence implies log sum
>>>> equivalence.
>>>>
>>>> Looking again at the code, I now see it as stupid that geometricMean
>>>> in SummaryStatistics does not use the sumOfLogs instance.  If
>>>> geometricMean exposed a setter for its internally wrapped sumOfLogs
>>>> instance, we could just set that in SummaryStatistics and only
>>>> increment the sumOfLogs instance.  It would probably also be an
>>>> improvment for geometricMean to expose a setter for this.
>>>>
>>>> If there are no objections, I will go ahead and make these changes.
>>>> Thanks for pointing this out, luc.
>>> Go ahead with that.
>>>
>>> If you could also have a glimpse at the multivariate summary statistics
>>> I added yesterday, I would be happy. During my paid work day (in the
>>> space industry), I am in the process of switching several projects from
>>> Mantissa to [math] and needed this feature.
>>>
>>> I am aware this was really done in a hurry. I have tried to be as
>>> compliant with univariate statistics as possible, but may have
>>> completely missed something. I have reused the VectorialCovariance class
>>> I commited one year ago, but it does not follow the general architecture
>>> of other statistics. I would really like to have your thoughts about this.
>>>
>> The code looks fine to me.  I have a couple of comments on the API.
>>
>> 1) I edited the javadoc for MultivariateSummaryStatistics to make it
>> clearer what was actually being computed / reported.  If you are OK
>> with these changes, we can replicate to the interface definition.

Yes, these changes are fine. I will copy the new javadoc to interface.

>> 2) It might be more convenient for users to have the implementation
>> setters just take a single instance, rather than an array.  I can't
>> think of a use case where one would want to use different
>> implementations of the same statistic here. (so, e.g.
>> setMeanImpl(StorelessUnivariateStatistic[] meanImpl) becomes just
>> setMeanImpl(StorelessUnivariateStatistic meanImpl) and we take care of
>> the cloning).
> 
> Ugh.  Just realized that there is no reason to believe we can clone arbitrary
> StorelessUnivariateStatistics, so this may not be possible.

I agree. So we should rather keep the arrays.

> 
>> 3) I am not sure StatisticalMultivariateSummaryValues is necessary /
>> valuable enough to be included.  The analog for univariate was added
>> to be input into tests, etc, but I am not sure this one is that
>> useful.  Do you need / use this or can you see use cases for it?  If
>> not, I would say omit it and we can always add later if someone wants
>> it.

I included it only for consistency. I don't need it and it does a lot of 
copying, which bother me. I'll remove it.

>>
>> Might be good also to add a reference in the user guide just to let
>> people know its there.

Yes. I forgot to do this. I'll write something.

Thanks
Luc

>>
>> Phil
>>
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
> 
> 



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Mime
View raw message