commons-dev mailing list archives

From Alex Herbert <alex.d.herb...@gmail.com>
Subject Re: [GSoC][Commons][Statistics][Descriptive] Mean should be initiated with 0 or NaN ?
Date Fri, 19 Jul 2019 20:09:26 GMT


> On 19 Jul 2019, at 20:57, Eric Barnhill <ericbarnhill@gmail.com> wrote:
> 
> Hi Virendra,
> 
> I think that's right in terms of initialization. If it is initialized to
> NaN then accumulation will require an additional step getting rid of the
> NaN. Just initialize to zero.

+1

Initialisation with zero will allow the accumulating function to be free of checks.


> 
> I just looked around and it's pretty clear that it is best practice to
> return NaN in the edge case of an average of no values. That is what
> happens in Python when calling numpy.mean([]) and in R when calling
> mean(numeric(0)), and that is also mathematically right.

+1

In line with other libraries. It is also in line with Java, where integer division 0 / 0
throws an ArithmeticException and floating-point division 0.0 / 0.0 returns NaN.
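
To make the distinction concrete, here is a small sketch of the two kinds of division. The class and method names are made up for illustration and are not part of any library:

```java
public class ZeroDivisionDemo {
    /** Returns true if integer 0 / 0 throws an ArithmeticException. */
    static boolean intDivisionThrows() {
        int zero = 0; // not a compile-time constant, so javac does not flag the division
        try {
            int unused = 0 / zero;
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    /** Floating-point 0.0 / 0.0 does not throw; it evaluates to NaN. */
    static double doubleDivision() {
        double zero = 0.0;
        return 0.0 / zero;
    }

    public static void main(String[] args) {
        System.out.println("int 0 / 0 throws: " + intDivisionThrows());
        System.out.println("0.0 / 0.0 is NaN: " + Double.isNaN(doubleDivision()));
    }
}
```

Since the mean is a double, returning NaN for an empty input matches what the floating-point division would give anyway.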

> 
> So, and I think this is a step that could be saved until after the
> milestone, a check for the case of no values, returning NaN in that case,
> should probably be implemented somehow. But in terms of under the hood,
> initialize to zero.

The code just needs to move the check for whether any values have been added (count > 0)
into the getMean() method and return NaN when there are none. This should be added to the
contract of Mean by documenting it in the Javadoc and adding a test to ensure it works.
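
Roughly what I have in mind, as a minimal sketch only. The class name, field names and the running-mean update are my assumptions for illustration, not the actual project code:

```java
/** Hypothetical sketch of a mean accumulator; not the Commons Statistics implementation. */
public class MeanSketch {
    /** Count of values added so far. */
    private long n;
    /** Running mean, initialised to zero so accept() needs no special case. */
    private double mean;

    /** Adds a value using an incremental update; no check for the empty state here. */
    public void accept(double value) {
        n++;
        mean += (value - mean) / n;
    }

    /**
     * Returns the mean of the values added so far.
     *
     * @return the mean, or NaN if no values have been added
     */
    public double getMean() {
        // The only check for the empty case lives here, not in accept().
        return n > 0 ? mean : Double.NaN;
    }
}
```

With this layout the check is paid once per call to getMean() rather than once per added value, and the NaN result for no values is stated in the Javadoc where a test can pin it down.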


> 
> 
> 
> 
> On Thu, Jul 18, 2019 at 7:26 PM Virendra singh Rajpurohit <
> virendrasinghrp@gmail.com> wrote:
> 
>> Hi all,
>> Hope you are all doing well. I had a discussion on Slack with my GSoC
>> mentors regarding this variable initialization. I'm posting it on the ML
>> for more opinions.
>> 
>> *Should variables like mean be initialized with NaN or 0?*
>> The definitional formula of the mean is
>>    mean = (sum of values)/n
>>    Hence for n=0 it is 0/0, which is NaN.
>> But Java's SummaryStatistics classes (Double, Long & Int) return
>> average=0 for n=0.
>> As discussed on Slack, "The initialization should not set the initial value
>> to NaN. This is a convenience to make getMean() faster. This is likely to
>> cause fewer problems than NaN when used in downstream computations".
>> Assigning 0 will make things faster because the if condition checking the
>> value of n can be removed from the calculation, while assigning NaN would
>> be more correct.
>> *Alex Herbert* suggested that NaN can be returned from the getMean() method
>> with an if condition to check the value of n; that way we don't check the
>> condition every time a value is added.
>> What are your opinions about it?
>> 
>> --
>> *Virendra Singh Rajpurohit*
>> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org

