commons-dev mailing list archives

From Eric Barnhill <ericbarnh...@gmail.com>
Subject Re: [GSoC][Commons][Statistics][Descriptive] Mean should be initiated with 0 or NaN ?
Date Fri, 19 Jul 2019 19:57:26 GMT
```Hi Virendra,

I think that's right in terms of initialization. If it is initialized to
NaN, then accumulation will require an additional step to get rid of the
NaN. Just initialize to zero.

I just looked around and it's pretty clear that it is best practice to
return NaN in the edge case of an average of no values. That is what
happens in Python when calling numpy.mean([]) and in R when calling
mean(numeric(0)), and that is also mathematically right.

So a check for the case of no values (n = 0) that returns NaN should
probably be implemented somehow, although I think that is a step that could
be saved until after the milestone. Under the hood, though, initialize to
zero.

On Thu, Jul 18, 2019 at 7:26 PM Virendra singh Rajpurohit <
virendrasinghrp@gmail.com> wrote:

> Hi all,
> Hope you are all doing well. I had a discussion on Slack with my GSoC
> mentors regarding this variable initialization. I'm posting it on the ML
> for more opinions.
>
> *Should variables like mean be initialized with NaN or 0?*
> The definitional formula of the mean is
>     mean = (sum of values)/n
>     Hence for n=0 it is 0/0, which is NaN.
> But Java's SummaryStatistics classes (Double, Long & Int) return
> average=0 for n=0.
> As discussed on Slack, "The initialization should not set the initial value
> to NaN. This is a convenience to make getMean() faster. This is likely to
> cause fewer problems than NaN when used in downstream computations".
> Assigning '0' will make things faster because the if condition that checks
> the value of n is removed from the calculation, while assigning 'NaN' will
> be more correct.
> *Alex Herbert* suggested NaN can be used in the getMean() method with an if
> condition to check the value of 'n'; that way we don't check the condition
> every time a