spark-user mailing list archives

From Reza Zadeh <r...@databricks.com>
Subject Re: foreachActive functionality
Date Sun, 25 Jan 2015 20:25:52 GMT
The idea is to unify the code path for dense and sparse vector operations,
which makes the codebase easier to maintain. You write your logic once as a
function of (index, value) pairs, and the foreachActive method takes care of
checking whether the vector is sparse or dense and iterating over its
active (stored) values.
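
To make that concrete, here is a minimal standalone sketch of the pattern. It
does not use Spark: DenseVec, SparseVec, and l1PerIndex are hypothetical
stand-ins I made up for illustration, not the actual MLlib classes or their
implementation. The assumption is that a dense vector treats every slot as
active, while a sparse vector only visits the entries it stores.

```scala
object ForeachActiveSketch {
  // Hypothetical stand-in for a vector type; not the MLlib API.
  sealed trait Vec {
    def size: Int
    // Calls f(index, value) once per active (stored) entry.
    def foreachActive(f: (Int, Double) => Unit): Unit
  }

  // Dense case: every slot is stored, so every slot is visited (zeros included).
  final case class DenseVec(values: Array[Double]) extends Vec {
    def size: Int = values.length
    def foreachActive(f: (Int, Double) => Unit): Unit = {
      var i = 0
      while (i < values.length) {
        f(i, values(i))
        i += 1
      }
    }
  }

  // Sparse case: only the explicitly stored (index, value) pairs are visited.
  final case class SparseVec(size: Int, indices: Array[Int], values: Array[Double]) extends Vec {
    def foreachActive(f: (Int, Double) => Unit): Unit = {
      var k = 0
      while (k < indices.length) {
        f(indices(k), values(k))
        k += 1
      }
    }
  }

  // Caller code is written once and works for both representations.
  // Here: per-index sum of absolute values, skipping zeros.
  def l1PerIndex(v: Vec): Array[Double] = {
    val acc = Array.fill(v.size)(0.0)
    v.foreachActive { (index, value) =>
      if (value != 0.0) acc(index) += math.abs(value)
    }
    acc
  }
}
```

Under that assumption, the dense case hands explicit zeros to the callback,
which would explain why the body in your snippet still checks value != 0.0
before updating the statistics.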

On Sun, Jan 25, 2015 at 8:18 AM, kundan kumar <iitr.kundan@gmail.com> wrote:

> Can someone help me understand the usage of the "foreachActive" function
> introduced for Vectors?
>
> I am trying to understand how it is used in the MultivariateOnlineSummarizer
> class for summary statistics.
>
>
> sample.foreachActive { (index, value) =>
>       if (value != 0.0) {
>         if (currMax(index) < value) {
>           currMax(index) = value
>         }
>         if (currMin(index) > value) {
>           currMin(index) = value
>         }
>
>         val prevMean = currMean(index)
>         val diff = value - prevMean
>         currMean(index) = prevMean + diff / (nnz(index) + 1.0)
>         currM2n(index) += (value - currMean(index)) * diff
>         currM2(index) += value * value
>         currL1(index) += math.abs(value)
>
>         nnz(index) += 1.0
>       }
>     }
>
> Regards,
> Kundan
>
>
>
