spark-user mailing list archives

From Cheng Lian <lian.cs....@gmail.com>
Subject Re: Use combineByKey and StatCount
Date Tue, 15 Apr 2014 01:35:38 GMT
Not very sure about the meaning of “mean of RDD by key”, is this what you
want?

val meansByKey = rdd
  .map { case (k, v) =>
    k -> (v, 1)                         // pair each value with a count of 1
  }
  .reduceByKey { (lhs, rhs) =>
    (lhs._1 + rhs._1, lhs._2 + rhs._2)  // per-key (sum, count)
  }
  .map { case (k, (sum, count)) =>
    k -> sum / count                    // per-key mean
  }
  .collectAsMap()

With this, you need to be careful about overflow though.
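
If you'd rather go through combineByKey with Spark's StatCounter (I assume that's what you mean by StatCount), a rough, untested sketch could look like this, assuming the values in rdd are Doubles:

import org.apache.spark.util.StatCounter

val statsByKey = rdd.combineByKey(
  (v: Double) => StatCounter(v),                 // start a StatCounter from the first value of a key
  (acc: StatCounter, v: Double) => acc.merge(v), // fold further values into the accumulator
  (a: StatCounter, b: StatCounter) => a.merge(b) // merge accumulators from different partitions
)

val meansByKey = statsByKey.mapValues(_.mean).collectAsMap()

StatCounter keeps a running count, mean and variance in Double, so it also sidesteps the integer-sum overflow mentioned above, and gives you min/max/stdev per key for free.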


On Tue, Apr 1, 2014 at 10:55 PM, Jaonary Rabarisoa <jaonary@gmail.com> wrote:

> Hi all,
>
> Can someone give me some tips on computing the mean of an RDD by key, maybe
> with combineByKey and StatCount?
>
> Cheers,
>
> Jaonary
>
