spark-user mailing list archives

From Koert Kuipers <ko...@tresata.com>
Subject Re: Generic types and pair RDDs
Date Tue, 01 Apr 2014 21:06:43 GMT
  import org.apache.spark.SparkContext._  // brings the implicit conversion to PairRDDFunctions into scope
  import org.apache.spark.rdd.RDD
  import scala.reflect.ClassTag

  // The `K: ClassTag` context bound supplies the implicit ClassTag[K]
  // that the conversion to PairRDDFunctions requires, so join is available.
  def joinTest[K: ClassTag](rddA: RDD[(K, Int)], rddB: RDD[(K, Int)]): RDD[(K, Int)] = {
    rddA.join(rddB).map { case (k, (a, b)) => (k, a + b) }
  }
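
To spell out why the bound helps (my gloss, not from the original reply): the pair-RDD
methods come from an implicit conversion to PairRDDFunctions that demands a
ClassTag for the key type. With a concrete key like String the compiler finds that
evidence itself; with a bare type parameter K there is none, so a generic method must
forward it with its own context bound. A minimal Spark-free sketch of the same
mechanism, using a hypothetical helper makeArray:

```scala
import scala.reflect.ClassTag

// Hypothetical stand-in for the Spark conversion: like it, this helper
// needs runtime class evidence for K (Array construction requires it).
def makeArray[K: ClassTag](items: K*): Array[K] = Array(items: _*)

// Concrete key type: the ClassTag[String] is summoned implicitly.
val ok = makeArray("a", "b")

// A generic caller compiles only if it forwards the evidence itself:
def wrap[K: ClassTag](x: K): Array[K] = makeArray(x)

// def wrapBroken[K](x: K) = makeArray(x)  // does not compile: no ClassTag[K]
```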


On Tue, Apr 1, 2014 at 4:55 PM, Daniel Siegmann <daniel.siegmann@velos.io> wrote:

> When my tuple type includes a generic type parameter, the pair RDD
> functions aren't available. Take for example the following (a join on two
> RDDs, taking the sum of the values):
>
> def joinTest(rddA: RDD[(String, Int)], rddB: RDD[(String, Int)]): RDD[(String, Int)] = {
>   rddA.join(rddB).map { case (k, (a, b)) => (k, a+b) }
> }
>
> That works fine, but let's say I replace the type of the key with a generic
> type parameter:
>
> def joinTest[K](rddA: RDD[(K, Int)], rddB: RDD[(K, Int)]): RDD[(K, Int)] = {
>   rddA.join(rddB).map { case (k, (a, b)) => (k, a+b) }
> }
>
> This latter version fails to compile with the error "value join is not a member
> of org.apache.spark.rdd.RDD[(K, Int)]".
>
> The reason is probably obvious, but I don't have much Scala experience.
> Can anyone explain what I'm doing wrong?
>
> --
> Daniel Siegmann, Software Developer
> Velos
> Accelerating Machine Learning
>
> 440 NINTH AVENUE, 11TH FLOOR, NEW YORK, NY 10001
> E: daniel.siegmann@velos.io W: www.velos.io
>
