spark-user mailing list archives

From Daniel Siegmann <daniel.siegm...@velos.io>
Subject Re: Generic types and pair RDDs
Date Tue, 01 Apr 2014 21:29:26 GMT
That worked, thank you both! Thanks also Aaron for the list of things I
need to read up on - I hadn't heard of ClassTag before.


On Tue, Apr 1, 2014 at 5:10 PM, Aaron Davidson <ilikerps@gmail.com> wrote:

> Koert's answer is very likely correct. The implicit definition that
> converts an RDD[(K, V)] to provide PairRDDFunctions requires that a
> ClassTag be available for K:
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkContext.scala#L1124
>
> To fully understand what's going on from a Scala beginner's point of view,
> you'll have to look up ClassTags, context bounds (the "K : ClassTag"
> syntax), and implicit functions. Fortunately, you don't have to understand
> monads...
>
>
> On Tue, Apr 1, 2014 at 2:06 PM, Koert Kuipers <koert@tresata.com> wrote:
>
>>   import org.apache.spark.SparkContext._
>>   import org.apache.spark.rdd.RDD
>>   import scala.reflect.ClassTag
>>
>>   def joinTest[K: ClassTag](rddA: RDD[(K, Int)], rddB: RDD[(K, Int)]): RDD[(K, Int)] = {
>>     rddA.join(rddB).map { case (k, (a, b)) => (k, a + b) }
>>   }
>>
>>
>> On Tue, Apr 1, 2014 at 4:55 PM, Daniel Siegmann <daniel.siegmann@velos.io
>> > wrote:
>>
>>> When my tuple type includes a generic type parameter, the pair RDD
>>> functions aren't available. Take for example the following (a join on two
>>> RDDs, taking the sum of the values):
>>>
>>> def joinTest(rddA: RDD[(String, Int)], rddB: RDD[(String, Int)]): RDD[(String, Int)] = {
>>>   rddA.join(rddB).map { case (k, (a, b)) => (k, a + b) }
>>> }
>>>
>>> That works fine, but let's say I replace the type of the key with a
>>> generic type parameter:
>>>
>>> def joinTest[K](rddA: RDD[(K, Int)], rddB: RDD[(K, Int)]): RDD[(K, Int)] = {
>>>   rddA.join(rddB).map { case (k, (a, b)) => (k, a + b) }
>>> }
>>>
>>> This latter function gets the compiler error "value join is not a member
>>> of org.apache.spark.rdd.RDD[(K, Int)]".
>>>
>>> The reason is probably obvious, but I don't have much Scala experience.
>>> Can anyone explain what I'm doing wrong?
>>>
>>> --
>>> Daniel Siegmann, Software Developer
>>> Velos
>>> Accelerating Machine Learning
>>>
>>> 440 NINTH AVENUE, 11TH FLOOR, NEW YORK, NY 10001
>>> E: daniel.siegmann@velos.io W: www.velos.io
>>>
>>
>>
>


-- 
Daniel Siegmann, Software Developer
Velos
Accelerating Machine Learning

440 NINTH AVENUE, 11TH FLOOR, NEW YORK, NY 10001
E: daniel.siegmann@velos.io W: www.velos.io
