spark-user mailing list archives

From Josh Rosen <rosenvi...@gmail.com>
Subject Re: No access to pairRDDFunctions
Date Thu, 26 Sep 2013 23:24:14 GMT
There's an old JIRA issue proposing to make RDD covariant in T:

https://spark-project.atlassian.net/browse/SPARK-697
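
The following self-contained Scala sketch models the variance issue without Spark. All names in it are hypothetical: Coll stands in for an invariant RDD[T], CovColl for a covariant version, and the two implicit classes stand in for the implicit conversion that provides PairRDDFunctions. KV replaces Tuple2 because current Scala versions do not allow a case class to extend another case class such as Tuple2.

```scala
// Toy model (hypothetical names, not Spark's classes) of why an invariant
// collection misses a "pair" implicit that a covariant one would pick up.
object VarianceDemo {
  // Stand-in for Tuple2 that a user case class is allowed to extend.
  trait KV[+K, +V] { def key: K; def value: V }

  final case class SomeRecord(id: Int, data: String) extends KV[Int, String] {
    def key: Int = id
    def value: String = data
  }

  final class Coll[T](val items: List[T])     // invariant in T, like RDD[T]
  final class CovColl[+T](val items: List[T]) // covariant in T

  // Stand-ins for an implicit conversion to pair functions: each one only
  // applies when the element type can be matched against KV[K, V].
  implicit class PairOps[K, V](c: Coll[KV[K, V]]) {
    def keys: List[K] = c.items.map(_.key)
  }
  implicit class CovPairOps[K, V](c: CovColl[KV[K, V]]) {
    def keys: List[K] = c.items.map(_.key)
  }

  val records = List(SomeRecord(1, "a"), SomeRecord(2, "b"))

  // Does not compile: Coll[SomeRecord] is not a Coll[KV[Int, String]].
  // new Coll[SomeRecord](records).keys

  // Compiles: CovColl[SomeRecord] <: CovColl[KV[Int, String]], so K and V
  // are inferred as Int and String.
  val covariantKeys: List[Int] = new CovColl[SomeRecord](records).keys
}
```

Uncommenting the Coll line reproduces the situation described in the question.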

I think that I tried making RDD covariant in T at some point, but ran into
compiler errors.
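
The message doesn't say which compiler errors came up. One likely source, offered here as an assumption, is that a covariant T may not appear in parameter positions, and RDD has methods such as union that take another RDD[T]. The sketch below uses a hypothetical CovColl class rather than Spark's RDD to show the usual fix: a lower-bounded type parameter.

```scala
// Hypothetical toy collection, not Spark's RDD. Declaring it covariant (+T)
// forces methods that accept T-typed collections to use a lower bound.
object CovarianceBounds {
  final class CovColl[+T](val items: List[T]) {
    // The invariant-style signature is rejected for a covariant T:
    //   def union(other: CovColl[T]): CovColl[T]
    //   error: covariant type T occurs in contravariant position
    def union[U >: T](other: CovColl[U]): CovColl[U] =
      new CovColl[U](items ++ other.items)
  }

  class Animal
  class Dog extends Animal
  class Cat extends Animal

  val dogs = new CovColl[Dog](List(new Dog))
  val cats = new CovColl[Cat](List(new Cat, new Cat))

  // Compiles because U can be chosen as a common supertype of Dog and Cat.
  val animals: CovColl[Animal] = dogs.union(cats)
}
```

Every such method in a real RDD would need this kind of signature change, which is part of why retrofitting covariance is disruptive.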


On Thu, Sep 26, 2013 at 2:57 PM, Reynold Xin <rxin@cs.berkeley.edu> wrote:

> You can do a cast:
>
> val rdd: RDD[SomeData] = ...  // some RDD[SomeData]
>
> rdd.asInstanceOf[RDD[Tuple2[Int, Data]]].reduceByKey(...)
>
>
>
> It's invariant for historical reasons, I think. It would be fairly hard to
> change it now.
>
>
>
> --
> Reynold Xin, AMPLab, UC Berkeley
> http://rxin.org
>
>
>
> On Thu, Sep 26, 2013 at 6:25 AM, Han JU <ju.han.felix@gmail.com> wrote:
>
>> Hi,
>>
>> I have some classes like
>>
>> abstract class RawData[+K, +V](id: K, data: V) extends Tuple2[K, V](id, data)
>>
>> case class SomeData(id: Int, data: Data) extends RawData[Int, Data](id, data)
>>
>>
>> to model some input data.
>>
>> Then I found out that RDD[SomeData] doesn't have access to
>> PairRDDFunctions methods such as join, even though SomeData is a subclass
>> of Tuple2.
>>
>> I guess the problem comes from the invariance of T in RDD[T]: RDD[SomeData]
>> is not a subtype of RDD[Tuple2[Int, Data]], so the implicit conversion
>> doesn't apply.
>>
>> So,
>>
>> 1) How can I work around this? How do you model data with many fields
>> that need to be joined? I'd rather write "_.id" or "_.data.someFields"
>> than things like "_._2._2".
>>
>> 2) Is there a reason T is invariant in RDD? Could it be covariant?
>>
>>
>> Thanks!
>>
>> --
>> *JU Han*
>>
>> Data Engineer @ Botify.com
>>
>> +33 0619608888
>>
>
>
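
Reynold's cast can be tried without Spark using the hypothetical toy types below: Coll stands in for the invariant RDD[T], KV for Tuple2, and PairOps for the pair-function implicit. The cast is unchecked; it succeeds only because type arguments are erased at runtime and every element really is a KV. A type-checked alternative widens the element type with an identity map. In Spark the analogous call would presumably be something like rdd.map(x => (x: (Int, Data))), assuming SomeData compiles as a Tuple2 subclass.

```scala
// Hypothetical toy model of the cast workaround; Coll stands in for RDD[T].
object CastWorkaround {
  final class Coll[T](val items: List[T]) { // invariant, like RDD[T]
    def map[U](f: T => U): Coll[U] = new Coll(items.map(f))
  }

  trait KV[+K, +V] { def key: K; def value: V }
  final case class SomeRecord(id: Int, data: String) extends KV[Int, String] {
    def key: Int = id
    def value: String = data
  }

  // Stand-in for the pair-function implicit: only applies to Coll[KV[K, V]].
  implicit class PairOps[K, V](c: Coll[KV[K, V]]) {
    def keys: List[K] = c.items.map(_.key)
  }

  val coll = new Coll[SomeRecord](List(SomeRecord(1, "a"), SomeRecord(2, "b")))

  // 1) Unchecked cast, as suggested in the thread: works at runtime because
  //    the type argument is erased and each element is in fact a KV.
  val viaCast: List[Int] = coll.asInstanceOf[Coll[KV[Int, String]]].keys

  // 2) Type-checked upcast: an identity map that widens the static element
  //    type, at the cost of one extra pass over the data.
  val viaMap: List[Int] = coll.map(r => (r: KV[Int, String])).keys
}
```

The cast avoids the extra pass, but the compiler no longer checks it, so a wrong target type only fails later at runtime.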

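For question 1, a common pattern is to keep plain case classes with named fields, key them by the join field (Spark's RDD API provides keyBy for this), join, and destructure the (key, (left, right)) result with a pattern instead of writing _._2._2. The sketch below uses plain Scala collections in place of RDDs; User, Order and joinByKey are hypothetical, and joinByKey only mimics the result shape of an inner join on pair RDDs.

```scala
// Sketch with plain Scala collections standing in for RDDs (no Spark needed).
object KeyedJoinSketch {
  // Ordinary case classes with named fields; no Tuple2 inheritance.
  final case class User(id: Int, name: String)
  final case class Order(userId: Int, amount: Double)

  // Hypothetical inner join with the same result shape as a pair-RDD join:
  // Seq[(K, (V, W))].
  def joinByKey[K, V, W](left: Seq[(K, V)], right: Seq[(K, W)]): Seq[(K, (V, W))] = {
    val rightByKey = right.groupBy(_._1)
    for {
      (k, v) <- left
      (_, w) <- rightByKey.getOrElse(k, Seq.empty)
    } yield (k, (v, w))
  }

  val users  = Seq(User(1, "ann"), User(2, "bob"))
  val orders = Seq(Order(1, 9.5), Order(1, 3.0), Order(3, 7.0))

  // Key each record by a named field (like keyBy(_.id)), join, then
  // destructure with a pattern so fields are accessed by name.
  val summary: Seq[(String, Double)] =
    joinByKey(users.map(u => (u.id, u)), orders.map(o => (o.userId, o)))
      .map { case (_, (user, order)) => (user.name, order.amount) }
}
```

In Spark the same shape would come from something like users.keyBy(_.id).join(orders.keyBy(_.userId)), with the pair-RDD implicits in scope.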