spark-user mailing list archives

From Han JU <ju.han.fe...@gmail.com>
Subject No access to pairRDDFunctions
Date Thu, 26 Sep 2013 13:25:12 GMT
Hi,

I have some classes like

abstract class RawData[+K, +V](id: K, data: V) extends Tuple2[K, V](id, data)

case class SomeData(id: Int, data: Data) extends RawData[Int, Data](id, data)


to model some input data.

Then I found out that RDD[SomeData] doesn't have access to the
PairRDDFunctions methods, such as join, even though SomeData is indeed a
subclass of Tuple2.

I guess the problem comes from the invariance of T in RDD[T]:
RDD[SomeData] is not a subtype of RDD[Tuple2[Int, Data]], so the implicit
conversion to PairRDDFunctions doesn't apply.
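
To illustrate that guess, here is a minimal sketch that does not use Spark at
all. Box, Pair, PairOps and toPairOps are hypothetical stand-ins for RDD,
Tuple2, PairRDDFunctions and the implicit conversion; they are assumptions
made for the example, not Spark's actual code. It only shows that an implicit
conversion defined on an invariant container of pairs is not picked up for a
container of a pair subclass.

```scala
import scala.language.implicitConversions

object InvarianceSketch {
  // Hypothetical stand-in for Tuple2 (not the real scala.Tuple2).
  class Pair[+K, +V](val key: K, val value: V)

  // A record type that is a subclass of the pair type.
  class SomeData(val id: Int, val data: String) extends Pair[Int, String](id, data)

  // Hypothetical stand-in for RDD[T]; note that T is invariant.
  class Box[T](val items: Seq[T]) {
    def map[U](f: T => U): Box[U] = new Box(items.map(f))
  }

  // Hypothetical stand-in for PairRDDFunctions: extra methods for boxes of pairs.
  class PairOps[K, V](self: Box[Pair[K, V]]) {
    def keys: Seq[K] = self.items.map(_.key)
  }

  // The conversion only accepts exactly Box[Pair[K, V]].
  implicit def toPairOps[K, V](box: Box[Pair[K, V]]): PairOps[K, V] =
    new PairOps(box)

  def demo(): Seq[Int] = {
    val records = new Box(Seq(new SomeData(1, "a"), new SomeData(2, "b")))

    // records.keys
    // does not compile: Box[SomeData] is not a Box[Pair[Int, String]],
    // so the compiler finds no applicable implicit conversion.

    // Widening the element type explicitly makes the conversion applicable.
    val widened = records.map[Pair[Int, String]](d => d)
    widened.keys
  }

  def main(args: Array[String]): Unit = println(demo())
}
```

If this is really what happens with RDD, an explicit map to the tuple type
should bring the pair methods back, at the cost of losing the record type.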

So,

1) How can I work around this? How do you model data with lots of fields
that needs to be joined? I don't really want to write things like "_._2._2",
but rather "_.id" or "_.data.someFields".

2) Is there a reason for the invariance of T in RDD? Could it be covariant?


Thanks!

-- 
*JU Han*

Data Engineer @ Botify.com

+33 0619608888
