spark-user mailing list archives

From Night Wolf <>
Subject Partition Case Class RDD without PairRDDFunctions
Date Wed, 06 May 2015 09:14:26 GMT

If I have an RDD[MyClass] and I want to partition it by the hash code of
MyClass for performance reasons, is there any way to do this without
converting it into a pair RDD, RDD[(K, V)], and calling partitionBy?

Mapping it to a Tuple2 seems like a waste of space and computation.
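
For concreteness, here is a minimal sketch of the tuple-based route described
above, assuming Spark 1.3 or later (where the pair-RDD implicits apply without
importing SparkContext._) running in local mode. The fields of MyClass, the
helper name partitionByHash, and the partition count are placeholders chosen
for illustration only.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical stand-in for the MyClass in question; the fields are made up.
case class MyClass(id: Int, name: String)

object PartitionByHashSketch {
  // Key each record by itself with a null value, shuffle with a
  // HashPartitioner (which partitions on the key's hashCode), then drop the
  // dummy value again.
  def partitionByHash(rdd: RDD[MyClass], numPartitions: Int): RDD[MyClass] =
    rdd.map(x => (x, null))
      .partitionBy(new HashPartitioner(numPartitions))
      .keys // implemented as a map, so the result no longer carries the partitioner

  def main(args: Array[String]): Unit = {
    // Local master and app name are assumptions for a standalone run.
    val conf = new SparkConf().setAppName("partition-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val caseClassRdd = sc.parallelize(Seq(MyClass(1, "a"), MyClass(2, "b"), MyClass(3, "c")))
      val partitioned = partitionByHash(caseClassRdd, 4)
      // Show which records ended up in which partition.
      partitioned.glom().collect().zipWithIndex.foreach { case (records, i) =>
        println(s"partition $i: ${records.mkString(", ")}")
      }
    } finally {
      sc.stop()
    }
  }
}
```

The null value keeps the per-record overhead small, but each record is still
wrapped in a Tuple2 for the shuffle, which is the cost in question.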

It looks like PairRDDFunctions.partitionBy() uses a ShuffledRDD[K, V, C],
which requires K, V, and C. Could I create a new
ShuffledRDD[MyClass, MyClass, MyClass](caseClassRdd, new HashPartitioner)?

