spark-user mailing list archives

From "Alex Turner (TMS)" <alex_tur...@toyota.com>
Subject RDD pair to pair of RDDs
Date Wed, 18 Mar 2015 18:48:57 GMT
What's the best way to go from:

RDD[(A, B)] to (RDD[A], RDD[B])

If I do:

def separate[A: ClassTag, B: ClassTag](k: RDD[(A, B)]) = (k.map(_._1), k.map(_._2))

This is the obvious solution, but it runs two separate map passes over the data in the
cluster.  Can I do some kind of fold instead:

def separate[A, B](l: List[(A, B)]) =
  l.foldLeft((List[A](), List[B]())) { (acc, kv) => (kv._1 :: acc._1, kv._2 :: acc._2) }

But obviously this has an aggregation step that I don't want running on the driver,
right?
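
For reference, a minimal sketch of what that fold looks like when applied directly to the
RDD via aggregate (the name separateViaAggregate is just for illustration, not something I
have running).  Each partition is folded on the executors, but the combined result comes
back as a pair of plain Lists on the driver rather than a pair of RDDs, which is exactly
the driver-side aggregation I want to avoid:

import org.apache.spark.rdd.RDD

def separateViaAggregate[A, B](rdd: RDD[(A, B)]): (List[A], List[B]) =
  rdd.aggregate((List.empty[A], List.empty[B]))(
    (acc, kv) => (kv._1 :: acc._1, kv._2 :: acc._2),  // fold within each partition
    (l, r) => (l._1 ::: r._1, l._2 ::: r._2)          // merge per-partition results on the driver
  )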


Thanks,

Alex
