spark-user mailing list archives

From Sanjay Awatramani <>
Subject Re: N-Fold validation and RDD partitions
Date Fri, 21 Mar 2014 13:46:05 GMT
Hi Jaonary,

I believe the n folds should be mapped to n keys in Spark using a map function. You can then
reduce the resulting pair RDD by key, and you should get your metric for each fold.
I don't fully understand partitions, but as far as I can tell they aren't required
in your scenario.
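
To make the idea concrete, here is a minimal sketch in plain Python of the "map each record to a fold key, then reduce by key" pattern. It is not Spark code: the list comprehension stands in for an RDD map and the dictionary accumulation stands in for a reduce by key (reduceByKey on a pair RDD). The names assign_fold, metric_per_fold, mean_over_folds and abs_error are made up for illustration, and assigning folds by index modulo n is only one possible choice.

```python
from collections import defaultdict


def assign_fold(index, n_folds):
    """Map a record index to a fold key in [0, n_folds). Hypothetical helper."""
    return index % n_folds


def metric_per_fold(records, n_folds, metric):
    """Compute the mean of `metric` within each fold."""
    # "map" step: emit (fold_key, (metric_value, count)) pairs
    pairs = [
        (assign_fold(i, n_folds), (metric(record), 1))
        for i, record in enumerate(records)
    ]
    # "reduce by key" step: sum metric values and counts for each fold key
    totals = defaultdict(lambda: (0.0, 0))
    for key, (value, count) in pairs:
        running_sum, running_count = totals[key]
        totals[key] = (running_sum + value, running_count + count)
    return {key: s / n for key, (s, n) in totals.items()}


def mean_over_folds(per_fold):
    """Average the per-fold metrics over all folds."""
    return sum(per_fold.values()) / len(per_fold)


def abs_error(record):
    """Example per-record metric: absolute error of a (prediction, label) pair."""
    prediction, label = record
    return abs(prediction - label)


if __name__ == "__main__":
    data = [(1.0, 1.0), (2.0, 3.0), (0.0, 2.0), (5.0, 5.0)]
    per_fold = metric_per_fold(data, n_folds=2, metric=abs_error)
    print(per_fold, mean_over_folds(per_fold))
```

The fold key is independent of how Spark physically partitions the RDD, which is why partitions do not need to enter the picture for this approach.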


On Friday, 21 March 2014 7:03 PM, Jaonary Rabarisoa <> wrote:

I need to split my data, represented as an RDD, into n folds, compute metrics on each fold,
and finally compute the mean of my metrics over all the folds.
Can Spark partition the data out of the box, or do I need to implement it myself? I
know that RDD has a partitions method and mapPartitions, but I really don't understand the
purpose and meaning of a partition here.

