spark-user mailing list archives

From Hai-Anh Trinh <...@adatao.com>
Subject Re: N-Fold validation and RDD partitions
Date Fri, 21 Mar 2014 15:58:03 GMT
Hi Jaonary,

You can find the code for k-fold CV in
https://github.com/apache/incubator-spark/pull/448. I have not found the
time to resubmit the pull request against the latest master.
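
For readers without access to that pull request, the general k-fold idea
can be sketched without Spark. The block below is plain Python, not the
code from the pull request and not a Spark API; assign_folds,
cross_validate, and train_and_score are hypothetical names used only for
illustration. Each record is tagged with a fold id (the role a key would
play in a PairRDD), each fold is held out once, and the per-fold scores
are averaged.

```python
import random


def assign_folds(records, k, seed=0):
    """Tag each record with a fold id in [0, k), giving (fold, record) pairs.

    Records are shuffled first, then dealt round-robin, so fold sizes
    differ by at most one.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    return [(i % k, rec) for i, rec in enumerate(shuffled)]


def cross_validate(records, k, train_and_score, seed=0):
    """Hold out each fold in turn and return the mean of the k scores.

    train_and_score(train, test) is a caller-supplied (hypothetical)
    function that fits on `train` and returns a numeric metric on `test`.
    """
    keyed = assign_folds(records, k, seed)
    scores = []
    for fold in range(k):
        train = [rec for key, rec in keyed if key != fold]
        test = [rec for key, rec in keyed if key == fold]
        scores.append(train_and_score(train, test))
    return sum(scores) / k
```

With an RDD, the same steps would roughly correspond to a map that
attaches the fold id, a filter per fold to build the train and test
sets, and an average over the k resulting metrics.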


On Fri, Mar 21, 2014 at 8:46 PM, Sanjay Awatramani <sanjay_awat@yahoo.com> wrote:

> Hi Jaonary,
>
> I believe the n folds should be mapped to n keys in Spark using a map
> function. You can then reduce the returned PairRDD to get your
> metric.
> I don't fully understand partitions, but from what I do understand,
> they aren't required in your scenario.
>
> Regards,
> Sanjay
>
>
>   On Friday, 21 March 2014 7:03 PM, Jaonary Rabarisoa <jaonary@gmail.com>
> wrote:
>   Hi
>
> I need to partition my data, represented as an RDD, into n folds, run a
> metrics computation on each fold, and finally compute the mean of my
> metrics over all the folds.
> Can Spark partition the data out of the box, or do I need to
> implement it myself? I know that RDD has a partitions method and
> mapPartitions, but I really don't understand the purpose and meaning of
> a partition here.
>
>
>
> Cheers,
>
> Jaonary
>
>
>


-- 
Hai-Anh Trinh | Senior Software Engineer | http://adatao.com/
http://www.linkedin.com/in/haianh
