spark-user mailing list archives

From Natalia Connolly <natalia.v.conno...@gmail.com>
Subject SVM questions (data splitting, SVM parameters)
Date Wed, 11 Mar 2015 15:18:14 GMT
Hello,

   I am new to Spark and am evaluating its suitability for my machine
learning tasks.  I am using Spark v. 1.2.1.  I would really appreciate it if
someone could provide some insight into the following two issues.

 1.  I'd like to try a "leave one out" approach for training my SVM,
meaning that all but one data point are used for training and the remaining
point is used for testing.  The example SVM classifier code on the Spark
webpage has this:

JavaRDD<LabeledPoint> data = MLUtils.loadLibSVMFile(sc, path).toJavaRDD();

JavaRDD<LabeledPoint> training = data.sample(false, 0.6, 11L);
training.cache();
JavaRDD<LabeledPoint> test = data.subtract(training);

  Is there a way to iterate over data, holding out each element in turn and
designating the rest of the dataset as training, instead of using a certain
fraction of all the data for training (60% in the above example)?
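
  To make the question concrete, here is a plain-Java sketch (no Spark
involved) of the split I have in mind.  The class name LeaveOneOutSketch and
the method trainingWithout are made up for illustration and are not part of
any Spark API; what I am looking for is the equivalent over a
JavaRDD<LabeledPoint>.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: leave-one-out splitting over an in-memory list.
public class LeaveOneOutSketch {

    // Returns a copy of data with the element at index heldOut removed.
    static <T> List<T> trainingWithout(List<T> data, int heldOut) {
        List<T> training = new ArrayList<>(data);
        training.remove(heldOut); // remove(int index), not remove(Object)
        return training;
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList("p0", "p1", "p2", "p3");

        // Each point takes one turn as the single test point.
        for (int i = 0; i < data.size(); i++) {
            String test = data.get(i);
            List<String> training = trainingWithout(data, i);
            // An SVM would be trained on `training` and evaluated on `test` here.
            System.out.println("test=" + test + " training=" + training);
        }
    }
}
```

  With N data points this produces N training sets of N - 1 points each, so a
model would have to be trained N times.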


2.  Is there a way to choose and vary the parameters of the SVM?  (kernel,
cost, gamma…)

    Thank you!

    Natalia
