spark-user mailing list archives

From "Ge, Yao (Y.)" <...@ford.com>
Subject Decision Tree with libsvmtools datasets
Date Thu, 11 Dec 2014 03:40:55 GMT
I am testing decision trees using the iris.scale data set (http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass.html#iris).
The data set has three class labels: 1, 2, and 3. However, in the following code I have to
set numClasses = 4; if I set numClasses = 3, I get an ArrayIndexOutOfBoundsException. Why?

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.mllib.tree.DecisionTree
    import org.apache.spark.mllib.util.MLUtils

    val conf = new SparkConf().setAppName("DecisionTree")
    val sc = new SparkContext(conf)

    // Load the iris data in LIBSVM format
    val data = MLUtils.loadLibSVMFile(sc, "data/iris.scale.txt")
    val numClasses = 4  // throws ArrayIndexOutOfBoundsException when set to 3
    val categoricalFeaturesInfo = Map[Int, Int]()
    val impurity = "gini"
    val maxDepth = 5
    val maxBins = 100

    val model = DecisionTree.trainClassifier(data, numClasses, categoricalFeaturesInfo,
      impurity, maxDepth, maxBins)

    // Evaluate the model on the training set
    val labelAndPreds = data.map { point =>
      val prediction = model.predict(point.features)
      (point.label, prediction)
    }

    val trainErr = labelAndPreds.filter(r => r._1 != r._2).count.toDouble / data.count
    println("Training Error = " + trainErr)
    println("Learned classification tree model:\n" + model)
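If I understand MLlib's convention correctly, trainClassifier expects class labels in the range [0, numClasses), so the iris labels 1..3 would force numClasses = 4 unless the labels are shifted down. A small sketch of that idea (the RDD lines are a hypothetical workaround, untested against the real data set):

```scala
// Sketch (assumption, not confirmed in this thread): MLlib's
// DecisionTree.trainClassifier treats labels as 0-based class indices,
// so with numClasses = 3 the label 3.0 indexes one past the end of a
// 3-slot class-count array.
val labels = Seq(1.0, 2.0, 3.0)

// Shifting each label down by one puts them in the expected 0..2 range,
// after which numClasses = 3 should work:
val shifted = labels.map(_ - 1.0)
println(shifted)  // List(0.0, 1.0, 2.0)

// With an RDD[LabeledPoint] the same shift would look like:
//   val fixed = data.map(lp => LabeledPoint(lp.label - 1, lp.features))
//   DecisionTree.trainClassifier(fixed, 3, Map[Int, Int](), "gini", 5, 100)
```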

-Yao
