spark-user mailing list archives

From Kristina Rogale Plazonic <>
Subject RandomForestClassifer does not recognize number of classes, nor can number of classes be set
Date Tue, 29 Sep 2015 14:14:45 GMT

I'm trying out ml.classification.RandomForestClassifier on a simple
DataFrame, and it throws an exception saying that the number of classes has
not been specified for my label column. However, I cannot find a setter for
the number of classes, or anywhere to pass it as an argument. In mllib,
numClasses is a parameter passed when training the model. In ml, there is a
workaround using StringIndexer, which feels like an ugly hack: should you
really need it? LogisticRegression and NaiveBayes in ml work without setting
the number of classes.

Thanks for any pointers!

My code:

import org.apache.spark.ml.classification.RandomForestClassifier
import org.apache.spark.mllib.linalg.{Vector, Vectors}

case class Record(label: Double, features: Vector)

// toDF() relies on the SQLContext implicits that spark-shell imports
val df = sc.parallelize(Seq(
  Record(0.0, Vectors.dense(1.0, 0.0)),
  Record(0.0, Vectors.dense(1.1, 0.0)),
  Record(0.0, Vectors.dense(1.2, 0.0)),
  Record(1.0, Vectors.dense(0.0, 1.2)),
  Record(1.0, Vectors.dense(0.0, 1.3)),
  Record(1.0, Vectors.dense(0.0, 1.7))
)).toDF()

val rf = new RandomForestClassifier()
val rfmodel = rf.fit(df)

And the error is:

scala> val rfmodel = rf.fit(df)
java.lang.IllegalArgumentException: RandomForestClassifier was given input
with invalid label column label, without the number of classes specified.
See StringIndexer.
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:31)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:36)
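
For reference, the StringIndexer workaround that the error message points to
looks roughly like the following. This is only a minimal sketch, assuming
Spark 1.5 inside spark-shell (so sqlContext is predefined). The output column
name indexedLabel and the tuple-based DataFrame are illustrative choices, not
taken from the code above.

```scala
import org.apache.spark.ml.classification.RandomForestClassifier
import org.apache.spark.ml.feature.StringIndexer
import org.apache.spark.mllib.linalg.Vectors

// Same six points as above, built from tuples (assumes spark-shell's sqlContext).
val data = sqlContext.createDataFrame(Seq(
  (0.0, Vectors.dense(1.0, 0.0)),
  (0.0, Vectors.dense(1.1, 0.0)),
  (0.0, Vectors.dense(1.2, 0.0)),
  (1.0, Vectors.dense(0.0, 1.2)),
  (1.0, Vectors.dense(0.0, 1.3)),
  (1.0, Vectors.dense(0.0, 1.7))
)).toDF("label", "features")

// StringIndexer writes nominal-attribute metadata (including the number of
// distinct labels) onto its output column; the classifier reads that metadata.
val indexer = new StringIndexer()
  .setInputCol("label")
  .setOutputCol("indexedLabel") // hypothetical column name
val indexed = indexer.fit(data).transform(data)

val rf = new RandomForestClassifier()
  .setLabelCol("indexedLabel")
  .setFeaturesCol("features")
val rfmodel = rf.fit(indexed)
```

Note that StringIndexer assigns indices by label frequency, so the values in
indexedLabel need not match the original 0.0/1.0 labels.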
