well, sounds trivial now ... !
thanks ;-)

2016-07-02 10:04 GMT+02:00 Yanbo Liang <ybliang8@gmail.com>:
Hi Mathieu,

Using the new ml package to train a RandomForestClassificationModel, you can get the feature importance. Then you can convert the prediction result to an RDD and feed it into BinaryClassificationMetrics for the ROC curve. You can refer to the following code snippet:

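import org.apache.spark.ml.classification.RandomForestClassifier
import org.apache.spark.ml.linalg.Vector  // Spark 2.x; on 1.6 use org.apache.spark.mllib.linalg.Vector
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
import org.apache.spark.sql.Row

val rf = new RandomForestClassifier()
val model = rf.fit(trainingData)
// Feature importances come straight from the fitted ml model
val importances = model.featureImportances

val predictions = model.transform(testData)

// Convert to an RDD of (score, label) pairs for the mllib metrics class
val scoreAndLabels =
  predictions.select(model.getRawPredictionCol, model.getLabelCol).rdd.map {
    case Row(rawPrediction: Vector, label: Double) => (rawPrediction(1), label)
    case Row(rawPrediction: Double, label: Double) => (rawPrediction, label)
  }
val metrics = new BinaryClassificationMetrics(scoreAndLabels)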
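
From there, the curve itself and a ranked list of importances can be pulled out roughly as below. This is only a sketch: the helper names rocPoints and rankedImportances are made up for illustration, and it assumes the score/label pairs were built as in the snippet above.

```scala
import org.apache.spark.ml.classification.RandomForestClassificationModel
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
import org.apache.spark.rdd.RDD

// Hypothetical helper: collect the ROC curve as (false positive rate, true positive rate) points.
// roc() prepends (0.0, 0.0) and appends (1.0, 1.0) to the curve.
def rocPoints(scoreAndLabels: RDD[(Double, Double)]): Array[(Double, Double)] = {
  val metrics = new BinaryClassificationMetrics(scoreAndLabels)
  metrics.roc().collect()
}

// Hypothetical helper: pair each importance with its index in the features vector,
// sorted from most to least important.
def rankedImportances(model: RandomForestClassificationModel): Array[(Int, Double)] =
  model.featureImportances.toArray.zipWithIndex
    .map { case (importance, index) => (index, importance) }
    .sortBy { case (_, importance) => -importance }
```

On a large test set the curve can have one point per distinct score, so it may be worth passing a numBins value as the second constructor argument of BinaryClassificationMetrics before collecting.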


2016-06-15 7:13 GMT-07:00 matd <matdpro@gmail.com>:
Hi ml folks !

I'm using a Random Forest for a binary classification.
I'm interested in getting both the ROC *curve* and the feature importance
from the trained model.

If I'm not missing something obvious, the ROC curve is only available in the
old mllib world, via BinaryClassificationMetrics. In the new ml package,
only the areaUnderROC and areaUnderPR are available through

The feature importance is only available in ml package, through

Any idea how to get both?


View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Get-both-feature-importance-and-ROC-curve-from-a-random-forest-classifier-tp27175.html