spark-user mailing list archives

From Pengcheng
Subject SparkML RandomForest
Date Thu, 11 Aug 2016 03:42:00 GMT
Hi There,

I was comparing RandomForest in Spark ML and Spark MLlib
(org.apache.spark.mllib.tree) using the same datasets and the same
parameter settings. Spark MLlib always gives me better results on the test
data sets.
I was wondering:

1. Has anyone else noticed a similar performance difference?
2. How can I output the parameters of a PipelineModel?

For example, I want to output the parameters trained for
RandomForestClassifier. None of these (model.params.toString,
model.explainParams() or model.extractParamMap())
output meaningful parameters such as

val rf = new RandomForestClassifier()
val indexer = new StringIndexer()
val pipeline = new Pipeline().setStages(Array(indexer, rf))
val model: PipelineModel =
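
One possible way to get at the forest's settings, as a minimal sketch assuming Spark 2.x and a spark-shell style script: a PipelineModel declares no tuning parameters of its own, so the fitted RandomForestClassificationModel has to be pulled out of model.stages first. The toy data, the column names (category, label, features) and the values passed to setNumTrees, setMaxDepth and setSeed are illustrative assumptions, not taken from the original setup.

```scala
import org.apache.spark.ml.{Pipeline, PipelineModel}
import org.apache.spark.ml.classification.{RandomForestClassificationModel, RandomForestClassifier}
import org.apache.spark.ml.feature.StringIndexer
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

// In spark-shell this returns the existing session; elsewhere it starts a local one.
val spark = SparkSession.builder().master("local[1]").appName("rf-params-sketch").getOrCreate()

// Hypothetical toy data: a string label column plus a feature vector column.
val training = spark.createDataFrame(Seq(
  ("yes", Vectors.dense(1.0, 0.1)),
  ("yes", Vectors.dense(0.9, 0.2)),
  ("no",  Vectors.dense(0.1, 0.9)),
  ("no",  Vectors.dense(0.0, 1.0))
)).toDF("category", "features")

val indexer = new StringIndexer()
  .setInputCol("category")
  .setOutputCol("label")

val rf = new RandomForestClassifier()
  .setLabelCol("label")
  .setFeaturesCol("features")
  .setNumTrees(5)   // assumed value, for illustration only
  .setMaxDepth(3)   // assumed value, for illustration only
  .setSeed(42L)

val pipeline = new Pipeline().setStages(Array(indexer, rf))
val model: PipelineModel = pipeline.fit(training)

// The pipeline-level model only wraps the fitted stages,
// so pick the random forest stage out of them by type.
val rfModel: RandomForestClassificationModel =
  model.stages.collectFirst { case m: RandomForestClassificationModel => m }.get

// Params attached to the fitted forest (the exact set depends on the Spark version).
println(rfModel.extractParamMap())

// Structure that was actually learned.
println(s"trees: ${rfModel.trees.length}, total nodes: ${rfModel.totalNumNodes}")
println(s"feature importances: ${rfModel.featureImportances}")
println(rfModel.toDebugString)
```

If the stage order is known, model.stages.last.asInstanceOf[RandomForestClassificationModel] would do the same job; matching on the type just avoids depending on the position of the stage.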


