spark-user mailing list archives

From pun <punintended...@gmail.com>
Subject Spark ML - LogisticRegression extract words with highest weights
Date Thu, 05 Oct 2017 21:12:09 GMT
I am using Spark ML's pipeline to classify text documents with the following
steps:
Tokenizer -> CountVectorizer -> LogisticRegression 
I want to be able to print the words with the highest weights. Can this be
done?
So far I have been able to extract the LR coefficients, but can they be
mapped back to the actual words?
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.{LogisticRegression, LogisticRegressionModel}
import org.apache.spark.ml.feature.{CountVectorizer, Tokenizer}

val tokenizer = new Tokenizer()
  .setInputCol("text")
  .setOutputCol("words")
val countVectorizer = new CountVectorizer()
  .setInputCol(tokenizer.getOutputCol)
  .setOutputCol("features")
val lr = new LogisticRegression()
  .setMaxIter(10)
  .setRegParam(0.01)
val pipeline = new Pipeline()
  .setStages(Array(tokenizer, countVectorizer, lr))

// Fit the pipeline to training documents (DataFrame with "text" and "label" columns).
val model = pipeline.fit(training)
val results = model.transform(test)
val lrm: LogisticRegressionModel =
  model.stages.last.asInstanceOf[LogisticRegressionModel]

// Print the coefficients, then return (intercept, coefficients) in the REPL.
println(s"LR Model coefficients:\n${lrm.coefficients.toArray.mkString("\n")}")
(lrm.intercept, lrm.coefficients)
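One possible way to tie the weights to words, sketched under stated assumptions: the fitted CountVectorizer stage (a CountVectorizerModel) exposes a `vocabulary` array in which term `vocabulary(i)` is feature index `i`. Because the pipeline feeds those features straight into LogisticRegression, coefficient `i` of a binary model should belong to `vocabulary(i)`. The helper below is plain Scala (no Spark needed); the names `TopWords` and `rankedWords` and the toy vocabulary and weights are hypothetical and only illustrate the zip-and-sort idea.

```scala
// Hypothetical helper: pair each vocabulary term with the coefficient at the
// same index and sort from highest to lowest weight.
object TopWords {
  def rankedWords(vocabulary: Array[String], weights: Array[Double]): Seq[(String, Double)] = {
    require(vocabulary.length == weights.length,
      "vocabulary and weights must have the same length")
    vocabulary.toSeq.zip(weights.toSeq).sortBy { case (_, w) => -w }
  }

  def main(args: Array[String]): Unit = {
    // Toy data standing in for the fitted vocabulary and LR coefficients.
    val vocab  = Array("spark", "the", "great", "awful")
    val coeffs = Array(0.8, 0.01, 1.5, -2.0)
    val ranked = rankedWords(vocab, coeffs)
    // Prints: Highest weights: (great,1.5), (spark,0.8)
    println("Highest weights: " + ranked.take(2).mkString(", "))
    // Prints: Lowest weights: (awful,-2.0), (the,0.01)
    println("Lowest weights: " + ranked.takeRight(2).reverse.mkString(", "))
  }
}
```

With the pipeline above, the inputs would presumably come from `model.stages(1).asInstanceOf[CountVectorizerModel].vocabulary` (importing `org.apache.spark.ml.feature.CountVectorizerModel`) and `lrm.coefficients.toArray`. This assumes a binary model; for a multinomial model, `coefficients` is not available and each row of `lrm.coefficientMatrix` would have to be paired with the vocabulary instead.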

--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/