spark-user mailing list archives

From chirag lakhani <>
Subject deploying a model built in mllib
Date Mon, 27 Oct 2014 15:56:29 GMT

I have been prototyping a text classification model that my company would
like to eventually put into production.  Our technology stack is currently
Java-based, but we would like to be able to build our models in Spark/MLlib
and then export something like a PMML file that can be used for model
scoring in real time.

I have been using scikit-learn, where I am able to take the training data,
convert the text data into a sparse format, and then use the dictionary
vectorizer to one-hot encode the other categorical features.  All of those
things seem to be possible in MLlib, but I am still puzzled about how they
can be packaged so that incoming data is first turned into feature vectors
and then evaluated.
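
To make the question concrete, here is a minimal sketch of the kind of
packaging I have in mind.  It is plain Python, not MLlib or PMML, and every
name in it (ScoringBundle, featurize, the JSON layout) is hypothetical.  It
assumes a linear model, the hashing trick for text (using crc32 as a stable
hash), and a category list fixed at training time.  The point is only that
the featurization settings and the model weights travel together as one
artifact.

```python
import json
import math
import zlib


class ScoringBundle:
    """Hypothetical bundle: featurization settings plus linear-model weights."""

    def __init__(self, num_text_features, categories, weights, intercept):
        self.num_text_features = num_text_features
        # categories: {field name: [known values]}, fixed at training time
        self.categories = categories
        # sparse weights: {feature index: weight}
        self.weights = weights
        self.intercept = intercept

    def featurize(self, text, fields):
        """Turn one raw record into a sparse vector {index: value}."""
        vec = {}
        # Hashing trick for text. crc32 is stable across processes, so
        # training and scoring map each token to the same index.
        for token in text.lower().split():
            idx = zlib.crc32(token.encode("utf-8")) % self.num_text_features
            vec[idx] = vec.get(idx, 0.0) + 1.0
        # One-hot encoding for categorical fields, placed after the text block.
        # Values not seen at training time are simply left as all zeros.
        offset = self.num_text_features
        for name in sorted(self.categories):
            values = self.categories[name]
            if fields.get(name) in values:
                vec[offset + values.index(fields[name])] = 1.0
            offset += len(values)
        return vec

    def score(self, text, fields):
        """Featurize a raw record, then apply a logistic model to it."""
        vec = self.featurize(text, fields)
        margin = self.intercept + sum(
            self.weights.get(i, 0.0) * v for i, v in vec.items()
        )
        return 1.0 / (1.0 + math.exp(-margin))

    def to_json(self):
        """Serialize settings and weights into one portable string."""
        return json.dumps({
            "num_text_features": self.num_text_features,
            "categories": self.categories,
            "weights": {str(i): w for i, w in self.weights.items()},
            "intercept": self.intercept,
        })

    @classmethod
    def from_json(cls, payload):
        d = json.loads(payload)
        weights = {int(i): w for i, w in d["weights"].items()}
        return cls(d["num_text_features"], d["categories"], weights,
                   d["intercept"])
```

A scoring service would then only need to load the bundle with
ScoringBundle.from_json(...) and call score(text, fields) on each incoming
record.  The same layout could in principle be read by a small Java class,
which is roughly the role I was hoping PMML could play.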

Are there any best practices for this type of thing in Spark?  I hope this
is clear, but if anything is confusing, please let me know.


