spark-user mailing list archives

From: Xiangrui Meng
Subject: Re: deploying a model built in mllib
Date: Mon, 27 Oct 2014 16:39:21 GMT
We are working on the pipeline features, which would make this
procedure much easier in MLlib. This is still a WIP and the main JIRA
is at:


On Mon, Oct 27, 2014 at 8:56 AM, chirag lakhani wrote:
> Hello,
> I have been prototyping a text classification model that my company would
> like to eventually put into production.  Our technology stack is currently
> Java-based, but we would like to be able to build our models in Spark/MLlib
> and then export something like a PMML file that can be used for real-time
> model scoring.
> I have been using scikit-learn, where I am able to take the training data,
> convert the text data into a sparse format, and then use the dictionary
> vectorizer to one-hot encode the other, categorical features.  All of
> those things seem to be possible in MLlib, but I am still puzzled about
> how they can be packaged so that incoming data is first turned into
> feature vectors and then evaluated.
> Are there any best practices for this type of thing in Spark?  I hope this
> is clear, but if anything is confusing, please let me know.
> Thanks,
> Chirag
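
The packaging question in the quoted message can be illustrated with a small sketch. It is plain Python, not MLlib or scikit-learn code, and every name in it (`hash_text`, `ScoringBundle`, the `country` field) is hypothetical. The idea it shows is one common approach: keep the featurization settings and the model weights in a single object, so that scoring always applies the same text-to-sparse-vector and one-hot steps that were used in training. The hashing trick stands in for a fitted vocabulary here, and the logistic scoring function is an assumption rather than anything stated in the thread.

```python
import math
import zlib
from dataclasses import dataclass, field


def hash_text(text, num_buckets):
    """Hashing trick: map tokens to a sparse {index: count} dict, no vocabulary needed."""
    vec = {}
    for token in text.lower().split():
        # crc32 is deterministic across processes, unlike Python's built-in hash()
        idx = zlib.crc32(token.encode("utf-8")) % num_buckets
        vec[idx] = vec.get(idx, 0.0) + 1.0
    return vec


@dataclass
class ScoringBundle:
    """Hypothetical container holding featurization settings and model weights together."""
    num_text_buckets: int
    categories: dict  # category levels fixed at training time, e.g. {"country": ["US", "DE"]}
    weights: dict = field(default_factory=dict)  # sparse {feature index: weight}
    intercept: float = 0.0

    def featurize(self, text, cat_values):
        # Text features occupy indices [0, num_text_buckets)
        vec = hash_text(text, self.num_text_buckets)
        # One-hot features follow, one slot per known level, in sorted field order
        offset = self.num_text_buckets
        for name in sorted(self.categories):
            levels = self.categories[name]
            value = cat_values.get(name)
            if value in levels:  # unseen levels are left as all zeros
                vec[offset + levels.index(value)] = 1.0
            offset += len(levels)
        return vec

    def score(self, text, cat_values):
        # Linear margin over the sparse vector, squashed with a logistic function
        vec = self.featurize(text, cat_values)
        margin = self.intercept + sum(self.weights.get(i, 0.0) * v for i, v in vec.items())
        return 1.0 / (1.0 + math.exp(-margin))


bundle = ScoringBundle(num_text_buckets=8, categories={"country": ["US", "DE"]})
features = bundle.featurize("spark spark mllib", {"country": "DE"})
probability = bundle.score("spark spark mllib", {"country": "DE"})
```

Because the bundle carries only numbers and lists, it could be serialized (for example as JSON) and re-implemented on a Java scoring service, which is roughly the role a PMML file would play.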
