From my personal experience - we read the metadata of the features column in the DataFrame to extract a mapping from feature indices to the original feature names, and use that mapping to translate the model coefficients into a JSON string that maps each original feature name to its weight. The production environment runs a simple piece of code that evaluates the logistic model from this JSON string and the real inputs.
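To make the production side concrete, here is a minimal plain-Scala sketch of the evaluation step described above. It assumes the exported JSON has already been parsed into a `Map[String, Double]` of feature names to coefficients (the names `LogisticScorer`, `weights`, and `intercept` are illustrative, not from any real library):

```scala
object LogisticScorer {
  // Standard logistic (sigmoid) function.
  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  // Score one input row: intercept plus the dot product of the
  // coefficient map and the row's feature values, passed through sigmoid.
  // Features missing from the row are treated as 0.0.
  def score(weights: Map[String, Double],
            intercept: Double,
            features: Map[String, Double]): Double = {
    val margin = weights.foldLeft(intercept) { case (acc, (name, w)) =>
      acc + w * features.getOrElse(name, 0.0)
    }
    sigmoid(margin)
  }
}
```

Since this needs nothing beyond the standard library, the serving system stays completely free of Spark dependencies.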

I would be very interested in a more straightforward way to export the model into a format readable by systems that don't have Spark installed.

On Sat, Jul 2, 2016 at 10:45 AM, Yanbo Liang <> wrote:
Let's suppose you have trained a LogisticRegressionModel and saved it at "/tmp/lr-model". You can copy that directory to the production environment and use it to make predictions on new user data. You can refer to the following code snippet:

import org.apache.spark.ml.classification.LogisticRegressionModel

// Load the saved model and score a new dataset.
val model = LogisticRegressionModel.load("/tmp/lr-model")
val data = newDataset
val prediction = model.transform(data)

However, we usually save/load a PipelineModel, which bundles the necessary feature transformers together with the trained model, rather than the single model alone; the save/load and transform operations are the same.
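The reason the whole pipeline matters can be shown with a plain-Scala sketch (this is not the Spark API; `Stage` and `MiniPipeline` are made-up names): the fitted feature transformers must run in the same order at serving time as they did during training, ending in the model itself.

```scala
// A fitted stage is just a named transformation over a row of named values.
case class Stage(name: String, run: Map[String, Double] => Map[String, Double])

object MiniPipeline {
  // Apply each fitted stage in order, exactly as during training.
  def transform(stages: Seq[Stage], row: Map[String, Double]): Map[String, Double] =
    stages.foldLeft(row)((r, s) => s.run(r))
}
```

Saving only the final model would lose the scaler/indexer stages, and the serving inputs would no longer match what the model was trained on.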


2016-06-23 10:54 GMT-07:00 Saurabh Sardeshpande <>:
Hi all,

How do you reliably deploy a Spark model in production? Let's say I've done a lot of analysis and come up with a model that performs great. I have this "model file" and I'm not sure what to do with it. I want to build some kind of service around it that takes some inputs, converts them into features, runs the equivalent of 'transform', and returns the predicted output.

At the Spark Summit I heard a lot of talk about how this will be easy to do in Spark 2.0, but I'm looking for a solution sooner than that. PMML support is limited, and the model I have can't be exported in that format.

I would appreciate any ideas around this, especially from personal experiences.