Sure - in theory this sounds great. But in practice it's much faster and a whole lot simpler to just serve the model from single instance in memory. Optionally you can multithread within that (as Oryx 1 does).

There are very few real world use cases where the model is so large that it HAS to be distributed.

Having said this, it's certainly possible to distribute model serving for factor-like models (like ALS). One idea I'm working on now is using Elasticsearch for exactly this purpose - but that more because I'm using it for filtering of recommendation results and combining with search, so overall it's faster to do it this way.

For the pure matrix algebra part, single instance in memory is way faster. 

Sent from Mailbox

On Fri, Nov 7, 2014 at 8:00 PM, Duy Huynh <> wrote:

hi nick.. sorry about the confusion.  originally i had a question specifically about word2vec, but my follow up question on distributed model is a more general question about saving different types of models.

on distributed model, i was hoping to implement a model parallelism, so that different workers can work on different parts of the models, and then merge the results at the end at the single master model.

On Fri, Nov 7, 2014 at 12:20 PM, Nick Pentreath <> wrote:
Currently I see the word2vec model is collected onto the master, so the model itself is not distributed. 

I guess the question is why do you need  a distributed model? Is the vocab size so large that it's necessary? For model serving in general, unless the model is truly massive (ie cannot fit into memory on a modern high end box with 64, or 128GB ram) then single instance is way faster and simpler (using a cluster of machines is more for load balancing / fault tolerance).

What is your use case for model serving?

Sent from Mailbox

On Fri, Nov 7, 2014 at 5:47 PM, Duy Huynh <> wrote:

you're right, serialization works.

what is your suggestion on saving a "distributed" model?  so part of the model is in one cluster, and some other parts of the model are in other clusters.  during runtime, these sub-models run independently in their own clusters (load, train, save).  and at some point during run time these sub-models merge into the master model, which also loads, trains, and saves at the master level.

much appreciated.

On Fri, Nov 7, 2014 at 2:53 AM, Evan R. Sparks <> wrote:
There's some work going on to support PMML - - but it's not yet been merged into master.

What are you used to doing in other environments? In R I'm used to running save(), same with matlab. In python either pickling things or dumping to json seems pretty common. (even the scikit-learn docs recommend pickling - These all seem basically equivalent java serialization to me..

Would some helper functions (in, say, mllib.util.modelpersistence or something) make sense to add?

On Thu, Nov 6, 2014 at 11:36 PM, Duy Huynh <> wrote:
that works.  is there a better way in spark?  this seems like the most common feature for any machine learning work - to be able to save your model after training it and load it later.

On Fri, Nov 7, 2014 at 2:30 AM, Evan R. Sparks <> wrote:
Plain old java serialization is one straightforward approach if you're in java/scala.

On Thu, Nov 6, 2014 at 11:26 PM, ll <> wrote:
what is the best way to save an mllib model that you just trained and reload
it in the future?  specifically, i'm using the mllib word2vec model...

View this message in context:
Sent from the Apache Spark User List mailing list archive at

To unsubscribe, e-mail:
For additional commands, e-mail: