Ah, actually - I just remembered that the user and product features of the model are RDDs, so you might be better off saving those components to HDFS and then, at load time, reading them back in and creating a new MatrixFactorizationModel. Sorry for the confusion!
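
Something along these lines should work (an untested sketch - the HDFS paths are just examples, and it assumes the MatrixFactorizationModel constructor is accessible in your Spark version):

import org.apache.spark.SparkContext
import org.apache.spark.mllib.recommendation.MatrixFactorizationModel

// Save: write the two feature RDDs out to HDFS via Java serialization.
model.userFeatures.saveAsObjectFile("hdfs:///models/als/userFeatures")
model.productFeatures.saveAsObjectFile("hdfs:///models/als/productFeatures")

// Load: read the factors back and rebuild the model. The rank has to be
// known at load time, so store it alongside the factors (or hard-code it).
def loadModel(sc: SparkContext, rank: Int): MatrixFactorizationModel = {
  val userFeatures = sc.objectFile[(Int, Array[Double])]("hdfs:///models/als/userFeatures")
  val productFeatures = sc.objectFile[(Int, Array[Double])]("hdfs:///models/als/productFeatures")
  new MatrixFactorizationModel(rank, userFeatures, productFeatures)
}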

Note, the above solution only works if you want to deploy your model to a Spark cluster. If the model is small enough and you really want to deploy it to several hosts, you could consider calling collect() on its components and then serializing the results as I suggested before. In general these models are usually pretty small (on the order of MB), so that's not a bad option. When you get to tens of millions of users or products, you might consider pre-materializing some pieces of the model (e.g. calculating the top 100 predictions for each user) and saving those intermediate results to serve up.
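
For the collect-and-serialize route, something like this (again untested, and the file name is just an example - this only makes sense when the factor matrices fit in driver memory):

import java.io.{FileOutputStream, ObjectOutputStream}

// Pull the factor matrices down to the driver and write them out with
// plain Java serialization, along with the rank.
val userFactors: Array[(Int, Array[Double])] = model.userFeatures.collect()
val productFactors: Array[(Int, Array[Double])] = model.productFeatures.collect()

val out = new ObjectOutputStream(new FileOutputStream("als-factors.bin"))
try {
  out.writeInt(model.rank)
  out.writeObject(userFactors)
  out.writeObject(productFactors)
} finally {
  out.close()
}

Any JVM client can then read the factors back and score locally - the predicted rating for a (user, product) pair is just the dot product of the two factor vectors.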

- Evan


On Wed, Dec 4, 2013 at 9:54 AM, Aslan Bekirov <aslanbekirov@gmail.com> wrote:
I had thought of converting the model to an RDD, saving it to HDFS, and then loading it back.

I will try your method. Thanks a lot.



On Wed, Dec 4, 2013 at 7:41 PM, Evan R. Sparks <evan.sparks@gmail.com> wrote:
The model is serializable, so you should be able to write it out to disk and load it up in another program.

See, e.g., https://gist.github.com/ramn/5566596 (note: I haven't tested this particular example, but it looks alright).

Spark makes use of this type of Scala (and Kryo, etc.) serialization internally, so you can check the Spark codebase for more examples.
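
The basic pattern in that gist is just standard Java serialization - a rough, untested sketch of save/load helpers for any Serializable object:

import java.io.{FileInputStream, FileOutputStream, ObjectInputStream, ObjectOutputStream}

// Write any Serializable object to a local file.
def save[T <: Serializable](obj: T, path: String): Unit = {
  val oos = new ObjectOutputStream(new FileOutputStream(path))
  try oos.writeObject(obj) finally oos.close()
}

// Read it back, casting to the expected type.
def load[T <: Serializable](path: String): T = {
  val ois = new ObjectInputStream(new FileInputStream(path))
  try ois.readObject().asInstanceOf[T] finally ois.close()
}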


On Wed, Dec 4, 2013 at 9:34 AM, Aslan Bekirov <aslanbekirov@gmail.com> wrote:
Hi All,

I am creating a model by calling train method of ALS.

val model = ALS.train(ratings.....)

I need to persist this model, use it from different clients, and enable those clients to make predictions with it. In other words, I need to persist and reload this model.

Any suggestions, please?

BR,
Aslan