spark-user mailing list archives

From "Evan R. Sparks" <evan.spa...@gmail.com>
Subject Re: Persisting MatrixFactorizationModel
Date Wed, 04 Dec 2013 18:31:34 GMT
Ah, actually - I just remembered that the user and product features of the
model are RDDs, so you might be better off saving those components to
HDFS and then, at load time, reading them back in and creating a new
MatrixFactorizationModel. Sorry for the confusion!
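A sketch of that approach might look like the following. The HDFS paths are hypothetical, and `model` / `sc` are assumed to be a trained model and the active SparkContext; also note that the visibility of `MatrixFactorizationModel`'s constructor has varied across Spark versions, so check yours:

```scala
import org.apache.spark.mllib.recommendation.MatrixFactorizationModel
import org.apache.spark.rdd.RDD

// Save the model's two feature RDDs to HDFS (paths are made up here).
model.userFeatures.saveAsObjectFile("hdfs:///models/als/userFeatures")
model.productFeatures.saveAsObjectFile("hdfs:///models/als/productFeatures")

// Later, in another program, read them back and rebuild the model.
val userFeatures: RDD[(Int, Array[Double])] =
  sc.objectFile[(Int, Array[Double])]("hdfs:///models/als/userFeatures")
val productFeatures: RDD[(Int, Array[Double])] =
  sc.objectFile[(Int, Array[Double])]("hdfs:///models/als/productFeatures")

val restored = new MatrixFactorizationModel(model.rank, userFeatures, productFeatures)
```

saveAsObjectFile/objectFile use Java serialization under the hood, which keeps the round trip simple at the cost of some space.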

Note that the above solution only works if you want to deploy your model to a
Spark cluster. If the model is small enough and you really want to deploy
it to several hosts, you could consider calling collect() on its components
and then serializing the results as I suggested before. In general these
models are pretty small (on the order of MB), so that's not such a bad
option. When you get to 10s of millions of users or products, you might
instead pre-materialize some pieces of the model (e.g. calculate the top 100
predictions for every user) and save those intermediate results to serve up.
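If you take the collect-and-serialize route, a minimal sketch could look like this. The file name and the use of plain Java serialization are my own choices, and `model` is assumed to be an already-trained MatrixFactorizationModel:

```scala
import java.io.{FileInputStream, FileOutputStream, ObjectInputStream, ObjectOutputStream}

// Pull the (typically MB-sized) factor matrices down to the driver.
val userF: Array[(Int, Array[Double])] = model.userFeatures.collect()
val productF: Array[(Int, Array[Double])] = model.productFeatures.collect()

// Write rank plus both factor arrays out with plain Java serialization.
val out = new ObjectOutputStream(new FileOutputStream("als-model.bin"))
try { out.writeObject((model.rank, userF, productF)) } finally { out.close() }

// On a serving host, read the file back and predict with a dot product.
val in = new ObjectInputStream(new FileInputStream("als-model.bin"))
val (rank, users, products) = in.readObject()
  .asInstanceOf[(Int, Array[(Int, Array[Double])], Array[(Int, Array[Double])])]
in.close()

val userVecs = users.toMap
val productVecs = products.toMap

// Predicted rating is the dot product of the user and product factor vectors.
def predict(user: Int, product: Int): Double =
  (userVecs(user), productVecs(product)).zipped.map(_ * _).sum
```

This keeps serving hosts free of any Spark dependency beyond the factor data itself.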

- Evan


On Wed, Dec 4, 2013 at 9:54 AM, Aslan Bekirov <aslanbekirov@gmail.com> wrote:

> I thought to convert model to RDD and save to HDFS, and then load it.
>
> I will try your method. Thanks a lot.
>
>
>
> On Wed, Dec 4, 2013 at 7:41 PM, Evan R. Sparks <evan.sparks@gmail.com> wrote:
>
>> The model is serializable - so you should be able to write it out to disk
>> and load it up in another program.
>>
>> See, e.g. - https://gist.github.com/ramn/5566596 (Note, I haven't tested
>> this particular example, but it looks alright).
>>
>> Spark makes use of this kind of Scala (and Kryo, etc.) serialization
>> internally, so you can check the Spark codebase for more examples.
>>
>>
>> On Wed, Dec 4, 2013 at 9:34 AM, Aslan Bekirov <aslanbekirov@gmail.com> wrote:
>>
>>> Hi All,
>>>
>>> I am creating a model by calling train method of ALS.
>>>
>>> val model = ALS.train(ratings.....)
>>>
>>> I need to persist this model.  Use it from different clients, enable
>>> clients to make predictions using this model. In other words, persist and
>>> reload this model.
>>>
>>> Any suggestions, please?
>>>
>>> BR,
>>> Aslan
>>>
>>
>>
>
