spark-user mailing list archives

From Aslan Bekirov <aslanbeki...@gmail.com>
Subject Re: Persisting MatrixFactorizationModel
Date Thu, 05 Dec 2013 09:47:51 GMT
Thanks a lot Evan...


On Wed, Dec 4, 2013 at 8:31 PM, Evan R. Sparks <evan.sparks@gmail.com> wrote:

> Ah, actually - I just remembered that the user and product features of the
> model are RDDs, so you might be better off saving those components to
> HDFS and then, at load time, reading them back in and creating a new
> MatrixFactorizationModel. Sorry for the confusion!
>
> Note that the above solution only works if you want to deploy your model to
> a Spark cluster. If the model is small enough and you really want to deploy
> it to several hosts, you could consider calling collect() on its components
> and then serializing the results as I suggested before. In general these
> models are pretty small (on the order of MB), so that's not such a bad
> option. When you get to tens of millions of users or products, you
> might consider pre-materializing some pieces of it (e.g. calculating the
> top 100 predictions for all users) and saving those intermediate results
> to serve up.
>
> - Evan
>
>
> On Wed, Dec 4, 2013 at 9:54 AM, Aslan Bekirov <aslanbekirov@gmail.com> wrote:
>
>> I was thinking of converting the model to an RDD, saving it to HDFS, and
>> then loading it.
>>
>> I will try your method. Thanks a lot.
>>
>>
>>
>> On Wed, Dec 4, 2013 at 7:41 PM, Evan R. Sparks <evan.sparks@gmail.com> wrote:
>>
>>> The model is serializable - so you should be able to write it out to
>>> disk and load it up in another program.
>>>
>>> See, e.g. - https://gist.github.com/ramn/5566596 (Note, I haven't
>>> tested this particular example, but it looks alright).
>>>
>>> Spark makes use of this type of Scala (and Kryo, etc.) serialization
>>> internally, so you can check the Spark codebase for more examples.
>>>
>>>
>>> On Wed, Dec 4, 2013 at 9:34 AM, Aslan Bekirov <aslanbekirov@gmail.com> wrote:
>>>
>>>> Hi All,
>>>>
>>>> I am creating a model by calling the train method of ALS.
>>>>
>>>> val model = ALS.train(ratings.....)
>>>>
>>>> I need to persist this model so that different clients can use it to
>>>> make predictions. In other words, I need to persist and reload this
>>>> model.
>>>>
>>>> Any suggestions, please?
>>>>
>>>> BR,
>>>> Aslan
>>>>
>>>
>>>
>>
>
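
The "collect() and serialize" option discussed in the thread can be sketched roughly as follows. This is only an illustration under stated assumptions, not the method anyone in the thread actually ran: LocalFactorModel and LocalModelIO are hypothetical names that are not part of Spark, and the sketch assumes the two feature sets have already been pulled to the driver (for example via collect() on the model's user and product feature RDDs) and fit comfortably in memory. It uses plain Java serialization from the standard library and predicts a rating as the dot product of the two factor vectors.

```scala
import java.io.{File, FileInputStream, FileOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical local stand-in for a collected model; NOT part of Spark's API.
// Assumption: the maps were built on the driver from the collected user and
// product feature RDDs of a trained model.
case class LocalFactorModel(
    rank: Int,
    userFeatures: Map[Int, Array[Double]],
    productFeatures: Map[Int, Array[Double]]) {

  // Predicted rating = dot product of the user and product factor vectors.
  // Returns None when either id is unknown to the model.
  def predict(user: Int, product: Int): Option[Double] =
    for {
      u <- userFeatures.get(user)
      p <- productFeatures.get(product)
    } yield u.zip(p).map { case (a, b) => a * b }.sum
}

// Hypothetical helper object for writing and reading the local model.
object LocalModelIO {
  // Write the model to a file with plain Java serialization.
  def save(model: LocalFactorModel, file: File): Unit = {
    val out = new ObjectOutputStream(new FileOutputStream(file))
    try out.writeObject(model) finally out.close()
  }

  // Read the model back; the cast fails if the file holds something else.
  def load(file: File): LocalFactorModel = {
    val in = new ObjectInputStream(new FileInputStream(file))
    try in.readObject().asInstanceOf[LocalFactorModel] finally in.close()
  }
}
```

A client process would call LocalModelIO.load on the saved file and then predict(user, product). For the cluster-side option mentioned in the thread (saving the feature RDDs to HDFS and rebuilding a MatrixFactorizationModel at load time), the same idea applies, but the exact constructor and save/load calls depend on the Spark version and are not shown here.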
