spark-user mailing list archives

From Wouter Samaey <wouter.sam...@storefront.be>
Subject Re: Is it possible to do incremental training using ALSModel (MLlib)?
Date Mon, 05 Jan 2015 12:17:27 GMT
One other idea was that I don’t need to re-train the model, but can simply pass all of the
current user’s recent ratings (including ones created after the training) to the existing model…

Is this a valid option?
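
For reference, the idea above is close to what the quoted reply below calls "fold-in": keep the trained item factors fixed and solve a small least-squares problem for the one user's vector from that user's latest ratings. The following is only a minimal sketch in plain Python, not MLlib code. The names fold_in_user, predict and solve are hypothetical, the item factors are assumed to have been exported from a trained model (for example from the model's productFeatures), and the plain ridge regularization used here is an assumption that may not match how MLlib scales its regularization.

```python
def solve(a, b):
    """Solve the small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fold_in_user(item_factors, ratings, reg=0.1):
    """Estimate one user's factor vector while the item factors stay fixed.

    item_factors: dict item_id -> list of k floats (assumed exported from the model)
    ratings:      dict item_id -> rating, including ratings newer than the model
    reg:          ridge penalty; keep it > 0 when the user has few ratings
    """
    k = len(next(iter(item_factors.values())))
    # Normal equations (Y^T Y + reg * I) x = Y^T r over the items this user rated.
    a = [[reg if p == q else 0.0 for q in range(k)] for p in range(k)]
    b = [0.0] * k
    for item, rating in ratings.items():
        if item not in item_factors:
            continue  # items unknown to the model cannot contribute
        y = item_factors[item]
        for p in range(k):
            b[p] += y[p] * rating
            for q in range(k):
                a[p][q] += y[p] * y[q]
    return solve(a, b)


def predict(user_vec, item_vec):
    """Predicted rating is the dot product of the two factor vectors."""
    return sum(u * v for u, v in zip(user_vec, item_vec))


# Toy usage with made-up factors and ratings.
toy_items = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
toy_user = fold_in_user(toy_items, {"a": 4.0, "b": 2.0})
toy_score = predict(toy_user, toy_items["c"])
```

The item factors are not changed by this step, so the result is only an approximate update; a periodic rebuild of the full model would still be needed.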


--------
Wouter Samaey
Zaakvoerder Storefront BVBA

Tel: +32 472 72 83 07
Web: http://storefront.be

LinkedIn: http://www.linkedin.com/in/woutersamaey

> On 05 Jan 2015, at 13:13, Sean Owen <sowen@cloudera.com> wrote:
> 
> In the first instance, I'm suggesting that ALS in Spark could perhaps
> expose a run() method that accepts a previous
> MatrixFactorizationModel, and uses the product factors from it as the
> initial state instead. If anybody seconds that idea, I'll make a PR.
> 
> The second idea is just fold-in:
> http://www.slideshare.net/srowen/big-practical-recommendations-with-alternating-least-squares/14
> 
> Whether you do this or something like SGD, inside or outside Spark,
> depends on your requirements I think.
> 
> On Sat, Jan 3, 2015 at 12:04 PM, Wouter Samaey
> <wouter.samaey@storefront.be> wrote:
>> Do you know a place where I could find a sample or tutorial for this?
>> 
>> I'm still very new at this. And struggling a bit...
>> 
>> Thanks in advance
>> 
>> Wouter
>> 
>> Sent from my iPhone.
>> 
>> On 03 Jan 2015, at 10:36, Sean Owen <sowen@cloudera.com> wrote:
>> 
>> Yes, it is easy to simply start a new factorization from the current model
>> solution. It works well. That's more like incremental *batch* rebuilding of
>> the model. That is not in MLlib but fairly trivial to add.
>> 
>> You can certainly 'fold in' new data to approximately update the model with
>> one new datum too; descriptions of this can be found online. This is not
>> quite the same idea as streaming SGD. I'm not sure this fits the RDD model
>> well, since it entails updating one element at a time, but mini-batch could
>> be reasonable.
>> 
>> On Jan 3, 2015 5:29 AM, "Peng Cheng" <rhwing@gmail.com> wrote:
>>> 
>>> I was under the impression that ALS wasn't designed for it :-< The famous
>>> eBay online recommender uses SGD.
>>> However, you can try using the previous model as a starting point, and
>>> gradually reduce the number of iterations after the model stabilizes. I have
>>> never verified this idea, so you need to at least cross-validate it before
>>> putting it into production.
>>> 
>>> On 2 January 2015 at 04:40, Wouter Samaey <wouter.samaey@storefront.be>
>>> wrote:
>>>> 
>>>> Hi all,
>>>> 
>>>> I'm curious about MLlib and if it is possible to do incremental training
>>>> on the ALSModel.
>>>> 
>>>> Usually training is run first, and then you can query. But in my case,
>>>> data is collected in real-time and I want the predictions of my ALSModel
>>>> to consider the latest data without a complete re-training phase.
>>>> 
>>>> I've checked out these resources, but could not find any info on how to
>>>> solve this:
>>>> https://spark.apache.org/docs/latest/mllib-collaborative-filtering.html
>>>> 
>>>> http://ampcamp.berkeley.edu/big-data-mini-course/movie-recommendation-with-mllib.html
>>>> 
>>>> My question fits in a larger picture where I'm using Prediction IO, and
>>>> this in turn is based on Spark.
>>>> 
>>>> Thanks in advance for any advice!
>>>> 
>>>> Wouter
>>>> 
>>>> 
>>>> 
>>>> --
>>>> View this message in context:
>>>> http://apache-spark-user-list.1001560.n3.nabble.com/Is-it-possible-to-do-incremental-training-using-ALSModel-MLlib-tp20942.html
>>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>> 
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>>>> For additional commands, e-mail: user-help@spark.apache.org
>>>> 
>>> 
>> 
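
The "start a new factorization from the current model solution" idea in the quoted replies can be illustrated with a toy ALS loop that accepts initial item factors. This is only a sketch in plain Python under stated assumptions: it is not the MLlib implementation, the thread itself notes that MLlib did not offer such an entry point at the time, and the names als, ridge_vector, rmse and init_item_factors are hypothetical. Plain ridge regularization is assumed.

```python
import math
import random


def solve(a, b):
    """Solve the small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ridge_vector(other_factors, rated, k, reg):
    """Least-squares factor vector for one user (or item), other side held fixed.

    rated: list of (other_id, rating) pairs.
    """
    a = [[reg if p == q else 0.0 for q in range(k)] for p in range(k)]
    b = [0.0] * k
    for other, rating in rated:
        y = other_factors[other]
        for p in range(k):
            b[p] += y[p] * rating
            for q in range(k):
                a[p][q] += y[p] * y[q]
    return solve(a, b)


def als(ratings, k, iterations, reg=0.1, init_item_factors=None, seed=0):
    """Toy explicit-feedback ALS over (user, item, rating) triples.

    init_item_factors: optional dict item -> vector from a previous run; items
    missing from it (for example brand-new items) start from random values.
    """
    rng = random.Random(seed)
    by_user, by_item = {}, {}
    for u, i, r in ratings:
        by_user.setdefault(u, []).append((i, r))
        by_item.setdefault(i, []).append((u, r))
    init = init_item_factors or {}
    items = {i: list(init[i]) if i in init else [rng.random() for _ in range(k)]
             for i in by_item}
    users = {}
    for _ in range(iterations):
        # Alternate: solve all users with items fixed, then all items with users fixed.
        users = {u: ridge_vector(items, rs, k, reg) for u, rs in by_user.items()}
        items = {i: ridge_vector(users, rs, k, reg) for i, rs in by_item.items()}
    return users, items


def rmse(ratings, users, items):
    """Root-mean-square error of the model on the given triples."""
    err = [(r - sum(p * q for p, q in zip(users[u], items[i]))) ** 2
           for u, i, r in ratings]
    return math.sqrt(sum(err) / len(err))


# Toy usage: a full build, then a short rebuild warm-started from its item factors.
old = [("u1", "a", 5.0), ("u1", "b", 1.0), ("u2", "a", 4.0),
       ("u2", "c", 2.0), ("u3", "b", 5.0), ("u3", "c", 4.0)]
_, old_items = als(old, k=2, iterations=10)
new = old + [("u4", "a", 5.0), ("u4", "d", 3.0)]
new_users, new_items = als(new, k=2, iterations=2, init_item_factors=old_items)
```

Whether fewer iterations are enough after a warm start depends on the data, so the result should be validated, for example by comparing rmse on held-out ratings against a full rebuild.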

