spark-user mailing list archives

From Sean Owen <>
Subject Re: Is it possible to do incremental training using ALSModel (MLlib)?
Date Sat, 03 Jan 2015 09:36:30 GMT
Yes, it is easy to simply start a new factorization from the current model
solution. It works well. That's more like incremental *batch* rebuilding of
the model. That is not in MLlib but fairly trivial to add.
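
A minimal sketch of what "starting a new factorization from the current model solution" could look like, in plain Python rather than Spark. The names `als`, `solve_side`, `solve` and the `init_items` parameter are made up for illustration and are not MLlib API. The data is dense-enough toy data, and explicit ratings with L2 regularization are assumed.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b (Gauss-Jordan, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def solve_side(fixed, ratings_by_row, rank, reg):
    """One ALS half-step: re-solve every row vector against the fixed side."""
    out = {}
    for row, observed in ratings_by_row.items():
        a = [[reg if i == j else 0.0 for j in range(rank)] for i in range(rank)]
        b = [0.0] * rank
        for col, rating in observed.items():
            y = fixed[col]
            for i in range(rank):
                b[i] += rating * y[i]
                for j in range(rank):
                    a[i][j] += y[i] * y[j]
        out[row] = solve(a, b)
    return out

def als(ratings, rank=2, reg=0.1, iterations=10, init_items=None, seed=0):
    """ratings: list of (user, item, rating) triples.
    init_items: optional {item: vector} from a previous model (warm start);
    items missing from it are initialized randomly."""
    rng = random.Random(seed)
    by_user, by_item = {}, {}
    for user, item, rating in ratings:
        by_user.setdefault(user, {})[item] = rating
        by_item.setdefault(item, {})[user] = rating
    previous = init_items or {}
    items = {i: list(previous[i]) if i in previous
             else [rng.random() for _ in range(rank)] for i in by_item}
    users = {}
    for _ in range(iterations):
        users = solve_side(items, by_user, rank, reg)
        items = solve_side(users, by_item, rank, reg)
    return users, items

# Initial batch build, then a rebuild on old + new data that starts from
# the previous item factors and runs fewer iterations.
old = [("u1", "a", 5.0), ("u1", "b", 1.0), ("u2", "a", 4.0), ("u2", "b", 1.0)]
_, items = als(old)
new = old + [("u3", "a", 5.0), ("u3", "c", 2.0)]
users2, items2 = als(new, iterations=3, init_items=items)
```

Whether fewer iterations are actually enough after a warm start depends on the data, so that would need to be checked against a held-out set.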

You can certainly 'fold in' new data to approximately update the model with
one new datum too; descriptions of the technique are easy to find online.
This is not quite the same idea as streaming SGD. I'm not sure this fits the
RDD model well, since it entails updating one element at a time, but
mini-batch updates could be reasonable.
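
As a rough illustration of one common form of fold-in (not necessarily the exact variant meant above): hold the item factors fixed and solve a small regularized least-squares problem for the one user whose ratings changed. The sketch is plain Python, not Spark, and `fold_in` and `solve` are hypothetical helper names, with explicit ratings and L2 regularization assumed.

```python
def solve(a, b):
    """Solve the linear system a x = b (Gauss-Jordan, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fold_in(item_factors, ratings, reg=0.1):
    """Approximate user vector x for one user, with item factors held fixed.

    Solves (sum_i y_i y_i^T + reg * I) x = sum_i r_i y_i over the items
    the user has rated. Item factors are not updated, which is why the
    result is only an approximation of a full retrain.
    """
    rank = len(next(iter(item_factors.values())))
    a = [[reg if i == j else 0.0 for j in range(rank)] for i in range(rank)]
    b = [0.0] * rank
    for item, rating in ratings.items():
        y = item_factors[item]
        for i in range(rank):
            b[i] += rating * y[i]
            for j in range(rank):
                a[i][j] += y[i] * y[j]
    return solve(a, b)

# Toy rank-2 item factors, standing in for an already trained model.
item_factors = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}

# A user rates items "a" and "b"; fold them in and score unseen item "c".
user_vec = fold_in(item_factors, {"a": 4.0, "b": 2.0}, reg=1.0)
score_c = sum(u * y for u, y in zip(user_vec, item_factors["c"]))
```

Each fold-in touches a single user vector, which matches the "one element at a time" concern; batching several users' updates together would be the mini-batch variant.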
On Jan 3, 2015 5:29 AM, "Peng Cheng" <> wrote:

> I was under the impression that ALS wasn't designed for it :-< The famous
> eBay online recommender uses SGD.
> However, you can try using the previous model as a starting point, and
> gradually reduce the number of iterations after the model stabilizes. I have
> never verified this idea, so you need to at least cross-validate it before
> putting it into production.
> On 2 January 2015 at 04:40, Wouter Samaey <> wrote:
>> Hi all,
>> I'm curious about MLlib and if it is possible to do incremental training
>> on the ALSModel.
>> Usually training is run first, and then you can query. But in my case,
>> data is collected in real time and I want the predictions of my ALSModel
>> to consider the latest data without a complete re-training phase.
>> I've checked out these resources, but could not find any info on how to
>> solve this:
>> My question fits in a larger picture where I'm using Prediction IO, and
>> this in turn is based on Spark.
>> Thanks in advance for any advice!
>> Wouter
