spark-user mailing list archives

From Sean Owen <so...@cloudera.com>
Subject Re: possible bug in Spark's ALS implementation...
Date Wed, 12 Mar 2014 11:08:14 GMT
On Wed, Mar 12, 2014 at 7:36 AM, Nick Pentreath
<nick.pentreath@gmail.com> wrote:
> @Sean, would it be a good idea to look at changing the regularization in
> Spark's ALS to alpha * lambda? What is the thinking behind this? If I
> recall, the Mahout version added something like (# ratings * lambda) as
> regularization in each factor update (for explicit), but implicit it was
> just lambda (I may be wrong here).

I also used a different default alpha from the one suggested in the
paper: 1 instead of 40. But so does MLlib, and with alpha = 1 the
variation I mention here has no effect.

The idea was that alpha "is supposed to" control how much more weight
a known user-item value gets in the factorization. The weight is "1 +
alpha*r" for nonzero r, and of course "1" otherwise, and alpha can
make the difference larger.
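That weighting can be written as a tiny helper. This is an illustrative
sketch of the "1 + alpha*r" confidence from the Hu/Koren/Volinsky
formulation, not MLlib's actual code; the function name is mine:

```python
import numpy as np

def confidence(r, alpha=1.0):
    """Confidence weight for a user-item cell: 1 + alpha*r.

    For unobserved cells r = 0, so the weight degenerates to 1;
    a larger alpha widens the gap between observed and unobserved cells.
    """
    return 1.0 + alpha * r

# With alpha = 40 (the paper's suggestion) the gap is large;
# with alpha = 1 (the MLlib-style default) it is much smaller.
r = np.array([0.0, 1.0, 5.0])
print(confidence(r, alpha=40.0))
print(confidence(r, alpha=1.0))
```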

But large alpha has the side-effect of making the regularization terms
relatively smaller in the cost function. This dual effect seemed
undesirable. So: multiply the regularization term by alpha too to
disconnect these effects.
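A minimal sketch of the per-user cost with that alpha-scaled penalty may
make the point concrete. Names follow the paper's notation (x_u is the
user factor, Y the item-factor matrix, p_u the binary preference); this
is an assumption-laden illustration, not the MLlib implementation:

```python
import numpy as np

def user_cost(x_u, Y, r_u, lam, alpha):
    """Implicit-ALS cost for one user, with the penalty scaled by alpha.

    As alpha grows, the confidence-weighted error terms grow with it;
    multiplying the regularization term by alpha too keeps its relative
    size in the cost stable instead of shrinking it.
    """
    c_u = 1.0 + alpha * r_u            # confidence weights, 1 + alpha*r
    p_u = (r_u > 0).astype(float)      # binary preference: 1 if observed
    err = p_u - Y @ x_u
    return np.sum(c_u * err**2) + alpha * lam * np.dot(x_u, x_u)
```

Note that at alpha = 1 the scaled penalty alpha * lambda is just lambda,
which is why the variation is invisible at the default setting.

```python
# Tiny check with hand-pickable numbers.
x_u = np.array([1.0, 0.0])
Y = np.eye(2)
r_u = np.array([2.0, 0.0])
print(user_cost(x_u, Y, r_u, lam=0.1, alpha=1.0))
```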

Other ALS papers like
http://www.hpl.hp.com/personal/Robert_Schreiber/papers/2008%20AAIM%20Netflix/netflix_aaim08(submitted).pdf
again use a different definition of lambda by stuffing something else
into it. So the absolute value of lambda is already different in
different contexts.
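For example, that paper's weighted-lambda regularization folds the
per-user (or per-item) rating count into the penalty, along the lines of
the "# ratings * lambda" scheme Nick mentions. A sketch, with a helper
name of my own:

```python
import numpy as np

def weighted_reg(x_u, n_u, lam):
    """Weighted-lambda penalty: lambda * n_u * ||x_u||^2,

    where n_u is the number of ratings for this user, so a lambda tuned
    here is not directly comparable to a plain lambda * ||x_u||^2 penalty.
    """
    return lam * n_u * np.dot(x_u, x_u)
```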

So, depending on Michael's settings, this could be a red herring, but
it's worth checking. The only other variation was in choosing the random
initial state but that too is the same now in both implementations (at
least in HEAD). The initial state really shouldn't matter so much. I
can't think of other variations.

Michael what was your eval metric?
