spark-user mailing list archives

From Sean Owen <so...@cloudera.com>
Subject Re: possible bug in Spark's ALS implementation...
Date Wed, 12 Mar 2014 02:56:16 GMT
On Tue, Mar 11, 2014 at 10:18 PM, Michael Allman <msa@allman.ms> wrote:
> I'm seeing counterintuitive, sometimes nonsensical recommendations. For
> comparison, I've run the training data through Oryx's in-VM implementation
> of implicit ALS with the same parameters. Oryx uses the same algorithm.
> (Source in this file:
> https://github.com/cloudera/oryx/blob/master/als-common/src/main/java/com/cloudera/oryx/als/common/factorizer/als/AlternatingLeastSquares.java)

On this note, I should say that Oryx varies from that paper in a
couple of small ways. In particular, the regularization parameter that
is used in the end is not just lambda, but lambda * alpha. (There are
decent reasons for this.)
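For reference, here is the per-user update from the implicit-feedback ALS paper (presumably Hu, Koren and Volinsky, "Collaborative Filtering for Implicit Feedback Datasets"), with confidence c_ui = 1 + alpha * r_ui. The last line sketches the Oryx-style variant described above, where lambda is replaced by lambda * alpha. Any other details of Oryx's solve are not covered here.

```latex
x_u = \left( Y^\top C^u Y + \lambda I \right)^{-1} Y^\top C^u \, p(u),
\qquad c_{ui} = 1 + \alpha \, r_{ui}

Y^\top C^u Y = Y^\top Y + Y^\top \left( C^u - I \right) Y

% Oryx-style variant (as described above): regularize with \lambda\alpha
x_u = \left( Y^\top C^u Y + \lambda \alpha I \right)^{-1} Y^\top C^u \, p(u)
```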

So the difference you see with the "same" parameters could come down
to this. What parameter values are you using?

(There is another difference, in the handling of negative values, but
that is probably irrelevant to you. That handling is in Spark now too:
it was not in 0.9.0, but it is in HEAD.)


> However, it looks like this code is in fact computing YtY + YtY(Cu - I),
> which is the same as YtYCu. If so, that's a bug. Can someone familiar with
> this code evaluate my claim?

I too can't be 100% certain I'm not missing something, but from a look
at that line, I don't think it is computing YtY(Cu-I). It is indeed
trying to accumulate the value Yt(Cu-I)Y by building it up from
pieces, one per row of Y. For one row of Y that piece is, excusing my
notation, Y(i)t (Cu(i)-1) Y(i). The middle term is just a scalar, so
it's fine to multiply it in at the end, as you see in that line.
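A small sketch of why the row-by-row accumulation is valid: summing (c_i - 1) * y_i y_i^T over the rows of Y gives exactly the same k x k matrix as computing Y^T (C_u - I) Y with an explicit diagonal C_u. The data, alpha = 40 and the function names below are hypothetical toy choices for illustration, not Spark's or Oryx's code.

```python
# Toy check that sum_i (c_i - 1) * y_i y_i^T == Y^T (C_u - I) Y.
# Y: n items x k factors; c: confidences c_ui for one user u (hypothetical data).

def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def direct_yt_cminus_i_y(Y, c):
    """Y^T (C_u - I) Y computed with an explicit n x n diagonal matrix."""
    n = len(Y)
    diag = [[(c[i] - 1.0) if i == j else 0.0 for j in range(n)] for i in range(n)]
    return matmul(matmul(transpose(Y), diag), Y)

def accumulated_yt_cminus_i_y(Y, c):
    """Same k x k matrix, built up one row of Y at a time."""
    k = len(Y[0])
    acc = [[0.0] * k for _ in range(k)]
    for y_i, c_i in zip(Y, c):
        scale = c_i - 1.0  # the scalar "middle term" for this row
        for a in range(k):
            for b in range(k):
                acc[a][b] += scale * y_i[a] * y_i[b]  # scaled outer product y_i y_i^T
    return acc

Y = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
alpha = 40.0  # hypothetical value
c = [1.0 + alpha * r for r in [1.0, 0.0, 2.0]]  # c_ui = 1 + alpha * r_ui

print(accumulated_yt_cminus_i_y(Y, c))  # → [[760.0, 80.0], [80.0, 160.0]]
print(direct_yt_cminus_i_y(Y, c))       # → [[760.0, 80.0], [80.0, 160.0]]
```

Rows with c_i = 1 (no interaction) contribute nothing, which is why an implementation only needs to visit the rows the user actually has data for.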

You may wish to follow HEAD, which is a bit different:

https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala#L390

The computation is actually the same as before (for positive input),
expressed a little differently.

Happy to help on this given that I know this code a little and the
code you are comparing it to a lot.
