spark-user mailing list archives

From Xiangrui Meng <>
Subject Re: possible bug in Spark's ALS implementation...
Date Wed, 12 Mar 2014 02:57:40 GMT
Line 376 should be correct as it is computing \sum_i (c_i - 1) x_i
x_i^T = \sum_i (alpha * r_i) x_i x_i^T. Are you computing some
metrics to tell which recommendation is better? -Xiangrui
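
The identity under discussion can be checked numerically on a toy example. The following is a minimal sketch, not Spark's or Oryx's code: the factor rows in Y, the ratings r, and the value of alpha are made-up illustrative numbers, and the helper names (outer, weighted_gram, mat_add) are hypothetical. It assumes the confidence weights are c_i = 1 + alpha * r_i, which is what makes c_i - 1 equal to alpha * r_i in the expression above, and it compares the direct sum \sum_i c_i x_i x_i^T against the split form YtY + \sum_i (c_i - 1) x_i x_i^T.

```python
# Illustrative sketch only: Y, r and alpha are made-up numbers,
# and the helper names below are hypothetical, not Spark's API.

def outer(x):
    """Outer product x x^T of a vector given as a list."""
    return [[a * b for b in x] for a in x]

def weighted_gram(rows, weights):
    """Compute sum_i weights[i] * x_i x_i^T over the rows x_i."""
    k = len(rows[0])
    total = [[0.0] * k for _ in range(k)]
    for x, w in zip(rows, weights):
        o = outer(x)
        for a in range(k):
            for b in range(k):
                total[a][b] += w * o[a][b]
    return total

def mat_add(A, B):
    """Element-wise sum of two equally sized matrices."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

alpha = 40.0                                 # assumed confidence scaling
Y = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]    # one factor row x_i per item
r = [0.0, 2.0, 0.5]                          # implicit "ratings" for one user
c = [1.0 + alpha * ri for ri in r]           # c_i = 1 + alpha * r_i

# Direct form: Yt Cu Y = sum_i c_i x_i x_i^T
direct = weighted_gram(Y, c)

# Split form: YtY + Yt (Cu - I) Y = YtY + sum_i (alpha * r_i) x_i x_i^T.
# Items with r_i == 0 contribute nothing to the second term.
YtY = weighted_gram(Y, [1.0] * len(Y))
correction = weighted_gram(Y, [ci - 1.0 for ci in c])
split = mat_add(YtY, correction)

max_diff = max(abs(d - s)
               for rd, rs in zip(direct, split)
               for d, s in zip(rd, rs))
print("largest difference between the two forms:", max_diff)
```

Because YtY does not depend on the user, the split form lets it be computed once and reused, leaving only the per-user term over items with nonzero r_i.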

On Tue, Mar 11, 2014 at 6:38 PM, Xiangrui Meng <> wrote:
> Hi Michael,
> I can help check the current implementation. Would you please go to
> and create a ticket
> about this issue with component "MLlib"? Thanks!
> Best,
> Xiangrui
> On Tue, Mar 11, 2014 at 3:18 PM, Michael Allman <> wrote:
>> Hi,
>> I'm implementing a recommender based on the algorithm described in
>> This algorithm forms the
>> basis for Spark's ALS implementation for data sets with implicit feedback.
>> The data set I'm working with is proprietary and I cannot share it, however
>> I can say that it's based on the same kind of data in the paper---relative
>> viewing time of videos. (Specifically, the "rating" for each video is
>> defined as total viewing time across all visitors divided by video
>> duration).
>> I'm seeing counterintuitive, sometimes nonsensical recommendations. For
>> comparison, I've run the training data through Oryx's in-VM implementation
>> of implicit ALS with the same parameters. Oryx uses the same algorithm.
>> (Source in this file:
>> The recommendations made by each system compared to one another are very
>> different---more so than I think could be explained by differences in initial
>> state. The recommendations made by the Oryx models look much better,
>> especially as I increase the number of latent factors and the iterations.
>> The Spark models' recommendations don't improve with increases in either
>> latent factors or iterations. Sometimes, they get worse.
>> Because of the (understandably) highly-optimized and terse style of Spark's
>> ALS implementation, I've had a very hard time following it well enough to
>> debug the issue definitively. However, I have found a section of code that
>> looks incorrect. As described in the paper, part of the implicit ALS
>> algorithm involves computing a matrix product YtCuY (equation 4 in the
>> paper). To optimize this computation, this expression is rewritten as YtY +
>> Yt(Cu - I)Y. I believe that's what should be happening here:
>> However, it looks like this code is in fact computing YtY + YtY(Cu - I),
>> which is the same as YtYCu. If so, that's a bug. Can someone familiar with
>> this code evaluate my claim?
>> Cheers,
>> Michael
