spark-user mailing list archives

From Michael Allman
Subject Re: possible bug in Spark's ALS implementation...
Date Wed, 12 Mar 2014 23:42:37 GMT
Thank you everyone for your feedback. It's been very helpful, and though I
still haven't found the cause of the difference between Spark and Oryx, I
feel I'm making progress.

Xiangrui asked me to create a ticket for this issue. I didn't do this
originally because it's not yet clear to me whether this is a bug or a
mistake on my part. I'd like to see where this conversation goes and then
file a more clear-cut issue if applicable.

Sean pointed out that Oryx differs in its use of the regularization
parameter lambda. I'm aware of this and have been compensating for this
difference from the start. Also, the handling of negative values is indeed
irrelevant as I have none in my data.

After reviewing Sean's analysis and running some calculations in the
console, I agree that the Spark code does compute YtCuY correctly.
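
For anyone who wants to repeat that kind of console check, here is a minimal sketch in plain Python (not the Spark code itself). It assumes the confidence weighting c_ui = 1 + alpha * r_ui from the implicit-feedback ALS formulation and compares the direct product Yt * Cu * Y against the rearranged form YtY + Yt * (Cu - I) * Y, which only touches items the user has interacted with. The factor matrix, the ratings, alpha, and all function names are made up for illustration.

```python
# Sketch: verify Y^T C^u Y == Y^T Y + Y^T (C^u - I) Y for a single user.
# All values below are hypothetical; this is not taken from Spark or Oryx.

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def yt_cu_y_direct(Y, c):
    """Y^T C^u Y with C^u = diag(c), computed over every item."""
    CY = [[c_i * y for y in row] for c_i, row in zip(c, Y)]
    return matmul(transpose(Y), CY)

def yt_cu_y_sparse(Y, c):
    """Y^T Y + Y^T (C^u - I) Y, visiting only items where c_i != 1."""
    k = len(Y[0])
    result = matmul(transpose(Y), Y)  # this term is the same for every user
    for c_i, row in zip(c, Y):
        if c_i == 1.0:
            continue  # unobserved item: (c_i - 1) contributes nothing
        for a in range(k):
            for b in range(k):
                result[a][b] += (c_i - 1.0) * row[a] * row[b]
    return result

alpha = 40.0                      # assumed confidence scaling
r_u = [0.0, 3.0, 0.0, 1.0]        # one user's implicit ratings over 4 items
c_u = [1.0 + alpha * r for r in r_u]
Y = [[0.1, 0.2], [0.3, -0.1], [0.5, 0.4], [-0.2, 0.6]]  # 4 items, rank 2

direct = yt_cu_y_direct(Y, c_u)
sparse = yt_cu_y_sparse(Y, c_u)
max_diff = max(abs(x - y)
               for row_d, row_s in zip(direct, sparse)
               for x, y in zip(row_d, row_s))
print("max abs difference:", max_diff)
```

If the two forms agree up to floating-point noise, the rearranged computation is equivalent to the direct one for that user.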

Regarding testing, I'm computing the expected percentile ranking (EPR) on a
test set as outlined in the paper. I'm training on three weeks of data and
testing on the following week. I recently updated my data sets and rebuilt
and tested the new models. The results were inconclusive in that both models
scored about the same.
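
To make the metric concrete, here is a rough sketch of the EPR calculation in plain Python, not the actual evaluation code. It assumes the usual definition: for each user, items are ranked by predicted score, each item gets a percentile rank (0 for the top recommendation, 1 for the bottom), and the metric is the average of those percentile ranks weighted by the observed test-set values. Lower is better, and random recommendations should land near 0.5. The function name and the toy data are hypothetical.

```python
# Sketch: expected percentile ranking over a held-out test set.
# scores: {user: {item: predicted score}} for every candidate item
# test:   {user: {item: observed weight in the test period}}

def expected_percentile_rank(scores, test):
    weighted_rank_sum = 0.0
    weight_sum = 0.0
    for user, observed in test.items():
        user_scores = scores[user]
        # Best-scored item first.
        ranked = sorted(user_scores, key=user_scores.get, reverse=True)
        n = len(ranked)
        # Percentile rank as a fraction: 0.0 = top of list, 1.0 = bottom.
        pct = {item: (pos / (n - 1) if n > 1 else 0.0)
               for pos, item in enumerate(ranked)}
        for item, weight in observed.items():
            if item in pct:  # skip test items the model cannot score
                weighted_rank_sum += weight * pct[item]
                weight_sum += weight
    return weighted_rank_sum / weight_sum if weight_sum else float("nan")

# Toy data, made up for illustration.
scores = {
    "u1": {"a": 0.9, "b": 0.5, "c": 0.1},
    "u2": {"a": 0.2, "b": 0.8, "c": 0.4},
}
test = {
    "u1": {"a": 2.0},  # u1's top-ranked item was consumed twice
    "u2": {"c": 1.0},  # u2's middle-ranked item was consumed once
}

epr = expected_percentile_rank(scores, test)
print("EPR:", epr)
```

Running the same function over the recommendations from both implementations, with the same test week, gives directly comparable numbers.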

I'm continuing to investigate the source of the wide difference in
recommendations between implementations. I will reply with my findings when
I have something more definitive.

Cheers and thanks again.
