spark-dev mailing list archives

From Chunnan Yao
Subject Support parallelized online matrix factorization for Collaborative Filtering
Date Mon, 06 Apr 2015 06:48:33 GMT
Online collaborative filtering (CF) has been widely used and studied.
Re-training a CF model from scratch every time new data comes in is very
inefficient. However, in the Spark community we see little discussion of
collaborative filtering on streaming data. Given streaming k-means,
streaming logistic regression, and the ongoing work on incremental model
training for the Naive Bayes classifier (SPARK-4144), we think it is worth
considering streaming collaborative filtering support in MLlib.

I've created a JIRA issue (SPARK-6711) for discussion. We suggest following
the approach in this paper. It is based on SGD rather than ALS, which makes
it easier to apply to streaming data.
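
To illustrate why an SGD-based factorization suits streaming data, here is a
minimal sketch in plain Python. It is not the algorithm from the paper and not
an MLlib API; it is ordinary SGD on squared error with L2 regularization, and
the class name OnlineMF and its parameters (rank, lr, reg) are hypothetical
names chosen for this example. Each incoming rating touches only one user
vector and one item vector, so no full re-training is needed.

```python
import random


class OnlineMF:
    """Hypothetical sketch: SGD matrix factorization, one rating at a time."""

    def __init__(self, rank=8, lr=0.05, reg=0.02, seed=0):
        self.rank = rank      # number of latent factors (assumed value)
        self.lr = lr          # learning rate (assumed value)
        self.reg = reg        # L2 regularization strength (assumed value)
        self.rng = random.Random(seed)
        self.user_factors = {}
        self.item_factors = {}

    def _factors(self, table, key):
        # Lazily create a small random vector for unseen users or items.
        if key not in table:
            table[key] = [self.rng.gauss(0.0, 0.1) for _ in range(self.rank)]
        return table[key]

    def predict(self, user, item):
        p = self._factors(self.user_factors, user)
        q = self._factors(self.item_factors, item)
        return sum(a * b for a, b in zip(p, q))

    def update(self, user, item, rating):
        """Apply one SGD step for a single rating; return the error before the step."""
        p = self._factors(self.user_factors, user)
        q = self._factors(self.item_factors, item)
        err = rating - sum(a * b for a, b in zip(p, q))
        for k in range(self.rank):
            pk, qk = p[k], q[k]  # use the old values for both updates
            p[k] += self.lr * (err * qk - self.reg * pk)
            q[k] += self.lr * (err * pk - self.reg * qk)
        return err


if __name__ == "__main__":
    model = OnlineMF()
    # A toy "stream" of (user, item, rating) events, replayed a few times.
    stream = [("u1", "i1", 4.0), ("u1", "i2", 2.0), ("u2", "i1", 5.0)]
    for _ in range(200):
        for user, item, rating in stream:
            model.update(user, item, rating)
    print(model.predict("u1", "i1"))
```

In a distributed setting the same per-rating update would have to be
partitioned across workers, which is the part that needs discussion.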

Fortunately, the authors of this paper have implemented their algorithm as a
GitHub project based on Storm.

Please don't hesitate to share your opinions on this issue and our planned
approach. We'd like to work on this in the next few weeks.

Feel the sparking Spark!