spark-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Debasish Das <debasish.da...@gmail.com>
Subject Re: Spark Matrix Factorization
Date Fri, 27 Jun 2014 16:46:55 GMT
Hi,

In my experiments with Jellyfish I did not see any substantial RMSE loss
over DSGD for Netflix dataset...

So we decided to stick with ALS and implemented a family of Quadratic
Minimization solvers that stays in the ALS realm but can solve interesting
constraints(positivity, bounds, L1, equality constrained bounds etc)...We
are going to show it at the Spark Summit...Also ALS structure is favorable
to matrix factorization use-cases where missing entries means zero and you
want to compute a global gram matrix using broadcast and use that for each
Quadratic Minimization for all users/products...

Implementing DSGD in the data partitioning that Spark ALS uses will be
straightforward but I would be more keen to see a dataset where DSGD is
showing you better RMSEs than ALS....

If you have a dataset where DSGD produces much better result could you
please point it to us ?

Also you can use Jellyfish to run DSGD benchmarks to compare against
ALS...It is multithreaded and if you have good RAM, you should be able to
run fairly large datasets...

Be careful about the default Jellyfish...it has been tuned for netflix
dataset (regularization, rating normalization etc)...So before you compare
RMSE make sure ALS and Jellyfish is running same algorithm (L2 regularized
Quadratic Loss)....

Thanks.
Deb


On Fri, Jun 27, 2014 at 3:40 AM, Krakna H <shankark+sys@gmail.com> wrote:

> Hi all,
>
> Just found this thread -- is there an update on including DSGD in Spark? We
> have a project that entails topic modeling on a document-term matrix using
> matrix factorization, and were wondering if we should use ALS or attempt
> writing our own matrix factorization implementation on top of Spark.
>
> Thanks.
>
>
>
> --
> View this message in context:
> http://apache-spark-developers-list.1001551.n3.nabble.com/Spark-Matrix-Factorization-tp55p7097.html
> Sent from the Apache Spark Developers List mailing list archive at
> Nabble.com.
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message