spark-dev mailing list archives

From Krakna H <shankark+...@gmail.com>
Subject Re: Spark Matrix Factorization
Date Sat, 28 Jun 2014 15:55:38 GMT
Hi Deb,

Putting your code on github would be much appreciated -- it would give us a
good starting point to adapt for our purposes.

Regards.


On Sat, Jun 28, 2014 at 10:57 AM, Debasish Das [via Apache Spark Developers
List] <ml-node+s1001551n7110h67@n3.nabble.com> wrote:

> Factorization problems are non-convex, so both ALS and DSGD will converge
> to local minima, and it is not clear which minimum will be better until we
> run both algorithms and see...
>
> So I would still say get a DSGD version running in the test setup while you
> experiment with the Spark ALS...so that you can see whether DSGD converges
> to a better minimum on your particular dataset...
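
The side-by-side experiment suggested above needs some SGD baseline to run against ALS. The following is a minimal single-machine sketch, assuming ratings arrive as parallel arrays of user indices, item indices, and values. It is plain sequential SGD on an L2-regularized squared loss, not DSGD (there is no stratification or distribution), and the names `sgd_mf`, `train_rmse`, `lam`, and `lr` are hypothetical rather than taken from Jellyfish or Spark.

```python
import numpy as np

def sgd_mf(users, items, ratings, n_users, n_items, rank=2,
           lam=0.01, lr=0.02, epochs=300, seed=0):
    """Sequential SGD for sum (r_ui - x_u . y_i)^2 + L2 penalty (illustrative only)."""
    rng = np.random.default_rng(seed)
    # Small random start; a different seed can end in a different local minimum.
    X = 0.1 * rng.standard_normal((n_users, rank))
    Y = 0.1 * rng.standard_normal((n_items, rank))
    for _ in range(epochs):
        # Visit the observed ratings in a fresh random order each epoch.
        for k in rng.permutation(len(ratings)):
            u, i = users[k], items[k]
            err = ratings[k] - X[u] @ Y[i]
            xu = X[u].copy()  # keep the old user factor for the item update
            X[u] += lr * (err * Y[i] - lam * X[u])
            Y[i] += lr * (err * xu - lam * Y[i])
    return X, Y

def train_rmse(X, Y, users, items, ratings):
    """RMSE over the observed (user, item, rating) triples."""
    pred = np.sum(X[users] * Y[items], axis=1)
    return float(np.sqrt(np.mean((ratings - pred) ** 2)))
```

Running this with a few different seeds and scoring each result, along with the ALS factors, on the same held-out ratings is one way to see which minimum is better for a given dataset.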
>
> If you want I can put the DSGD code base that I used for experimentation on
> github...I am not sure if Professor Re already put it on github...
>
>
> On Sat, Jun 28, 2014 at 2:46 AM, Krakna H <[hidden email]> wrote:
>
> > Hi Deb,
> >
> > Thanks so much for your response! At this point, we haven't determined
> > which of DSGD/ALS to go with and were waiting on guidance like yours to
> > tell us what the right option would be. It looks like ALS is good enough
> > for our purposes.
> >
> > Regards.
> >
> >
> > On Fri, Jun 27, 2014 at 12:47 PM, Debasish Das [via Apache Spark Developers
> > List] <[hidden email]> wrote:
> >
> > > Hi,
> > >
> > > In my experiments with Jellyfish I did not see any substantial RMSE loss
> > > over DSGD for the Netflix dataset...
> > >
> > > So we decided to stick with ALS and implemented a family of Quadratic
> > > Minimization solvers that stay in the ALS realm but can handle interesting
> > > constraints (positivity, bounds, L1, equality-constrained bounds, etc.)...We
> > > are going to show it at the Spark Summit...Also, the ALS structure is
> > > favorable to matrix factorization use cases where missing entries mean
> > > zero and you want to compute a global Gram matrix, broadcast it, and use
> > > it in each Quadratic Minimization for all users/products...
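
To make the Gram-matrix remark concrete, here is a minimal NumPy sketch, assuming a small dense matrix `R` in which every missing entry has already been filled with zero and a plain L2 penalty `lam`. Because every entry then counts as observed, the Gram matrix of the fixed factor is identical for all users (or all products), so it is computed once per half-sweep; in a distributed setting that single small matrix is what would be broadcast. The names `als_zero_fill` and `objective` are hypothetical, and this is not the Spark ALS code or the constrained solvers mentioned above.

```python
import numpy as np

def als_zero_fill(R, rank=2, lam=0.1, sweeps=20, seed=0):
    """ALS for ||R - X Y^T||_F^2 + lam * (||X||_F^2 + ||Y||_F^2), missing = 0."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    X = rng.standard_normal((n_users, rank))
    Y = rng.standard_normal((n_items, rank))
    ridge = lam * np.eye(rank)
    for _ in range(sweeps):
        # One rank x rank Gram matrix shared by every user's least-squares problem.
        gram = Y.T @ Y + ridge
        X = np.linalg.solve(gram, Y.T @ R.T).T  # solves gram x_u = Y^T r_u for all u
        # Same idea for the products, with the user factors held fixed.
        gram = X.T @ X + ridge
        Y = np.linalg.solve(gram, X.T @ R).T
    return X, Y

def objective(R, X, Y, lam):
    """L2-regularized quadratic loss over all entries of R."""
    return float(np.sum((R - X @ Y.T) ** 2) + lam * (np.sum(X ** 2) + np.sum(Y ** 2)))
```

When only the observed entries count, each user has a different Gram matrix (built from the items that user rated), so this shortcut applies only to the missing-means-zero case described in the message.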
> > >
> > > Implementing DSGD in the data partitioning that Spark ALS uses will be
> > > straightforward but I would be more keen to see a dataset where DSGD is
> > > showing you better RMSEs than ALS...
> > >
> > > If you have a dataset where DSGD produces much better results, could you
> > > please point us to it?
> > >
> > > Also you can use Jellyfish to run DSGD benchmarks to compare against
> > > ALS...It is multithreaded and if you have enough RAM, you should be able
> > > to run fairly large datasets...
> > >
> > > Be careful about the default Jellyfish...it has been tuned for the Netflix
> > > dataset (regularization, rating normalization etc)...So before you compare
> > > RMSE make sure ALS and Jellyfish are running the same algorithm (L2
> > > regularized Quadratic Loss)...
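
One way to keep such a comparison honest is to score the factors from both tools with the same functions instead of relying on each tool's own reported numbers. The sketch below assumes both runs can export user factors `X` and item factors `Y` as NumPy arrays and that ratings are given as parallel index/value arrays. The names `predict`, `rmse`, and `l2_quadratic_loss` are hypothetical, and the penalty is an unweighted `lam`; some implementations scale the penalty by the number of ratings per user or item, in which case the objectives differ even for the same nominal `lam`.

```python
import numpy as np

def predict(X, Y, users, items):
    """Predicted rating x_u . y_i for each (user, item) pair."""
    return np.sum(X[users] * Y[items], axis=1)

def rmse(X, Y, users, items, ratings):
    """Root-mean-square error over the given ratings."""
    err = ratings - predict(X, Y, users, items)
    return float(np.sqrt(np.mean(err ** 2)))

def l2_quadratic_loss(X, Y, users, items, ratings, lam):
    """Squared error on observed ratings plus an unweighted L2 penalty."""
    err = ratings - predict(X, Y, users, items)
    return float(np.sum(err ** 2) + lam * (np.sum(X ** 2) + np.sum(Y ** 2)))
```

Any rating normalization applied before training would need to be undone, or applied identically to both runs, before these numbers are compared.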
> > >
> > > Thanks.
> > > Deb
> > >
> > >
> > > On Fri, Jun 27, 2014 at 3:40 AM, Krakna H <[hidden email]> wrote:
> > >
> > > > Hi all,
> > > >
> > > > Just found this thread -- is there an update on including DSGD in
> > > > Spark? We have a project that entails topic modeling on a document-term
> > > > matrix using matrix factorization, and were wondering if we should use
> > > > ALS or attempt writing our own matrix factorization implementation on
> > > > top of Spark.
> > > >
> > > > Thanks.
> > > >
> > > >
> > > >
> > > > --
> > > > View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Spark-Matrix-Factorization-tp55p7097.html
> > > > Sent from the Apache Spark Developers List mailing list archive at Nabble.com.




--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Spark-Matrix-Factorization-tp55p7111.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.