spark-user mailing list archives

From Ameet Talwalkar <am...@eecs.berkeley.edu>
Subject Re: Spark Matrix Factorization
Date Fri, 03 Jan 2014 18:49:50 GMT
Hi all,

The following pull request implementing SVD in MLlib is highly relevant to
this discussion: https://github.com/apache/incubator-spark/pull/315

-Ameet


On Fri, Jan 3, 2014 at 10:43 AM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:

>
> On Fri, Jan 3, 2014 at 10:28 AM, Sebastian Schelter <ssc@apache.org> wrote:
>
>> > I wonder if anyone might have a recommendation for a Scala-native
>> > implementation of SVD.
>>
>> Mahout has a Scala implementation of an SVD variant called Stochastic SVD:
>>
>>
>> https://svn.apache.org/viewvc/mahout/trunk/math-scala/src/main/scala/org/apache/mahout/math/scalabindings/SSVD.scala?view=markup
>
>
> Mahout also has SVD and eigendecompositions mapped to Scala as svd() and
> eigen(). Unfortunately I have not put this on the wiki yet, but a summary is
> available here: https://issues.apache.org/jira/browse/MAHOUT-1297
>
> Mahout also has a distributed PCA implementation (which is based on
> distributed Stochastic SVD and has special provisions for sparse matrix
> cases). Unfortunately our wiki is in flux due to the migration from
> Confluence to CMS, and the SSVD page has not been migrated yet, so the
> Confluence version is here:
> https://cwiki.apache.org/confluence/display/MAHOUT/Stochastic+Singular+Value+Decomposition
>
>
>>
>> Otherwise, all the major Java math libraries (Mahout math, jblas,
>> commons-math) should provide an implementation that you can use from Scala.
>>
>> --sebastian
>>
>> > C
>> >
>> >
>> >
>> >
>> > On Thu, Jan 2, 2014 at 7:06 PM, Ameet Talwalkar <ameet@eecs.berkeley.edu> wrote:
>> >
>> >> Hi Deb,
>> >>
>> >> Thanks for your email.  We currently do not have a DSGD implementation in
>> >> MLlib. Also, just to clarify, DSGD is not a variant of ALS, but rather a
>> >> different algorithm for solving the same bi-convex objective function.
>> >>
>> >> It would be a good thing to add, but to the best of my knowledge, no
>> >> one is actively working on this right now.
>> >>
>> >> Also, as you mentioned, the ALS implementation in MLlib is more
>> >> robust/scalable than the one in spark.examples.
>> >>
>> >> -Ameet
>> >>
>> >>
>> >> On Thu, Jan 2, 2014 at 3:16 PM, Debasish Das <debasish.das83@gmail.com> wrote:
>> >>
>> >>> Hi,
>> >>>
>> >>> I am not noticing any DSGD implementation of ALS in Spark.
>> >>>
>> >>> There are two ALS implementations.
>> >>>
>> >>> org.apache.spark.examples.SparkALS does not run on large matrices and
>> >>> seems more like demo code.
>> >>>
>> >>> org.apache.spark.mllib.recommendation.ALS looks like a more robust version,
>> >>> and I am experimenting with it.
>> >>>
>> >>> References here are Jellyfish, Twitter's implementation of Jellyfish
>> >>> called Scalafish, a Google paper called Sparkler, and a similar idea put
>> >>> forward in an IBM paper by Gemulla et al. ("Large-Scale Matrix
>> >>> Factorization with Distributed Stochastic Gradient Descent").
>> >>>
>> >>> https://github.com/azymnis/scalafish
>> >>>
>> >>> Are there any plans to add DSGD to Spark, or is there an existing JIRA?
>> >>>
>> >>> Thanks.
>> >>> Deb
>> >>>
>> >>>
>> >>
>> >
>> >
>>
>>
>
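
The point made earlier in the thread, that ALS and DSGD are different
algorithms for the same bi-convex objective, can be sketched in a few lines
of Python. This is a toy illustration only, not code from Spark, MLlib,
Mahout, or any of the cited papers. The function names (sse, als, sgd), the
rank-1 model, the toy data, and the hyperparameters are all assumptions made
for this example. The sgd function is plain sequential SGD; the distributed
part of DSGD (splitting the matrix into blocks that can be updated in
parallel) is left out.

```python
# Toy rank-1 model: an observed rating r_ij is approximated by u[i] * v[j].
# Shared objective (bi-convex in u and v):
#   sum over observed (i, j) of (r_ij - u[i]*v[j])^2 + lam * (|u|^2 + |v|^2)
# Every name below is hypothetical and chosen for illustration only.


def sse(ratings, u, v):
    """Sum of squared errors over the observed ratings."""
    return sum((r - u[i] * v[j]) ** 2 for i, j, r in ratings)


def als(ratings, n_rows, n_cols, lam=0.01, iters=50):
    """Alternating least squares: solve exactly for u with v fixed, then swap."""
    u = [0.5] * n_rows
    v = [0.5] * n_cols
    for _ in range(iters):
        # With v fixed, each u[i] has a closed-form ridge-regression solution.
        for i in range(n_rows):
            row = [(j, r) for a, j, r in ratings if a == i]
            u[i] = sum(r * v[j] for j, r in row) / (lam + sum(v[j] ** 2 for j, r in row))
        # With u fixed, the same holds for each v[j].
        for j in range(n_cols):
            col = [(i, r) for i, b, r in ratings if b == j]
            v[j] = sum(r * u[i] for i, r in col) / (lam + sum(u[i] ** 2 for i, r in col))
    return u, v


def sgd(ratings, n_rows, n_cols, lam=0.01, lr=0.02, epochs=500):
    """Sequential SGD: one small gradient step per observed rating."""
    u = [0.5] * n_rows
    v = [0.5] * n_cols
    for _ in range(epochs):
        for i, j, r in ratings:
            err = r - u[i] * v[j]
            u_old = u[i]
            # The penalty is applied at every visited rating (a common
            # simplification), so its effective weight grows with the number
            # of ratings in a row or column.
            u[i] += lr * (err * v[j] - lam * u_old)
            v[j] += lr * (err * u_old - lam * v[j])
    return u, v


# Toy data: an exactly rank-1 3x4 matrix with two entries held out.
true_u = [1.0, 2.0, 3.0]
true_v = [1.0, 0.5, 2.0, 1.5]
held_out = {(0, 3), (2, 1)}
ratings = [(i, j, true_u[i] * true_v[j])
           for i in range(3) for j in range(4) if (i, j) not in held_out]

als_u, als_v = als(ratings, 3, 4)
sgd_u, sgd_v = sgd(ratings, 3, 4)
print("ALS squared error:", sse(ratings, als_u, als_v))
print("SGD squared error:", sse(ratings, sgd_u, sgd_v))
```

Both routines drive the same squared error toward zero on this data. They
differ only in how they move: ALS takes exact block minimizations, while SGD
takes many small per-rating steps.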
