mahout-user mailing list archives

From Mahesh Balija <>
Subject Re: Mahout Vs Spark
Date Wed, 22 Oct 2014 16:20:09 GMT
Hi Team,

Thanks for your replies. Even granting Mahout's strong implementations of
recommendations and SVD, I would still point out that Spark 1.1.0 already
supports collaborative filtering (alternating least squares, ALS) and, under
dimensionality reduction, SVD and PCA. With the fast pace of contributions,
I believe Spark may NOT be far from having new and stable algorithms added
to it (like ANN, HMM etc. and support for scientific

Ted, even though the Mahout (1.0) development code base supports Scala and
Spark bindings externally, Spark has built-in support for Scala (as it has
been developed in Scala). NumPy is a Python-based scientific library which
is needed to use the Python-based MLlib API in Spark. The benefit is that
Spark is also available to Python users.

A major distinguishing feature of Mahout is that, as it grew out of Lucene,
it has built-in support for text processing. Of course I do NOT believe it
is a strong point, as I assume that developers who know Lucene can easily
use it with Spark through the Java interface.

Mahout has currently stopped adding support for Hadoop (i.e., for further
libraries); on the other hand, Spark can easily re-use the data present in
Hadoop/HBase (maybe NOT the MapReduce functionality, as Spark has its own
computation layer).

*As a long-time user of Mahout I strongly support Mahout (despite its poor
visualization capabilities). At the same time, I am trying to understand:
if MLlib continues to evolve in Spark, with support for in-memory
computation, rich scientific libraries through Scala, and languages like
Java/Scala/Python, will the survival of Mahout be questionable?*

Mahesh Balija.

On Wed, Oct 22, 2014 at 1:26 PM, Martin, Nick <> wrote:

> I know we lost the maintainer for fpgrowth somewhere along the line but
> it's definitely something I'd love to see carried forward, too.
> Sent from my iPhone
> > On Oct 22, 2014, at 8:09 AM, "Brian Dolan" <> wrote:
> >
> > Sing it, brother!  I miss FP Growth as well.  Once the Scala bindings
> are in, I'm hoping to work up some time series methods.
> >
> >> On Oct 21, 2014, at 8:00 PM, Lee S <> wrote:
> >>
> >> As a developer facing the choice between Mahout and MLlib, I have
> >> some thoughts below.
> >> Mahout has no decision tree algorithm, but MLlib has the components for
> >> constructing one, such as Gini index and information gain. I also think
> >> Mahout could add frequent pattern mining algorithms, which are very
> >> important in feature selection and statistical analysis. MLlib has no
> >> frequent pattern mining algorithms.
> >> P.S. Why was the fpgrowth algorithm removed in version 0.9?
> >>
> >> 2014-10-22 9:12 GMT+08:00 Vibhanshu Prasad <>:
> >>
> >>> Actually Spark is available in Python also, so Spark users have the
> >>> upper hand over traditional Mahout users. This applies to all Python
> >>> libraries (including NumPy).
> >>>
> >>> On Wed, Oct 22, 2014 at 3:54 AM, Ted Dunning <> wrote:
> >>>
> >>>> On Tue, Oct 21, 2014 at 3:04 PM, Mahesh Balija <> wrote:
> >>>>
> >>>>> I am trying to differentiate between Mahout and Spark, here is the
> >>>>> small list,
> >>>>>
> >>>>> Features                   Mahout   Spark
> >>>>> Clustering                 Y        Y
> >>>>> Classification             Y        Y
> >>>>> Regression                 Y        Y
> >>>>> Dimensionality Reduction   Y        Y
> >>>>> Java                       Y        Y
> >>>>> Scala                      N        Y
> >>>>> Python                     N        Y
> >>>>> Numpy                      N        Y
> >>>>> Hadoop                     Y        Y
> >>>>> Text Mining                Y        N
> >>>>> Scala/Spark Bindings       Y        N/A
> >>>>> scalability                Y        Y
> >>>>
> >>>> Mahout doesn't actually have strong features for clustering,
> >>>> classification and regression. Mahout is very strong in
> >>>> recommendations (which you don't mention) and dimensionality reduction.
> >>>>
> >>>> Mahout does support scala in the development version.
> >>>>
> >>>> What do you mean by support for Numpy?
> >
