spark-user mailing list archives

From Ameet Talwalkar <>
Subject Re: ML Algos
Date Fri, 16 Aug 2013 20:51:54 GMT
Thanks for your email -- I've responded inline.

On Thu, Aug 15, 2013 at 7:13 PM, Lijie Xu <> wrote:

> Quite interesting. I have some questions about this amazing project:
> 1) In "Logistic Regression - Weak Scaling", MLlib and VW run slower as the
> data and the number of machines grow together, even though each processor
> handles a fixed-size problem. Could you explain which component causes this
> performance degradation: synchronization, network traffic, data
> partitioning, or something else?

This is a good question, and to be honest, we still need to investigate
this further to get a better understanding of what's going on here.

> 2) What's the relationship between MLbase and GraphX?

Right now the two projects are being developed separately. MLbase does not
currently support graph-based functionality, though going forward it would be
quite interesting to extend MLI to include graph-based primitives and to use
GraphX as a runtime.

> 3) MLbase may require Spark to provide new features for implementing
> specific algorithms. Are there any? Or have you added new fundamental
> features that are not supported in Spark-0.7?

As MLbase is a relatively new project, we have been developing MLlib and
MLI to be compatible with Spark-0.8.

> On Fri, Aug 16, 2013 at 4:01 AM, Ameet Talwalkar <> wrote:
>> The following slides<> describe
>> the ML algorithms to be included in MLlib (slide 49) and MLI (slide 107) in
>> the near future.  We plan to include additional
>> classification/regression/CF/clustering/optimization primitives over time
>> with the help of the open-source community, and based on feedback from
>> users about desired functionality.  Moreover, we ultimately aim to add
>> advanced ML functionality, as briefly described in slide 140.
>> -Ameet
>> On Thu, Aug 15, 2013 at 12:32 PM, Gowtham N <> wrote:
>>> Hi,
>>> Can someone give details about the future work on ML algorithms (inside
>>> the mllib folder)?
>>> Currently, some basic algorithms are implemented. Is there a roadmap
>>> for which ML algorithms are planned?
