spark-user mailing list archives

From Michael Segel <>
Subject Re: Research ideas using spark
Date Wed, 15 Jul 2015 19:17:46 GMT
Silly question… 

When thinking about a PhD thesis… do you want to tie it to a specific technology, or do you
want to investigate an idea and then use a specific technology to pursue it?
Or is this an outdated way of thinking?

"I am doing my PhD thesis on large-scale machine learning, e.g. online learning, batch and
mini-batch learning."

So before we look at technologies like Spark… could the OP break the topic down into a more specific
concept or idea that he wants to pursue?

Looking at what Jörn said…

Using machine learning to better predict workloads when managing clusters… This could
be interesting… but is it enough for a PhD thesis, or of interest to the OP?
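To make the workload-prediction idea a bit more concrete, here is a minimal sketch of predicting a job's runtime from previously executed similar jobs. It uses plain k-nearest-neighbour regression in Python. Nothing like this is built into Spark. Each past job is summarized by a hypothetical feature tuple (input size in GB, executor count, number of shuffle stages) together with its observed runtime. `predict_runtime` and the sample history are illustrative assumptions, not real measurements. A real system would also need to normalize the features and pick them carefully.

```python
import math
from typing import List, Tuple

# Hypothetical job descriptor: (input size in GB, number of executors, shuffle stages)
JobFeatures = Tuple[float, float, float]


def predict_runtime(history: List[Tuple[JobFeatures, float]],
                    new_job: JobFeatures, k: int = 3) -> float:
    """Predict the runtime (seconds) of new_job as the mean runtime of the k
    most similar past jobs, using Euclidean distance over the raw features.
    Features are NOT normalized here; that is a simplifying assumption."""
    if not history:
        raise ValueError("need at least one previously executed job")
    k = min(k, len(history))
    # Rank past jobs by how close their features are to the new job's
    ranked = sorted(history, key=lambda rec: math.dist(rec[0], new_job))
    nearest = ranked[:k]
    return sum(runtime for _, runtime in nearest) / k


# Illustrative (made-up) history of past jobs and their runtimes in seconds
history = [
    ((10.0, 4.0, 2.0), 120.0),
    ((12.0, 4.0, 2.0), 140.0),
    ((100.0, 16.0, 5.0), 900.0),
    ((11.0, 4.0, 3.0), 130.0),
]

print(predict_runtime(history, (11.0, 4.0, 2.0), k=3))  # mean of 120, 140, 130 -> 130.0
```

The averaging rule is deliberately simple. A thesis along these lines would more likely compare learned models, and it would have to handle the point Jörn raises: the jobs and the optimizations applied to them change over time.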

> On Jul 15, 2015, at 9:43 AM, Jörn Franke <> wrote:
> Well, one of the strengths of Spark is standardized, general distributed processing that allows
> many different types of processing, such as graph processing, stream processing, etc. The limitation
> is that it is less performant than a system focusing on only one type of processing (e.g.
> graph processing). What I miss (and this may not be Spark-specific) is some artificial intelligence
> to manage a cluster, e.g. predicting workloads, or how long a job may run based on previously
> executed similar jobs. Furthermore, many optimizations have to be done manually, e.g. Bloom
> filters, partitioning, etc. It would be great to find some intelligence that does this automatically
> based on previously executed jobs, taking into account that the optimizations themselves change
> over time. You may also explore feature interaction.
> On Tue, Jul 14, 2015 at 7:19, Shashidhar Rao <> wrote:
> Hi,
> I am doing my PhD thesis on large-scale machine learning, e.g. online learning, batch
> and mini-batch learning.
> Could somebody help me with ideas, especially in the context of Spark and the above
> learning methods?
> Some ideas could be improvements to existing algorithms, new features for the above
> learning methods, or algorithms that have not been implemented yet.
> If somebody could help me with some ideas, it would really accelerate my work.
> Also, a few pointers to research papers on Spark or Mahout would help.
> Thanks in advance.
> Regards 
