spark-dev mailing list archives

From Christopher Nguyen <>
Subject Re: Spark development for undergraduate project
Date Tue, 17 Dec 2013 19:25:51 GMT
Matt, some suggestions.

If you're interested in the machine-learning layer, perhaps you could look
into helping to harmonize our (Adatao) dataframe representation with
MLlib's, and with base RDDs for that matter. It requires someone to spend
some dedicated time looking into the trade-offs between generalizability
and performance. It's something our groups have talked about doing but
haven't been able to invest the resources in.
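To make the representation gap concrete, here is a minimal sketch of what such a bridge might look like. The `Row` case class is a hypothetical stand-in for a dataframe row; `LabeledPoint` and `Vectors` are MLlib's actual types. The per-record copy into a dense vector is exactly the kind of generalizability-vs-performance cost worth measuring:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

// Hypothetical row type standing in for a dataframe row with a label
// column and numeric feature columns.
case class Row(label: Double, features: Array[Double])

// Bridge a row-oriented dataframe RDD to MLlib's LabeledPoint format.
// Each record copies its feature array into a dense MLlib Vector,
// so conversion cost grows with feature width.
def toLabeledPoints(rows: RDD[Row]): RDD[LabeledPoint] =
  rows.map(r => LabeledPoint(r.label, Vectors.dense(r.features)))
```

A unified representation would avoid this copy by letting MLlib algorithms consume the dataframe's columns directly, which is the trade-off study suggested above.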

Separately, neural nets/deep learning is an area of emerging interest to
look into with Spark. It may drive some alternate optimization patterns for
Spark, e.g., sub-cluster communication. If interested, I can connect you to
some deep learning folks at UoT (not too far from you) and Google. Matei
may also have some interest in this.

Christopher T. Nguyen
Co-founder & CEO, Adatao <>

On Tue, Dec 17, 2013 at 10:43 AM, Matthew Cheah <> wrote:

> Hi everyone,
> During my most recent internship, I worked extensively with Apache Spark,
> integrating it into a company's data analytics platform. I've now become
> interested in contributing to Apache Spark.
> I'm returning to undergraduate studies in January and there is an academic
> course which is simply a standalone software engineering project. I was
> thinking that some contribution to Apache Spark would satisfy my curiosity,
> help continue supporting the company I interned at, and give me the academic
> credits required to graduate, all at the same time. It seems like too good
> an opportunity to pass up.
> With that in mind, I have the following questions:
>    1. At this point, is there any self-contained project that I could work
>    on within Spark? Ideally, I would work on it independently, in about a
>    three-month time frame. This time also needs to accommodate ramping up
>    on the Spark codebase and adjusting to the Scala programming language
>    and paradigms. The company I worked at primarily used the Java APIs.
>    The output needs to be a technical report describing the project
>    requirements and the design process I took to engineer the solution for
>    those requirements. In particular, it cannot just be a series of
>    haphazard patches.
>    2. How can I get started with contributing to Spark?
>    3. Is there a high-level UML or some other design specification for the
>    Spark architecture?
> Thanks! I hope to be of some help =)
> -Matt Cheah
