spark-dev mailing list archives

From Xiao Li <gatorsm...@gmail.com>
Subject Re: [discuss] using deep learning to improve Spark
Date Fri, 01 Apr 2016 07:37:27 GMT
April 1st... : )

2016-04-01 0:33 GMT-07:00 Michael Malak <michaelmalak@yahoo.com.invalid>:

> I see you've been burning the midnight oil.
>
>
> ------------------------------
> *From:* Reynold Xin <rxin@databricks.com>
> *To:* "dev@spark.apache.org" <dev@spark.apache.org>
> *Sent:* Friday, April 1, 2016 1:15 AM
> *Subject:* [discuss] using deep learning to improve Spark
>
> Hi all,
>
> Hope you all enjoyed the Tesla Model 3 unveiling earlier tonight.
>
> I'd like to bring your attention to a project called DeepSpark that we
> have been working on for the past three years. We realized that scaling
> software development was challenging. A large fraction of software
> engineering has been manual and mundane: writing test cases, fixing bugs,
> implementing features according to specs, and reviewing pull requests. So
> we started this project to see how much we could automate.
>
> After three years of development and one year of testing, we now have
> enough confidence that this could work well in practice. For example, Matei
> confessed to me today: "It looks like DeepSpark has a better understanding
> of Spark internals than I ever will. It updated several pieces of code I
> wrote long ago that even I no longer understood."
>
>
> I think it's time to discuss as a community how we want to continue
> this project to ensure Spark is stable, secure, and easy to use yet able to
> progress as fast as possible. I'm still working on a more formal design
> doc, and it might take a little bit more time since I haven't been able to
> fully grasp DeepSpark's capabilities yet. Based on my understanding right
> now, I've written a blog post about DeepSpark here:
> https://databricks.com/blog/2016/04/01/unreasonable-effectiveness-of-deep-learning-on-spark.html
>
>
> Please take a look and share your thoughts. Obviously, this is an
> ambitious project and could take many years to fully implement. One major
> challenge is cost. The current Spark Jenkins infrastructure provided by the
> AMPLab has only 8 machines, but DeepSpark uses 12000 machines. I'm not sure
> whether AMPLab or Databricks can fund DeepSpark's operation for a long
> period of time. Perhaps AWS can help out here. Let me know if you have
> other ideas.
