spark-dev mailing list archives

From Takeshi Yamamuro <linguin....@gmail.com>
Subject Re: [discuss] using deep learning to improve Spark
Date Fri, 01 Apr 2016 09:40:21 GMT
Oh, the annual event...

On Fri, Apr 1, 2016 at 4:37 PM, Xiao Li <gatorsmile@gmail.com> wrote:

> April 1st... : )
>
> 2016-04-01 0:33 GMT-07:00 Michael Malak <michaelmalak@yahoo.com.invalid>:
>
>> I see you've been burning the midnight oil.
>>
>>
>> ------------------------------
>> *From:* Reynold Xin <rxin@databricks.com>
>> *To:* "dev@spark.apache.org" <dev@spark.apache.org>
>> *Sent:* Friday, April 1, 2016 1:15 AM
>> *Subject:* [discuss] using deep learning to improve Spark
>>
>> Hi all,
>>
>> Hope you all enjoyed the Tesla 3 unveiling earlier tonight.
>>
>> I'd like to bring your attention to a project called DeepSpark that we
>> have been working on for the past three years. We realized that scaling
>> software development was challenging. A large fraction of software
>> engineering has been manual and mundane: writing test cases, fixing bugs,
>> implementing features according to specs, and reviewing pull requests. So
>> we started this project to see how much we could automate.
>>
>> After three years of development and one year of testing, we now have
>> enough confidence that this could work well in practice. For example, Matei
>> confessed to me today: "It looks like DeepSpark has a better understanding
>> of Spark internals than I ever will. It updated several pieces of code I
>> wrote long ago that even I no longer understood."
>>
>>
>> I think it's time to discuss as a community how we want to continue
>> this project to ensure Spark is stable, secure, and easy to use yet able to
>> progress as fast as possible. I'm still working on a more formal design
>> doc, and it might take a little bit more time since I haven't been able to
>> fully grasp DeepSpark's capabilities yet. Based on my understanding right
>> now, I've written a blog post about DeepSpark here:
>> https://databricks.com/blog/2016/04/01/unreasonable-effectiveness-of-deep-learning-on-spark.html
>>
>>
>> Please take a look and share your thoughts. Obviously, this is an
>> ambitious project and could take many years to fully implement. One major
>> challenge is cost. The current Spark Jenkins infrastructure provided by the
>> AMPLab has only 8 machines, but DeepSpark uses 12000 machines. I'm not sure
>> whether AMPLab or Databricks can fund DeepSpark's operation for a long
>> period of time. Perhaps AWS can help out here. Let me know if you have
>> other ideas.
>>
>


-- 
---
Takeshi Yamamuro
