spark-dev mailing list archives

From Ricardo Almeida <ricardo.alme...@actnowib.com>
Subject Re: [discuss] using deep learning to improve Spark
Date Fri, 01 Apr 2016 15:05:43 GMT
Amazing! I'll fund $1/2 million for such an interesting initiative.
Oh, wait... I have only $4 in my pocket

Cheers :)

On 1 April 2016 at 11:40, Takeshi Yamamuro <linguin.m.s@gmail.com> wrote:

> Oh, the annual event...
>
> On Fri, Apr 1, 2016 at 4:37 PM, Xiao Li <gatorsmile@gmail.com> wrote:
>
>> April 1st... : )
>>
>> 2016-04-01 0:33 GMT-07:00 Michael Malak <michaelmalak@yahoo.com.invalid>:
>>
>>> I see you've been burning the midnight oil.
>>>
>>>
>>> ------------------------------
>>> *From:* Reynold Xin <rxin@databricks.com>
>>> *To:* "dev@spark.apache.org" <dev@spark.apache.org>
>>> *Sent:* Friday, April 1, 2016 1:15 AM
>>> *Subject:* [discuss] using deep learning to improve Spark
>>>
>>> Hi all,
>>>
>>> Hope you all enjoyed the Tesla 3 unveiling earlier tonight.
>>>
>>> I'd like to bring your attention to a project called DeepSpark that we
>>> have been working on for the past three years. We realized that scaling
>>> software development was challenging. A large fraction of software
>>> engineering has been manual and mundane: writing test cases, fixing bugs,
>>> implementing features according to specs, and reviewing pull requests. So
>>> we started this project to see how much we could automate.
>>>
>>> After three years of development and one year of testing, we now have
>>> enough confidence that this could work well in practice. For example, Matei
>>> confessed to me today: "It looks like DeepSpark has a better understanding
>>> of Spark internals than I ever will. It updated several pieces of code I
>>> wrote long ago that even I no longer understood."
>>>
>>>
>>> I think it's time to discuss as a community how we want to
>>> continue this project to ensure Spark is stable, secure, and easy to use
>>> yet able to progress as fast as possible. I'm still working on a more
>>> formal design doc, and it might take a little bit more time since I haven't
>>> been able to fully grasp DeepSpark's capabilities yet. Based on my
>>> understanding right now, I've written a blog post about DeepSpark here:
>>> https://databricks.com/blog/2016/04/01/unreasonable-effectiveness-of-deep-learning-on-spark.html
>>>
>>>
>>> Please take a look and share your thoughts. Obviously, this is an
>>> ambitious project and could take many years to fully implement. One major
>>> challenge is cost. The current Spark Jenkins infrastructure provided by the
>>> AMPLab has only 8 machines, but DeepSpark uses 12000 machines. I'm not sure
>>> whether AMPLab or Databricks can fund DeepSpark's operation for a long
>>> period of time. Perhaps AWS can help out here. Let me know if you have
>>> other ideas.
>>>
>>
>
>
> --
> ---
> Takeshi Yamamuro
>
