hadoop-yarn-dev mailing list archives

From Xun Liu <neliu...@163.com>
Subject Re: [DISCUSS] Making submarine to different release model like Ozone
Date Fri, 01 Feb 2019 02:39:16 GMT
+1

Hello everyone, 

I am Xun Liu, head of the machine learning team at the Netease Research Institute. I fully
agree with Wangda.

Our team is very grateful to the community for the Submarine machine learning engine.

We are heavy users of Submarine. Because Submarine fits the direction of our big data team's
Hadoop technology stack, it spares us from investing additional manpower in learning other
container scheduling systems.
More importantly, we can run machine learning on a shared YARN cluster. This makes server
resource utilization more efficient and preserves the large investment of human and material
resources we made in previous years.

Our team has finished testing and deploying Submarine and will shortly provide the service
to our e-commerce department (http://www.kaola.com/).

We also plan to provide the Submarine engine on our existing YARN cluster within the next six
months, because many of our product departments need machine learning services,
for example:
1) Game department (http://game.163.com/) needs AI battle training,
2) News department (http://www.163.com) needs news recommendation,
3) Mailbox department (http://www.163.com) requires anti-spam and illegal content detection,
4) Music department (https://music.163.com/) requires music recommendation,
5) Education department (http://www.youdao.com) requires voice recognition,
6) Massive Open Online Courses (https://open.163.com/) requires multilingual translation,
and so on.

If Submarine can be released independently like Ozone, it will help us quickly get the latest
features and improvements, which will be of great help to our team and users.

Thanks, Hadoop community!


> On Feb 1, 2019, at 2:53 AM, Wangda Tan <wheeleast@gmail.com> wrote:
> 
> Hi devs,
> 
> Since we started the Submarine-related effort last year, we have received a lot of
> feedback. Several companies (such as Netease, China Mobile, etc.) are
> trying to deploy Submarine on their Hadoop clusters alongside big data
> workloads. LinkedIn is also very interested in contributing a Submarine TonY
> (https://github.com/linkedin/TonY) runtime so that users can use the same
> interface.
> 
> From what I can see, there are several issues with putting Submarine under
> the yarn-applications directory and giving it the same release cycle as Hadoop:
> 
> 1) We started the 3.2.0 release in Sep 2018, but the release was not done until Jan
> 2019. Because of unpredictable blockers and security issues, it was
> delayed a lot. We need to iterate on Submarine quickly at this point.
> 
> 2) We also see a lot of requests to use Submarine on older Hadoop
> releases such as 2.x. Many companies may not upgrade Hadoop to 3.x any time
> soon, but running deep learning is urgent for them. We
> should decouple Submarine from the Hadoop version.
> 
> So why do we want to keep it within Hadoop? First, Submarine includes some
> innovative parts, such as user-experience enhancements for YARN
> services/containerization support, which we can add back to Hadoop later
> to address common requirements. In addition, there is a big overlap
> between the communities developing and using it.
> 
> There are several proposals we went through during the Ozone merge-to-trunk
> discussion:
> https://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201803.mbox/%3CCAHfHakH6_m3YLdf5a2KQ8+w-5fbVX5aHFgS-x1VaJW8gmnzRLg@mail.gmail.com%3E
> 
> I propose to adopt the Ozone model: the same master branch, but a different
> release cycle and a different release branch. It is a great example of the
> agile releases we can do (2 Ozone releases since Oct 2018) with less
> overhead for setting up CI, projects, etc.
> 
> *Links:*
> - JIRA: https://issues.apache.org/jira/browse/YARN-8135
> - Design doc
> <https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit>
> - User doc (3.2.0 release)
> <https://hadoop.apache.org/docs/r3.2.0/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine/Index.html>
> - Blog posts: {Submarine}: Running deep learning workloads on Apache Hadoop
> <https://hortonworks.com/blog/submarine-running-deep-learning-workloads-apache-hadoop/>
> (Chinese translation: <https://www.jishuwen.com/d/2Vpu>)
> - Talks: Strata Data Conf NY
> <https://conferences.oreilly.com/strata/strata-ny-2018/public/schedule/detail/68289>
> 
> Thoughts?
> 
> Thanks,
> Wangda Tan




