spark-dev mailing list archives

From "Liu, Raymond" <raymond....@intel.com>
Subject RE: Question about Spark on Yarn design
Date Wed, 28 Aug 2013 07:19:32 GMT
Hi

	I just sent a pull request at https://github.com/mesos/spark/pull/868/ to enable spark-shell
on YARN (https://spark-project.atlassian.net/browse/SPARK-527), based on the idea I mentioned
previously: "add a mode in which the scheduler's driver actor still runs on the local machine,
and the YARN Application Master just launches executors instead of running the whole user application.
Thus a client wrapper app is not needed, and the user can simply create a new SparkContext with this mode
locally."


Best Regards,
Raymond Liu

-----Original Message-----
From: Liu, Raymond [mailto:raymond.liu@intel.com] 
Sent: Thursday, August 22, 2013 1:02 PM
To: dev@spark.incubator.apache.org
Subject: Question about Spark on Yarn design

Hi

	In the current implementation, when Spark runs on top of YARN via spark.deploy.yarn.Client,
the SparkContext actually runs as a thread inside the YARN Application Master process, that is,
not on the node where the Client app is invoked. This is quite different from the other modes,
such as Mesos, standalone, and local.
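	To make the difference concrete, here is a small Python model (not Spark code) of where the
driver lives in each mode, as described above. The mode keys, DriverLocation, and
supports_interactive_shell are hypothetical names for illustration only; "yarn" stands for the
current spark.deploy.yarn.Client path.

```python
from enum import Enum


class DriverLocation(Enum):
    CLIENT = "client machine (where the app is launched)"
    APPLICATION_MASTER = "cluster node running the YARN Application Master"


# Hypothetical mapping: where the SparkContext/scheduler driver runs in each mode,
# per the description in this thread. "yarn" = the current spark.deploy.yarn.Client path.
DRIVER_PLACEMENT = {
    "local": DriverLocation.CLIENT,
    "mesos": DriverLocation.CLIENT,
    "standalone": DriverLocation.CLIENT,
    "yarn": DriverLocation.APPLICATION_MASTER,
}


def supports_interactive_shell(mode: str) -> bool:
    """A REPL such as spark-shell needs the driver on the user's own machine."""
    return DRIVER_PLACEMENT[mode] is DriverLocation.CLIENT


for mode, location in DRIVER_PLACEMENT.items():
    print(f"{mode:10s} driver on {location.value}")
```

	Under this model, only the current YARN path puts the driver (and therefore stdout and any
local file access) on a cluster node rather than on the client.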
	I understand that this is not a problem for apps that involve no user interaction or local
file operations, but it does not work for interactive apps such as spark-shell; see https://spark-project.atlassian.net/browse/SPARK-527

	Also, in this YARN mode the app's output ends up on the node where the AM is instantiated,
which makes it hard to check.

	So, is there any particular reason that YARN mode is implemented this way?

	And I am wondering: could we add a mode in which the scheduler's driver actor still runs
on the local machine, and the YARN Application Master just launches executors instead of running the
whole user application? Then a client wrapper app is not needed, and the user can simply create a new
SparkContext with this mode locally. The drawback I can imagine is that the client node's load
is heavier than in the current YARN mode, though similar to Mesos or standalone mode. We might
also need to handle some failover issues, etc.
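	A minimal in-process sketch of the proposed layout, assuming executors simply register back
with a driver that stays in the user's process. This is a toy Python model, not Spark or YARN
code: LocalDriver, ExecutorOnlyAM, and the host names are hypothetical, and no real container
launching or networking happens.

```python
from dataclasses import dataclass, field
from typing import Callable, List, TypeVar

T = TypeVar("T")


@dataclass
class LocalDriver:
    """Hypothetical scheduler driver that stays in the user's own process."""
    host: str
    executors: List[str] = field(default_factory=list)

    def register_executor(self, executor_id: str) -> None:
        # Executors connect back to the driver, as in Mesos/standalone mode.
        self.executors.append(executor_id)

    def run_user_code(self, body: Callable[[], T]) -> T:
        # User code (e.g. a REPL line or a local file read) executes here, on the client.
        return body()


@dataclass
class ExecutorOnlyAM:
    """Hypothetical YARN AM that only launches executors and runs no user code."""
    driver: LocalDriver

    def launch_executors(self, container_hosts: List[str]) -> None:
        for i, host in enumerate(container_hosts):
            # In a real system each container would start an executor pointed at driver.host;
            # here we just record the registration directly.
            self.driver.register_executor(f"executor-{i}@{host}")


driver = LocalDriver(host="client-box")
ExecutorOnlyAM(driver).launch_executors(["node-a", "node-b"])
print(driver.executors)  # ['executor-0@node-a', 'executor-1@node-b']
print(driver.run_user_code(lambda: "runs on " + driver.host))  # runs on client-box
```

	The point of the split is that the AM's only job is resource acquisition, so the interactive
session, its output, and any local file access stay on the client, at the cost of the extra
client-side load mentioned above.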

	Another approach might be to enhance the current wrapper client application to proxy
all input and output from the app back to the local machine, instead of just reporting the application
status. This does not seem like a sound or easy solution to me, especially when local files
or third-party libraries that perform local operations are involved.

	Any ideas?

Best Regards,
Raymond Liu

