spark-dev mailing list archives

From "Liu, Raymond" <>
Subject Question about Spark on Yarn design
Date Thu, 22 Aug 2013 05:01:42 GMT

	In the current implementation, when running Spark on top of YARN via spark.deploy.yarn.Client, the SparkContext actually runs as a thread inside the YARN ApplicationMaster process, i.e. not on the node where the client application was invoked. This is quite different from the other modes, say mesos/standalone/local.
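For reference, launching an application in this mode goes through the wrapper client, roughly along the lines of the command below. This is only a sketch: the exact script name and flag set vary between Spark versions, so treat the flags as assumptions rather than a definitive invocation.

```
SPARK_JAR=<path/to/spark-yarn-assembly.jar> ./run spark.deploy.yarn.Client \
  --jar <your-app.jar> \
  --class <app.MainClass> \
  --args <app arguments> \
  --num-workers <number of workers> \
  --worker-memory <memory per worker>
```

The Client submits the application to YARN, and the ApplicationMaster then runs the user's main class, and therefore the SparkContext, inside its own process on a cluster node.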
	I understand this is not a problem for applications that involve no user interaction or local file operations, but it does not work for something like spark-shell.

	Also, with this yarn mode, the application output ends up on the node where the AM is instantiated, which makes it hard to check.

	So, is there any particular reason that yarn mode is implemented this way?

	And I am wondering: could we add a mode in which the scheduler's Driver actor still runs on the local machine, and the YARN ApplicationMaster just launches Executors instead of running the whole user application? Then a client wrapper app would not be needed, and the user could simply create a SparkContext locally in this mode. The drawback I can imagine is that the client node's burden would be heavier than in the current yarn mode, though similar to mesos or standalone mode, and we might also need to handle some failover issues, etc.
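To make the proposal concrete, usage might look like the pseudocode below. The "yarn-client" master string is purely hypothetical here, an illustration of the proposed mode rather than anything that exists in Spark today.

```scala
// Pseudocode sketch of the proposed mode. "yarn-client" is a
// hypothetical master URL, invented for illustration only.
// The Driver actor stays in this local JVM; the YARN
// ApplicationMaster would only request containers and launch
// Executors on the cluster.
val sc = new SparkContext("yarn-client", "MyApp")

// Interaction and local output now happen on the client node,
// so a spark-shell-like application would work unchanged.
val lines = sc.textFile("hdfs://...")
println(lines.count())
```

With this shape, the user application itself is the entry point, exactly as in mesos/standalone/local modes, instead of being wrapped and shipped to the AM.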

	Another approach might be to enhance the current wrapper client application to proxy all input and output from the app back to the local machine, instead of just reporting the application status. That does not seem to me a sound or easy solution, especially when local files are involved, or third-party libraries that perform local operations.

	Any ideas?

Best Regards,
Raymond Liu
