spark-dev mailing list archives

From "Chester @work" <ches...@alpinenow.com>
Subject Re: Running Spark On Yarn without Spark-Submit
Date Fri, 29 Aug 2014 13:58:57 GMT
Archit,
     We are using yarn-cluster mode and calling Spark via the Client class directly from
a servlet server. It works fine.
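
For reference, the servlet-side submission looks roughly like the following. This is a
sketch only: the Client and ClientArguments constructors have shifted between Spark
versions, and the jar path and class name here are made up.

    import org.apache.spark.SparkConf
    import org.apache.spark.deploy.yarn.{Client, ClientArguments}

    object ServletLauncher {
      def submit(): Unit = {
        val sparkConf = new SparkConf()
        // Mirror the arguments spark-submit would normally pass along.
        val args = Array(
          "--jar", "/path/to/app.jar",       // hypothetical application jar
          "--class", "com.example.SparkJob", // hypothetical main class
          "--arg", "some-input")
        // Submits the application to YARN in cluster mode and blocks
        // until the application reaches a terminal state.
        new Client(new ClientArguments(args, sparkConf), sparkConf).run()
      }
    }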
    As for establishing a communication channel for further requests: it should be
possible in yarn-client mode, but not in yarn-cluster mode. In yarn-client mode the Spark
driver is outside the YARN cluster, so it can issue more commands. In yarn-cluster mode,
everything including the Spark driver runs inside the YARN cluster, and there is no
communication channel back to the client until the job finishes.

If your job keeps the SparkContext alive and waits for further commands, then from the
client's point of view it will wait forever.

I am actually working on some improvements to this and experimenting with them in our
product; I will create PRs once I am comfortable with the solution:

1) Change the Client API to let the caller learn the YARN app resource capacity before
passing arguments.
2) Add a YarnApplicationListener to the Client.
3) Provide a communication channel between the application and the Spark YARN client in
the cluster.

#1 is not directly related to the communication issue discussed here.
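
The capacity query in #1 is essentially what the Hadoop YarnClient API already exposes;
a minimal sketch of the idea (these are standard YARN calls, but treat the details as
illustrative):

    import org.apache.hadoop.yarn.client.api.YarnClient
    import org.apache.hadoop.yarn.conf.YarnConfiguration

    val yarnClient = YarnClient.createYarnClient()
    yarnClient.init(new YarnConfiguration())
    yarnClient.start()

    // Ask the ResourceManager for the largest container it can grant,
    // so the caller can size executor memory/cores before submitting.
    val maxCap = yarnClient.createApplication()
      .getNewApplicationResponse
      .getMaximumResourceCapability
    println(s"max memory: ${maxCap.getMemory} MB, max vcores: ${maxCap.getVirtualCores}")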

#2 lets the application receive life-cycle callbacks (app start, in-progress, end,
failure, etc.) along with the YARN resource allocations.
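
Roughly, the listener I have in mind looks like the following. This is a hypothetical
interface; none of these names exist in Spark.

    // Hypothetical callback interface for #2; method names are mine.
    trait YarnApplicationListener {
      def onApplicationStart(appId: String): Unit
      def onApplicationProgress(appId: String, progress: Float): Unit
      def onContainersAllocated(appId: String, count: Int): Unit
      def onApplicationEnd(appId: String, finalState: String): Unit
      def onApplicationFailure(appId: String, diagnostics: String): Unit
    }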

I have made changes #1 and #2 in a forked Spark; they work well on CDH5, and I am testing
against 2.0.5-alpha as well.

For #3, I have not changed anything in Spark yet, as I am not sure of the best approach.
I put the change in the application runner, which launches the Spark YARN client in the
cluster.

The runner in the YARN cluster gets the application's host and port from the passed
configuration (args), then creates an Akka actor using the SparkContext's actor system
and sends a handshake message to the caller outside the cluster. After that you have
two-way communication.

With this approach, I can send Spark listener callbacks to the app, error messages,
app-level messages, etc.

The runner inside the cluster can also receive requests from outside the cluster, such as
stop.
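
A rough sketch of the runner-side actor (the message types and names are mine, purely
illustrative, and not Spark API):

    import akka.actor.{Actor, ActorSelection}

    // Illustrative messages; none of these types exist in Spark.
    case class Handshake(appId: String)
    case class AppMessage(text: String)
    case object StopRequest

    // Runs inside the cluster next to the driver. callerPath (the app's
    // host/port outside the cluster) comes from the passed configuration.
    class RunnerActor(callerPath: String) extends Actor {
      val caller: ActorSelection = context.actorSelection(callerPath)

      // Open the two-way channel with a handshake message.
      override def preStart(): Unit =
        caller ! Handshake("appId-placeholder")

      def receive = {
        case StopRequest =>
          // The caller outside the cluster asked us to stop the job;
          // the real runner would stop the SparkContext here.
          context.system.shutdown()
        case msg: AppMessage =>
          // Relay listener callbacks, errors, app-level messages, etc.
          caller ! msg
      }
    }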


We are not sure the Akka approach is the best, so I am still experimenting with it. So
far it does what we want.

Hope this helps

Chester


Sent from my iPhone

> On Aug 29, 2014, at 2:36 AM, Archit Thakur <archit279thakur@gmail.com> wrote:
> 
> including user@spark.apache.org.
> 
> 
>> On Fri, Aug 29, 2014 at 2:03 PM, Archit Thakur <archit279thakur@gmail.com> wrote:
>> Hi,
>> 
>> My requirement is to run Spark on Yarn without using the script spark-submit.
>> 
>> I have a servlet and a Tomcat server. As and when a request comes in, it creates a new
>> SparkContext and keeps it alive for further requests. I am setting my master in SparkConf
>> 
>> as sparkConf.setMaster("yarn-cluster")
>> 
>> but the request is stuck indefinitely. 
>> 
>> This works when I set
>> sparkConf.setMaster("yarn-client")
>> 
>> I am not sure why it is not launching the job in yarn-cluster mode.
>> 
>> Any thoughts?
>> 
>> Thanks and Regards,
>> Archit Thakur. 
> 
