spark-user mailing list archives

From Philip Ogren <philip.og...@oracle.com>
Subject Re: Anyone know hot to submit spark job to yarn in java code?
Date Wed, 15 Jan 2014 23:35:54 GMT
My problem seems to be related to this:
https://issues.apache.org/jira/browse/MAPREDUCE-4052

So, I will try running my setup from a Linux client and see if I have 
better luck.
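
A minimal sketch of a guard for the situation described above, assuming the failing client was a Windows machine (the message implies this but does not say so) and that the linked issue concerns Windows clients submitting to a Linux cluster. The class and method names are hypothetical, and the check only reads the standard `os.name` system property. It does not fix anything; it just makes the mismatch visible before a job is submitted.

```java
import java.util.Locale;

// Hypothetical helper: warns when the submitting client looks like Windows.
class ClientOsCheck {

    // Returns true when the given os.name value identifies a Windows system.
    static boolean isWindows(String osName) {
        return osName != null
                && osName.toLowerCase(Locale.ROOT).startsWith("windows");
    }

    // Prints a warning if the current JVM runs on Windows; returns the check result.
    static boolean warnIfWindowsClient() {
        boolean windows = isWindows(System.getProperty("os.name"));
        if (windows) {
            System.err.println("Client OS is Windows; submission to a Linux YARN "
                    + "cluster may fail (see MAPREDUCE-4052).");
        }
        return windows;
    }
}
```

Calling `ClientOsCheck.warnIfWindowsClient()` before creating the SparkContext would surface the problem early instead of as a failed application in the YARN UI.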

On 1/15/2014 11:38 AM, Philip Ogren wrote:
> Great question!  I was writing up a similar question this morning and 
> decided to investigate some more before sending.  Here's what I'm 
> trying.  I have created a new scala project that contains only 
> spark-examples-assembly-0.8.1-incubating.jar and 
> spark-assembly-0.8.1-incubating-hadoop2.2.0-cdh5.0.0-beta-1.jar on the 
> classpath and I am trying to create a yarn-client SparkContext with 
> the following:
>
> val spark = new SparkContext("yarn-client", "my-app")
>
> My hope is to run this on my laptop and have it execute/connect on the 
> yarn application master.  The hope is that if I can get this to work, 
> then I can do the same from a web application.  I'm trying to unpack 
> run-example.sh, compute-classpath, SparkPi, *.yarn.Client to figure 
> out what environment variables I need to set up etc.
>
> I grabbed all the .xml files out of my cluster's conf directory (in my 
> case /etc/hadoop/conf.cloudera.yarn), such as yarn-site.xml, and 
> put them on my classpath.  I also set up the environment variables 
> SPARK_JAR, SPARK_YARN_APP_JAR, SPARK_YARN_USER_ENV, and SPARK_HOME.
>
> When I run my simple scala script, I get the following error:
>
> Exception in thread "main" org.apache.spark.SparkException: Yarn application already ended,might be killed or not able to launch application master.
>     at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApp(YarnClientSchedulerBackend.scala:95)
>     at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:72)
>     at org.apache.spark.scheduler.cluster.ClusterScheduler.start(ClusterScheduler.scala:119)
>     at org.apache.spark.SparkContext.<init>(SparkContext.scala:273)
>     at SparkYarnClientExperiment$.main(SparkYarnClientExperiment.scala:14)
>     at SparkYarnClientExperiment.main(SparkYarnClientExperiment.scala)
>
> I can look at my yarn UI and see that it registers a failed 
> application, so I take this as incremental progress.  However, I'm not 
> sure how to troubleshoot what I'm doing from here or if what I'm 
> trying to do is even sensible/possible.  Any advice is appreciated.
>
> Thanks,
> Philip
>
> On 1/15/2014 11:25 AM, John Zhao wrote:
>> I am working on a web application and I want to submit a Spark 
>> job to Hadoop YARN.
>> I have already built my own assembly and can run it from the command 
>> line with the following script:
>>
>> export YARN_CONF_DIR=/home/gpadmin/clusterConfDir/yarn
>> export SPARK_JAR=./assembly/target/scala-2.9.3/spark-assembly-0.8.1-incubating-hadoop2.0.5-alpha.jar
>> ./spark-class org.apache.spark.deploy.yarn.Client \
>>   --jar ./examples/target/scala-2.9.3/spark-examples-assembly-0.8.1-incubating.jar \
>>   --class org.apache.spark.examples.SparkPi \
>>   --args yarn-standalone \
>>   --num-workers 3 --master-memory 1g --worker-memory 512m --worker-cores 1
>>
>> It works fine.
>> Then I realized that it is hard to submit the job from a web 
>> application. Both 
>> spark-assembly-0.8.1-incubating-hadoop2.0.5-alpha.jar and 
>> spark-examples-assembly-0.8.1-incubating.jar are really big jars. I 
>> believe they contain everything.
>> So my questions are:
>> 1) When I run the above script, which jar is being submitted to the 
>> YARN server?
>> 2) It looks like 
>> spark-assembly-0.8.1-incubating-hadoop2.0.5-alpha.jar plays the 
>> client-side role, while spark-examples-assembly-0.8.1-incubating.jar 
>> carries the Spark runtime and the examples that will run in YARN. Am I 
>> right?
>> 3) Does anyone have similar experience? I have done a lot of Hadoop MR 
>> work and want to follow the same logic to submit a Spark job. For now I 
>> can only find the command-line way to submit a Spark job to YARN. I 
>> believe there is an easy way to integrate Spark into a web application.
>>
>>
>> Thanks.
>> John.
>
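
One possible way to drive the quoted command-line submission from Java code is to launch the same `spark-class` command as a child process. The sketch below is a workaround rather than a Spark API: the class and method names are hypothetical, the `sparkHome` parameter assumes the command is run from the Spark installation directory, and the flags and environment variables are copied from the script quoted above for Spark 0.8.1.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: shells out to spark-class instead of calling Spark directly.
class YarnSubmitSketch {

    // Builds the argument list, mirroring the flags of the quoted script.
    static List<String> buildCommand(String appJar, String mainClass, int numWorkers) {
        return new ArrayList<>(Arrays.asList(
                "./spark-class", "org.apache.spark.deploy.yarn.Client",
                "--jar", appJar,
                "--class", mainClass,
                "--args", "yarn-standalone",
                "--num-workers", Integer.toString(numWorkers),
                "--master-memory", "1g",
                "--worker-memory", "512m",
                "--worker-cores", "1"));
    }

    // Configures working directory and the two variables the script exports.
    static ProcessBuilder prepare(List<String> cmd, String sparkHome,
                                  String yarnConfDir, String sparkJar) {
        ProcessBuilder pb = new ProcessBuilder(cmd);
        pb.directory(new File(sparkHome));   // assumed: spark-class lives here
        pb.environment().put("YARN_CONF_DIR", yarnConfDir);
        pb.environment().put("SPARK_JAR", sparkJar);
        pb.inheritIO();                      // forward client output to this JVM's console
        return pb;
    }

    // Starts the YARN client and blocks until it exits; returns its exit code.
    static int submit(ProcessBuilder pb) throws IOException, InterruptedException {
        return pb.start().waitFor();
    }
}
```

A web application could call `submit(prepare(buildCommand(...), ...))` on a background thread, since the call blocks until the client process exits. The large assembly jars still have to be present on the machine that runs the web application.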

