spark-user mailing list archives

From Matt Kapilevich <matve...@gmail.com>
Subject Re: Issue running Spark 1.4 on Yarn
Date Tue, 09 Jun 2015 18:31:19 GMT
Hi Marcelo,

Thanks. I think something more subtle is happening.

I'm running a single-node cluster, so there's only 1 NM. When I executed
the exact same job the 4th time, the cluster was idle, and there was
nothing else being executed. RM currently reports that I have 6.5GB of
memory and 4 CPUs available. However, the job is still stuck in the
"ACCEPTED" state a day later. As I mentioned earlier, I'm still able to run
Hadoop jobs without problems, so this issue is specific to Spark.
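
A rough sketch of the arithmetic behind that observation, in Python. It estimates the YARN container sizes implied by the spark-submit command quoted below. The overhead minimum (384 MB), overhead factor (0.10), ApplicationMaster heap (512 MB), and scheduler increment (1024 MB) are assumptions, not values taken from this cluster; the real ones depend on the Spark and YARN configuration in use.

```python
# Rough estimate of YARN container sizes for the spark-submit command quoted below.
# Every default here is an assumption; check your own Spark and YARN settings.

def container_mb(requested_mb, overhead_min_mb=384, overhead_factor=0.10,
                 increment_mb=1024):
    """Requested heap plus overhead, rounded up to the scheduler increment."""
    overhead = max(overhead_min_mb, int(requested_mb * overhead_factor))
    total = requested_mb + overhead
    return -(-total // increment_mb) * increment_mb  # ceiling to a multiple

available_mb = int(6.5 * 1024)      # free memory reported by the RM
am_mb = container_mb(512)           # assumed AM heap in yarn-client mode
executor_mb = container_mb(2048)    # --executor-memory 2g
num_executors = 3                   # --num-executors 3

print("AM container:", am_mb, "MB; fits:", am_mb <= available_mb)
print("All executors:", num_executors * executor_mb, "MB; fit together:",
      num_executors * executor_mb <= available_mb)
```

An application leaves "ACCEPTED" once its ApplicationMaster container has been granted, and in yarn-client mode --driver-memory sizes the local driver JVM rather than a YARN container. Under the assumptions above, the AM request is well below the 6.5GB the RM reports, so free memory alone would not obviously explain the hang.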

Thanks,
-Matt

On Tue, Jun 9, 2015 at 12:32 PM, Marcelo Vanzin <vanzin@cloudera.com> wrote:

> If your application is stuck in that state, it generally means your
> cluster doesn't have enough resources to start it.
>
> In the RM logs you can see how many vcores / memory the application is
> asking for, and then you can check your RM configuration to see if that's
> currently available on any single NM.
>
> On Tue, Jun 9, 2015 at 7:56 AM, Matt Kapilevich <matvey14@gmail.com>
> wrote:
>
>> Hi all,
>>
>> I'm manually building Spark from source against 1.4 branch and submitting
>> the job against Yarn. I am seeing very strange behavior. The first 2 or 3
>> times I submit the job, it runs fine, computes Pi, and exits. The next time
>> I run it, it gets stuck in the "ACCEPTED" state.
>>
>> I'm kicking off a job using "yarn-client" mode like this:
>>
>> ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master
>> yarn-client  --num-executors 3    --driver-memory 4g     --executor-memory
>> 2g    --executor-cores 1    --queue thequeue
>> examples/target/scala-2.10/spark-examples*.jar    10
>>
>> Here's what ResourceManager shows:[image: Yarn ResourceManager UI]
>>
>> In Yarn ResourceManager logs, all I'm seeing is this:
>>
>> 2015-06-08 14:49:57,166 INFO
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler:
>> Added Application Attempt appattempt_1433789077942_0004_000001 to scheduler
>> from user: root
>> 2015-06-08 14:49:57,166 INFO
>> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
>> appattempt_1433789077942_0004_000001 State change from SUBMITTED to
>> SCHEDULED
>>
>> There's nothing in the NodeManager logs (though it's up and running); the
>> job isn't getting that far.
>>
>> It seems to me that there's an issue somewhere in the integration between
>> Spark 1.4 and Yarn. Hadoop runs without any issues. I've run the command
>> below multiple times.
>>
>> yarn jar
>> /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.4.2.jar pi
>> 16 100
>>
>> For reference, I'm compiling the source against 1.4 branch, and running
>> it on a single-node cluster with CDH5.4 and Hadoop 2.6, distributed mode. I
>> am using the following to compile: "mvn -Phadoop-2.6 -Dhadoop.version=2.6.0
>> -Pyarn -Phive -Phive-thriftserver -DskipTests clean package"
>>
>> Any help appreciated.
>>
>> Thanks,
>> -Matt
>>
>
>
>
> --
> Marcelo
>
