spark-user mailing list archives

From R Nair (रविशंकर नायर) <ravishankar.n...@gmail.com>
Subject Re: spark-shell gets stuck in ACCEPTED state forever when run in YARN client mode.
Date Sun, 08 Jul 2018 14:47:31 GMT
Are you able to run a simple MapReduce job on YARN without any issues?

If you have any issues: I had this problem on a Mac. Use csrutil to
disable System Integrity Protection (SIP). Then add a softlink:

sudo ln -s /usr/bin/java /bin/java


macOS versions from El Capitan onward do not allow creating softlinks
such as /bin/java while SIP is enabled.
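As a sketch of the steps above (assuming macOS El Capitan or later, where SIP protects /bin; note that `csrutil disable` only takes effect when run from the Recovery environment, reached by rebooting while holding Cmd-R):

```shell
# Workaround sketch for macOS with SIP enabled.
#   1. Reboot into Recovery (hold Cmd-R), open Terminal, run:
#        csrutil disable
#   2. Reboot normally, then create the link YARN expects:
#        sudo ln -s /usr/bin/java /bin/java
# From a normal session you can only confirm SIP status (read-only):
csrutil status 2>/dev/null || echo "csrutil not found (not macOS?)"
```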


I got everything working with the above.


Best,

Ravion



On Sun, Jul 8, 2018 at 10:20 AM Marco Mistroni <mmistroni@gmail.com> wrote:

> Are you running on EMR? Have you checked the EMR logs?
> I was in a similar situation where the job was stuck in ACCEPTED and then
> died. It turned out to be an issue with my code when running with huge
> data. Perhaps try reducing the load gradually until it works, and then
> build up from there?
> Not a huge help, but I followed the same approach when my job was stuck
> in ACCEPTED.
> Hth
>
> On Sun, Jul 8, 2018, 2:59 PM kant kodali <kanth909@gmail.com> wrote:
>
>> Hi All,
>>
>> I am trying to run a simple word count using YARN as the cluster manager.
>> I am currently using Spark 2.3.1 and Apache Hadoop 2.7.3. When I launch
>> spark-shell as below, it gets stuck in the ACCEPTED state forever.
>>
>> ./bin/spark-shell --master yarn --deploy-mode client
>>
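An application that sits in ACCEPTED means YARN has queued it but has not allocated an ApplicationMaster container. A hedged first check (assuming the Hadoop `yarn` CLI is on the PATH; both commands are read-only) is whether any NodeManager registered at all, since a missing or unhealthy NodeManager is a common cause:

```shell
# A job stuck in ACCEPTED often means no NodeManager is registered, or no
# capacity is free for the ApplicationMaster. Read-only diagnostics:
yarn node -list -all 2>/dev/null || echo "yarn CLI not on PATH"
yarn application -list -appStates ACCEPTED 2>/dev/null || true
```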
>>
>> I set the log level in log4j.properties under SPARK_HOME/conf to TRACE,
>> and here is what I see:
>>
>>  queue: "default" name: "Spark shell" host: "N/A" rpc_port: -1
>> yarn_application_state: ACCEPTED trackingUrl: "
>> http://Kants-MacBook-Pro-2.local:8088/proxy/application_1531056583425_0001/"
>> diagnostics: "" startTime: 1531056632496 finishTime: 0
>> final_application_status: APP_UNDEFINED app_resource_Usage {
>> num_used_containers: 0 num_reserved_containers: 0 used_resources { memory:
>> 0 virtual_cores: 0 } reserved_resources { memory: 0 virtual_cores: 0 }
>> needed_resources { memory: 0 virtual_cores: 0 } memory_seconds: 0
>> vcore_seconds: 0 } originalTrackingUrl: "N/A" currentApplicationAttemptId {
>> application_id { id: 1 cluster_timestamp: 1531056583425 } attemptId: 1 }
>> progress: 0.0 applicationType: "SPARK" }}
>>
>> 18/07/08 06:32:22 INFO Client: Application report for
>> application_1531056583425_0001 (state: ACCEPTED)
>>
>> 18/07/08 06:32:22 DEBUG Client:
>>
>> client token: N/A
>>
>> diagnostics: N/A
>>
>> ApplicationMaster host: N/A
>>
>> ApplicationMaster RPC port: -1
>>
>> queue: default
>>
>> start time: 1531056632496
>>
>> final status: UNDEFINED
>>
>> tracking URL:
>> http://xxx-MacBook-Pro-2.local:8088/proxy/application_1531056583425_0001/
>>
>> user: xxx
>>
>>
>>
>> 18/07/08 06:32:20 DEBUG Client:
>>
>> client token: N/A
>>
>> diagnostics: N/A
>>
>> ApplicationMaster host: N/A
>>
>> ApplicationMaster RPC port: -1
>>
>> queue: default
>>
>> start time: 1531056632496
>>
>> final status: UNDEFINED
>>
>> tracking URL:
>> http://Kants-MacBook-Pro-2.local:8088/proxy/application_1531056583425_0001/
>>
>> user: kantkodali
>>
>>
>> 18/07/08 06:32:21 TRACE ProtobufRpcEngine: 1: Call -> /0.0.0.0:8032:
>> getApplicationReport {application_id { id: 1 cluster_timestamp:
>> 1531056583425 }}
>>
>> 18/07/08 06:32:21 DEBUG Client: IPC Client (1608805714) connection to /
>> 0.0.0.0:8032 from kantkodali sending #136
>>
>> 18/07/08 06:32:21 DEBUG Client: IPC Client (1608805714) connection to /
>> 0.0.0.0:8032 from kantkodali got value #136
>>
>> 18/07/08 06:32:21 DEBUG ProtobufRpcEngine: Call: getApplicationReport
>> took 1ms
>>
>> 18/07/08 06:32:21 TRACE ProtobufRpcEngine: 1: Response <- /0.0.0.0:8032:
>> getApplicationReport {application_report { applicationId { id: 1
>> cluster_timestamp: 1531056583425 } user: "xxx" queue: "default" name:
>> "Spark shell" host: "N/A" rpc_port: -1 yarn_application_state: ACCEPTED
>> trackingUrl: "
>> http://xxx-MacBook-Pro-2.local:8088/proxy/application_1531056583425_0001/"
>> diagnostics: "" startTime: 1531056632496 finishTime: 0
>> final_application_status: APP_UNDEFINED app_resource_Usage {
>> num_used_containers: 0 num_reserved_containers: 0 used_resources { memory:
>> 0 virtual_cores: 0 } reserved_resources { memory: 0 virtual_cores: 0 }
>> needed_resources { memory: 0 virtual_cores: 0 } memory_seconds: 0
>> vcore_seconds: 0 } originalTrackingUrl: "N/A" currentApplicationAttemptId {
>> application_id { id: 1 cluster_timestamp: 1531056583425 } attemptId: 1 }
>> progress: 0.0 applicationType: "SPARK" }}
>>
>> 18/07/08 06:32:21 INFO Client: Application report for
>> application_1531056583425_0001 (state: ACCEPTED)
>>
>>
>> I have read this link
>> <https://stackoverflow.com/questions/32658840/spark-shell-stuck-in-yarn-accepted-state>
>> and here are the conf files that differ from the default settings.
>>
>>
>> *yarn-site.xml*
>>
>>
>> <configuration>
>>
>>
>>     <property>
>>
>>         <name>yarn.nodemanager.aux-services</name>
>>
>>         <value>mapreduce_shuffle</value>
>>
>>     </property>
>>
>>
>>     <property>
>>
>>         <name>yarn.nodemanager.resource.memory-mb</name>
>>
>>         <value>16384</value>
>>
>>     </property>
>>
>>
>>     <property>
>>
>>        <name>yarn.scheduler.minimum-allocation-mb</name>
>>
>>        <value>256</value>
>>
>>     </property>
>>
>>
>>     <property>
>>
>>        <name>yarn.scheduler.maximum-allocation-mb</name>
>>
>>        <value>8192</value>
>>
>>     </property>
>>
>>
>>    <property>
>>
>>        <name>yarn.nodemanager.resource.cpu-vcores</name>
>>
>>        <value>8</value>
>>
>>    </property>
>>
>>
>> </configuration>
>>
>> *core-site.xml*
>>
>>
>> <configuration>
>>
>>     <property>
>>
>>         <name>fs.defaultFS</name>
>>
>>         <value>hdfs://localhost:9000</value>
>>
>>     </property>
>>
>> </configuration>
>>
>> *hdfs-site.xml*
>>
>>
>> <configuration>
>>
>>     <property>
>>
>>         <name>dfs.replication</name>
>>
>>         <value>1</value>
>>
>>     </property>
>>
>> </configuration>
>>
>>
>> You can assume every other config remains untouched (so everything else
>> has default settings). Finally, I also looked for clues in the resource
>> manager logs, but they don't seem helpful for fixing the issue. However,
>> I am a newbie to YARN, so please let me know if I missed something.
>>
>>
>>
>> 2018-07-08 06:54:57,345 INFO
>> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Allocated
>> new applicationId: 1
>>
>> 2018-07-08 06:55:09,413 WARN
>> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: The specific
>> max attempts: 0 for application: 1 is invalid, because it is out of the
>> range [1, 2]. Use the global max attempts instead.
>>
>> 2018-07-08 06:55:09,414 INFO
>> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Application
>> with id 1 submitted by user xxx
>>
>> 2018-07-08 06:55:09,415 INFO
>> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Storing
>> application with id application_1531058076308_0001
>>
>> 2018-07-08 06:55:09,416 INFO
>> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER
>> =kantkodali IP=10.0.0.58 OPERATION=Submit Application Request
>> TARGET=ClientRMService RESULT=SUCCESS
>> APPID=application_1531058076308_0001
>>
>> 2018-07-08 06:55:09,422 INFO
>> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl:
>> application_1531058076308_0001 State change from NEW to NEW_SAVING on
>> event=START
>>
>> 2018-07-08 06:55:09,422 INFO
>> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore:
>> Storing info for app: application_1531058076308_0001
>>
>> 2018-07-08 06:55:09,423 INFO
>> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl:
>> application_1531058076308_0001 State change from NEW_SAVING to SUBMITTED
>> on event=APP_NEW_SAVED
>>
>> 2018-07-08 06:55:09,425 INFO
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue:
>> Application added - appId: application_1531058076308_0001 user:
>> kantkodali leaf-queue of parent: root #applications: 1
>>
>> 2018-07-08 06:55:09,425 INFO
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
>> Accepted application application_1531058076308_0001 from user:
>> kantkodali, in queue: default
>>
>> 2018-07-08 06:55:09,439 INFO
>> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl:
>> application_1531058076308_0001 State change from SUBMITTED to ACCEPTED on
>> event=APP_ACCEPTED
>>
>> 2018-07-08 06:55:09,470 INFO
>> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService:
>> Registering app attempt : appattempt_1531058076308_0001_000001
>>
>> 2018-07-08 06:55:09,471 INFO
>> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
>> appattempt_1531058076308_0001_000001 State change from NEW to SUBMITTED
>>
>> 2018-07-08 06:55:09,481 WARN
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>> maximum-am-resource-percent is insufficient to start a single
>> application in queue, it is likely set too low. skipping enforcement to
>> allow at least one application to start
>>
>> 2018-07-08 06:55:09,481 WARN
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>> maximum-am-resource-percent is insufficient to start a single
>> application in queue for user, it is likely set too low. skipping
>> enforcement to allow at least one application to start
>>
>> 2018-07-08 06:55:09,481 INFO
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>> Application application_1531058076308_0001 from user: xxx activated in
>> queue: default
>>
>> 2018-07-08 06:55:09,482 INFO
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>> Application added - appId: application_1531058076308_0001 user:
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue$
>> User@fdd759d, leaf-queue: default #user-pending-applications: 0 #user-active-applications:
>> 1 #queue-pending-applications: 0 #queue-active-applications: 1
>>
>> 2018-07-08 06:55:09,482 INFO
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
>> Added Application Attempt appattempt_1531058076308_0001_000001 to
>> scheduler from user kantkodali in queue default
>>
>> 2018-07-08 06:55:09,484 INFO
>> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
>> appattempt_1531058076308_0001_000001 State change from SUBMITTED to
>> SCHEDULED
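The two LeafQueue warnings above ("maximum-am-resource-percent is insufficient to start a single application") are the most concrete clue in these logs: the CapacityScheduler caps the share of a queue's resources that ApplicationMasters may use (default 0.1), and on a single-node setup that cap can be too small for the Spark AM. As a hedged sketch (the property name is the standard CapacityScheduler one; 0.5 is an illustrative value, not a recommendation), raising it in capacity-scheduler.xml looks like:

```xml
<!-- capacity-scheduler.xml -->
<!-- Fraction of a queue's resources that may be used by ApplicationMasters.
     The default of 0.1 can be too small to start even one AM on a small node. -->
<property>
    <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
    <value>0.5</value>
</property>
```

Restart the ResourceManager after changing this so the scheduler picks it up.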
>>
>> Any help would be great!
>>
>> Thanks!
>>
>
