spark-user mailing list archives

From Marco Mistroni <mmistr...@gmail.com>
Subject Re: spark-shell gets stuck in ACCEPTED state forever when run in YARN client mode.
Date Sun, 08 Jul 2018 14:19:50 GMT
Are you running on EMR? Have you checked the EMR logs?
I was in a similar situation where the job was stuck in ACCEPTED and then it
died. It turned out to be an issue with my code when running with huge
data. Perhaps try gradually reducing the load until it works, and then start
from there?
Not a huge help, but I followed the same approach when my job was stuck in ACCEPTED.
Hth
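If you're not on EMR, it may also be worth checking whether the cluster actually has free resources to schedule the ApplicationMaster container. A few stock YARN CLI commands (standard Hadoop commands, nothing specific to your setup) show that at a glance:

```shell
# List NodeManagers with their used/available memory and vcores;
# an app stuck in ACCEPTED often means no node can fit the AM container.
yarn node -list -all

# Show the app's current state and diagnostics from the RM's point of view.
yarn application -status application_1531056583425_0001

# Per-queue view: capacity, used capacity, and state of the default queue.
yarn queue -status default
```

These need to run on a machine with the Hadoop client configured to talk to your ResourceManager.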

On Sun, Jul 8, 2018, 2:59 PM kant kodali <kanth909@gmail.com> wrote:

> Hi All,
>
> I am trying to run a simple word count using YARN as the cluster manager. I
> am currently using Spark 2.3.1 and Apache Hadoop 2.7.3. When I spawn
> spark-shell as below, it gets stuck in the ACCEPTED state forever.
>
> ./bin/spark-shell --master yarn --deploy-mode client
>
>
> I set the log level in my log4j.properties under SPARK_HOME/conf to TRACE,
> and here is what I see:
>
>  queue: "default" name: "Spark shell" host: "N/A" rpc_port: -1
> yarn_application_state: ACCEPTED trackingUrl: "
> http://Kants-MacBook-Pro-2.local:8088/proxy/application_1531056583425_0001/"
> diagnostics: "" startTime: 1531056632496 finishTime: 0
> final_application_status: APP_UNDEFINED app_resource_Usage {
> num_used_containers: 0 num_reserved_containers: 0 used_resources { memory:
> 0 virtual_cores: 0 } reserved_resources { memory: 0 virtual_cores: 0 }
> needed_resources { memory: 0 virtual_cores: 0 } memory_seconds: 0
> vcore_seconds: 0 } originalTrackingUrl: "N/A" currentApplicationAttemptId {
> application_id { id: 1 cluster_timestamp: 1531056583425 } attemptId: 1 }
> progress: 0.0 applicationType: "SPARK" }}
>
> 18/07/08 06:32:22 INFO Client: Application report for
> application_1531056583425_0001 (state: ACCEPTED)
>
> 18/07/08 06:32:22 DEBUG Client:
>
> client token: N/A
>
> diagnostics: N/A
>
> ApplicationMaster host: N/A
>
> ApplicationMaster RPC port: -1
>
> queue: default
>
> start time: 1531056632496
>
> final status: UNDEFINED
>
> tracking URL:
> http://xxx-MacBook-Pro-2.local:8088/proxy/application_1531056583425_0001/
>
> user: xxx
>
>
>
> 18/07/08 06:32:20 DEBUG Client:
>
> client token: N/A
>
> diagnostics: N/A
>
> ApplicationMaster host: N/A
>
> ApplicationMaster RPC port: -1
>
> queue: default
>
> start time: 1531056632496
>
> final status: UNDEFINED
>
> tracking URL:
> http://Kants-MacBook-Pro-2.local:8088/proxy/application_1531056583425_0001/
>
> user: kantkodali
>
>
> 18/07/08 06:32:21 TRACE ProtobufRpcEngine: 1: Call -> /0.0.0.0:8032:
> getApplicationReport {application_id { id: 1 cluster_timestamp:
> 1531056583425 }}
>
> 18/07/08 06:32:21 DEBUG Client: IPC Client (1608805714) connection to /
> 0.0.0.0:8032 from kantkodali sending #136
>
> 18/07/08 06:32:21 DEBUG Client: IPC Client (1608805714) connection to /
> 0.0.0.0:8032 from kantkodali got value #136
>
> 18/07/08 06:32:21 DEBUG ProtobufRpcEngine: Call: getApplicationReport took
> 1ms
>
> 18/07/08 06:32:21 TRACE ProtobufRpcEngine: 1: Response <- /0.0.0.0:8032:
> getApplicationReport {application_report { applicationId { id: 1
> cluster_timestamp: 1531056583425 } user: "xxx" queue: "default" name:
> "Spark shell" host: "N/A" rpc_port: -1 yarn_application_state: ACCEPTED
> trackingUrl: "
> http://xxx-MacBook-Pro-2.local:8088/proxy/application_1531056583425_0001/"
> diagnostics: "" startTime: 1531056632496 finishTime: 0
> final_application_status: APP_UNDEFINED app_resource_Usage {
> num_used_containers: 0 num_reserved_containers: 0 used_resources { memory:
> 0 virtual_cores: 0 } reserved_resources { memory: 0 virtual_cores: 0 }
> needed_resources { memory: 0 virtual_cores: 0 } memory_seconds: 0
> vcore_seconds: 0 } originalTrackingUrl: "N/A" currentApplicationAttemptId {
> application_id { id: 1 cluster_timestamp: 1531056583425 } attemptId: 1 }
> progress: 0.0 applicationType: "SPARK" }}
>
> 18/07/08 06:32:21 INFO Client: Application report for
> application_1531056583425_0001 (state: ACCEPTED)
>
>
> I have read this link
> <https://stackoverflow.com/questions/32658840/spark-shell-stuck-in-yarn-accepted-state>,
> and here are the conf files that differ from the default settings:
>
> *yarn-site.xml*
>
>
> <configuration>
>
>
>     <property>
>
>         <name>yarn.nodemanager.aux-services</name>
>
>         <value>mapreduce_shuffle</value>
>
>     </property>
>
>
>     <property>
>
>         <name>yarn.nodemanager.resource.memory-mb</name>
>
>         <value>16384</value>
>
>     </property>
>
>
>     <property>
>
>        <name>yarn.scheduler.minimum-allocation-mb</name>
>
>        <value>256</value>
>
>     </property>
>
>
>     <property>
>
>        <name>yarn.scheduler.maximum-allocation-mb</name>
>
>        <value>8192</value>
>
>     </property>
>
>
>    <property>
>
>        <name>yarn.nodemanager.resource.cpu-vcores</name>
>
>        <value>8</value>
>
>    </property>
>
>
> </configuration>
>
> *core-site.xml*
>
>
> <configuration>
>
>     <property>
>
>         <name>fs.defaultFS</name>
>
>         <value>hdfs://localhost:9000</value>
>
>     </property>
>
> </configuration>
>
> *hdfs-site.xml*
>
>
> <configuration>
>
>     <property>
>
>         <name>dfs.replication</name>
>
>         <value>1</value>
>
>     </property>
>
> </configuration>
>
>
> You can imagine every other config remains untouched (so everything else
> has default settings). Finally, I have also tried to see if there are any
> clues in the ResourceManager logs, but they don't seem to be helpful in
> terms of fixing the issue. However, I am a newbie to YARN, so please let me
> know if I missed something.
>
>
>
> 2018-07-08 06:54:57,345 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Allocated
> new applicationId: 1
>
> 2018-07-08 06:55:09,413 WARN
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: The specific
> max attempts: 0 for application: 1 is invalid, because it is out of the
> range [1, 2]. Use the global max attempts instead.
>
> 2018-07-08 06:55:09,414 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Application
> with id 1 submitted by user xxx
>
> 2018-07-08 06:55:09,415 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Storing
> application with id application_1531058076308_0001
>
> 2018-07-08 06:55:09,416 INFO
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER
> =kantkodali IP=10.0.0.58 OPERATION=Submit Application Request
> TARGET=ClientRMService RESULT=SUCCESS APPID=application_1531058076308_0001
>
> 2018-07-08 06:55:09,422 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl:
> application_1531058076308_0001 State change from NEW to NEW_SAVING on
> event=START
>
> 2018-07-08 06:55:09,422 INFO
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore:
> Storing info for app: application_1531058076308_0001
>
> 2018-07-08 06:55:09,423 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl:
> application_1531058076308_0001 State change from NEW_SAVING to SUBMITTED
> on event=APP_NEW_SAVED
>
> 2018-07-08 06:55:09,425 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue:
> Application added - appId: application_1531058076308_0001 user:
> kantkodali leaf-queue of parent: root #applications: 1
>
> 2018-07-08 06:55:09,425 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
> Accepted application application_1531058076308_0001 from user:
> kantkodali, in queue: default
>
> 2018-07-08 06:55:09,439 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl:
> application_1531058076308_0001 State change from SUBMITTED to ACCEPTED on
> event=APP_ACCEPTED
>
> 2018-07-08 06:55:09,470 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService:
> Registering app attempt : appattempt_1531058076308_0001_000001
>
> 2018-07-08 06:55:09,471 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
> appattempt_1531058076308_0001_000001 State change from NEW to SUBMITTED
>
> 2018-07-08 06:55:09,481 WARN
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
> maximum-am-resource-percent is insufficient to start a single application
> in queue, it is likely set too low. skipping enforcement to allow at least
> one application to start
>
> 2018-07-08 06:55:09,481 WARN
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
> maximum-am-resource-percent is insufficient to start a single application
> in queue for user, it is likely set too low. skipping enforcement to allow
> at least one application to start
>
> 2018-07-08 06:55:09,481 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
> Application application_1531058076308_0001 from user: xxx activated in
> queue: default
>
> 2018-07-08 06:55:09,482 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
> Application added - appId: application_1531058076308_0001 user:
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue$
> User@fdd759d, leaf-queue: default #user-pending-applications: 0 #user-active-applications:
> 1 #queue-pending-applications: 0 #queue-active-applications: 1
>
> 2018-07-08 06:55:09,482 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
> Added Application Attempt appattempt_1531058076308_0001_000001 to
> scheduler from user kantkodali in queue default
>
> 2018-07-08 06:55:09,484 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
> appattempt_1531058076308_0001_000001 State change from SUBMITTED to
> SCHEDULED
>
> Any help would be great!
>
> Thanks!
>
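The two LeafQueue WARN lines about maximum-am-resource-percent in the ResourceManager log above suggest the CapacityScheduler thinks the queue cannot fit even a single ApplicationMaster. One knob worth ruling out (a sketch of a possible tweak, not a confirmed fix for this case) is raising that limit in capacity-scheduler.xml; it defaults to 0.1, i.e. at most 10% of cluster resources for AMs:

```xml
<configuration>
    <property>
        <!-- Fraction of cluster resources usable for ApplicationMasters;
             the default 0.1 can be tight on a single-node setup. -->
        <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
        <value>0.5</value>
    </property>
</configuration>
```

Restart the ResourceManager after changing it. Note the WARN text itself says enforcement is skipped so that at least one application can start, so this setting alone may not be the root cause here, but it is cheap to eliminate.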
