spark-user mailing list archives

From kant kodali <kanth...@gmail.com>
Subject spark-shell gets stuck in ACCEPTED state forever when run in YARN client mode.
Date Sun, 08 Jul 2018 13:58:19 GMT
Hi All,

I am trying to run a simple word count using YARN as the cluster manager. I
am currently using Spark 2.3.1 and Apache Hadoop 2.7.3. When I spawn
spark-shell as below, the application gets stuck in the ACCEPTED state forever.

./bin/spark-shell --master yarn --deploy-mode client
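If it helps, here is a variant with explicit, deliberately small resource requests (the values are just examples) that should rule out the ApplicationMaster or executor request exceeding the scheduler limits:

```shell
# Variant with explicit, small resource requests (example values) to
# rule out the AM/executor request exceeding the scheduler limits.
# In client mode, spark.yarn.am.memory controls the ApplicationMaster size.
./bin/spark-shell --master yarn --deploy-mode client \
  --driver-memory 512m \
  --executor-memory 512m \
  --num-executors 1 \
  --conf spark.yarn.am.memory=512m
```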


I set the log level in SPARK_HOME/conf/log4j.properties to TRACE and captured the output below.

 queue: "default" name: "Spark shell" host: "N/A" rpc_port: -1
yarn_application_state: ACCEPTED trackingUrl: "
http://Kants-MacBook-Pro-2.local:8088/proxy/application_1531056583425_0001/"
diagnostics: "" startTime: 1531056632496 finishTime: 0
final_application_status: APP_UNDEFINED app_resource_Usage {
num_used_containers: 0 num_reserved_containers: 0 used_resources { memory:
0 virtual_cores: 0 } reserved_resources { memory: 0 virtual_cores: 0 }
needed_resources { memory: 0 virtual_cores: 0 } memory_seconds: 0
vcore_seconds: 0 } originalTrackingUrl: "N/A" currentApplicationAttemptId {
application_id { id: 1 cluster_timestamp: 1531056583425 } attemptId: 1 }
progress: 0.0 applicationType: "SPARK" }}

18/07/08 06:32:22 INFO Client: Application report for
application_1531056583425_0001 (state: ACCEPTED)

18/07/08 06:32:22 DEBUG Client:

client token: N/A

diagnostics: N/A

ApplicationMaster host: N/A

ApplicationMaster RPC port: -1

queue: default

start time: 1531056632496

final status: UNDEFINED

tracking URL:
http://xxx-MacBook-Pro-2.local:8088/proxy/application_1531056583425_0001/

user: xxx



18/07/08 06:32:20 DEBUG Client:

client token: N/A

diagnostics: N/A

ApplicationMaster host: N/A

ApplicationMaster RPC port: -1

queue: default

start time: 1531056632496

final status: UNDEFINED

tracking URL:
http://Kants-MacBook-Pro-2.local:8088/proxy/application_1531056583425_0001/

user: kantkodali


18/07/08 06:32:21 TRACE ProtobufRpcEngine: 1: Call -> /0.0.0.0:8032:
getApplicationReport {application_id { id: 1 cluster_timestamp:
1531056583425 }}

18/07/08 06:32:21 DEBUG Client: IPC Client (1608805714) connection to /
0.0.0.0:8032 from kantkodali sending #136

18/07/08 06:32:21 DEBUG Client: IPC Client (1608805714) connection to /
0.0.0.0:8032 from kantkodali got value #136

18/07/08 06:32:21 DEBUG ProtobufRpcEngine: Call: getApplicationReport took
1ms

18/07/08 06:32:21 TRACE ProtobufRpcEngine: 1: Response <- /0.0.0.0:8032:
getApplicationReport {application_report { applicationId { id: 1
cluster_timestamp: 1531056583425 } user: "xxx" queue: "default" name:
"Spark shell" host: "N/A" rpc_port: -1 yarn_application_state: ACCEPTED
trackingUrl: "
http://xxx-MacBook-Pro-2.local:8088/proxy/application_1531056583425_0001/"
diagnostics: "" startTime: 1531056632496 finishTime: 0
final_application_status: APP_UNDEFINED app_resource_Usage {
num_used_containers: 0 num_reserved_containers: 0 used_resources { memory:
0 virtual_cores: 0 } reserved_resources { memory: 0 virtual_cores: 0 }
needed_resources { memory: 0 virtual_cores: 0 } memory_seconds: 0
vcore_seconds: 0 } originalTrackingUrl: "N/A" currentApplicationAttemptId {
application_id { id: 1 cluster_timestamp: 1531056583425 } attemptId: 1 }
progress: 0.0 applicationType: "SPARK" }}

18/07/08 06:32:21 INFO Client: Application report for
application_1531056583425_0001 (state: ACCEPTED)


I have read this link
<https://stackoverflow.com/questions/32658840/spark-shell-stuck-in-yarn-accepted-state>
and here are the conf files that differ from the default settings:


*yarn-site.xml*


<configuration>


    <property>

        <name>yarn.nodemanager.aux-services</name>

        <value>mapreduce_shuffle</value>

    </property>


    <property>

        <name>yarn.nodemanager.resource.memory-mb</name>

        <value>16384</value>

    </property>


    <property>

       <name>yarn.scheduler.minimum-allocation-mb</name>

       <value>256</value>

    </property>


    <property>

       <name>yarn.scheduler.maximum-allocation-mb</name>

       <value>8192</value>

    </property>


   <property>

       <name>yarn.nodemanager.resource.cpu-vcores</name>

       <value>8</value>

   </property>


</configuration>

*core-site.xml*


<configuration>

    <property>

        <name>fs.defaultFS</name>

        <value>hdfs://localhost:9000</value>

    </property>

</configuration>

*hdfs-site.xml*


<configuration>

    <property>

        <name>dfs.replication</name>

        <value>1</value>

    </property>

</configuration>


Every other config remains untouched (so everything else has default
settings). Finally, I have also tried to see if there are any clues in the
resource manager logs, but they don't seem helpful for fixing the issue.
However, I am a newbie to YARN, so please let me know if I missed
something.
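In case it's useful, these are the diagnostics I would check next (assuming the Hadoop bin directory is on the PATH; the application id is the one from the logs below):

```shell
# Confirm at least one NodeManager is registered and healthy with
# nonzero memory/vcores; an app stuck in ACCEPTED is often just waiting
# for a NodeManager that never registered with the ResourceManager.
yarn node -list -all

# Ask the ResourceManager for the application's current status and
# diagnostics string.
yarn application -status application_1531058076308_0001
```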



2018-07-08 06:54:57,345 INFO
org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Allocated
new applicationId: 1

2018-07-08 06:55:09,413 WARN
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: The specific
max attempts: 0 for application: 1 is invalid, because it is out of the
range [1, 2]. Use the global max attempts instead.

2018-07-08 06:55:09,414 INFO
org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Application
with id 1 submitted by user xxx

2018-07-08 06:55:09,415 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Storing
application with id application_1531058076308_0001

2018-07-08 06:55:09,416 INFO
org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=kantkodali
IP=10.0.0.58 OPERATION=Submit Application Request TARGET=ClientRMService
RESULT=SUCCESS APPID=application_1531058076308_0001

2018-07-08 06:55:09,422 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl:
application_1531058076308_0001 State change from NEW to NEW_SAVING on event=
START

2018-07-08 06:55:09,422 INFO
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore:
Storing info for app: application_1531058076308_0001

2018-07-08 06:55:09,423 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl:
application_1531058076308_0001 State change from NEW_SAVING to SUBMITTED on
event=APP_NEW_SAVED

2018-07-08 06:55:09,425 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue:
Application added - appId: application_1531058076308_0001 user: kantkodali
leaf-queue of parent: root #applications: 1

2018-07-08 06:55:09,425 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
Accepted application application_1531058076308_0001 from user: kantkodali,
in queue: default

2018-07-08 06:55:09,439 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl:
application_1531058076308_0001 State change from SUBMITTED to ACCEPTED on
event=APP_ACCEPTED

2018-07-08 06:55:09,470 INFO
org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService:
Registering app attempt : appattempt_1531058076308_0001_000001

2018-07-08 06:55:09,471 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
appattempt_1531058076308_0001_000001 State change from NEW to SUBMITTED

2018-07-08 06:55:09,481 WARN
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
maximum-am-resource-percent is insufficient to start a single application in
queue, it is likely set too low. skipping enforcement to allow at least one
application to start

2018-07-08 06:55:09,481 WARN
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
maximum-am-resource-percent is insufficient to start a single application in
queue for user, it is likely set too low. skipping enforcement to allow at
least one application to start

2018-07-08 06:55:09,481 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
Application application_1531058076308_0001 from user: xxx activated in
queue: default

2018-07-08 06:55:09,482 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
Application added - appId: application_1531058076308_0001 user:
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue$
User@fdd759d, leaf-queue: default #user-pending-applications: 0
#user-active-applications:
1 #queue-pending-applications: 0 #queue-active-applications: 1

2018-07-08 06:55:09,482 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
Added Application Attempt appattempt_1531058076308_0001_000001 to scheduler
from user kantkodali in queue default

2018-07-08 06:55:09,484 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
appattempt_1531058076308_0001_000001 State change from SUBMITTED to
SCHEDULED

Any help would be great!

Thanks!
