spark-user mailing list archives

From Antony Mayi <antonym...@yahoo.com.INVALID>
Subject remote Akka client disassociated - some timeout?
Date Fri, 16 Jan 2015 18:00:31 GMT
Hi,
I believe this is some kind of timeout problem, but I can't figure out which timeout to increase.
I am running Spark 1.2.0 on YARN (all from CDH 5.3.0). I submit a Python task which first loads a big RDD from HBase - I can see in the screen output that all the executors fire up, then there is no more logging output for the next two minutes, after which I get plenty of:
15/01/16 17:35:16 ERROR cluster.YarnClientClusterScheduler: Lost executor 7 on node01: remote Akka client disassociated
15/01/16 17:35:16 INFO scheduler.TaskSetManager: Re-queueing tasks for 7 from TaskSet 1.0
15/01/16 17:35:16 WARN scheduler.TaskSetManager: Lost task 32.0 in stage 1.0 (TID 17, node01): ExecutorLostFailure (executor 7 lost)
15/01/16 17:35:16 WARN scheduler.TaskSetManager: Lost task 34.0 in stage 1.0 (TID 25, node01): ExecutorLostFailure (executor 7 lost)
This points to some ~120 sec timeout being hit while the nodes are loading the big RDD? Any ideas how to get around it?
FYI, I already use the following options, without any success:

    spark.core.connection.ack.wait.timeout: 600
    spark.akka.timeout: 1000
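
In case it helps, this is roughly how I set them - a minimal sketch assuming the options are passed programmatically via SparkConf in the PySpark driver (the app name is just a placeholder; the same values could equally go into spark-defaults.conf or --conf flags on spark-submit):

    from pyspark import SparkConf, SparkContext

    # Sketch only: the same timeout options listed above, set on the
    # driver's SparkConf instead of in spark-defaults.conf.
    conf = (SparkConf()
            .setAppName("hbase-load")  # placeholder app name
            .setMaster("yarn-client")  # running on YARN as described above
            .set("spark.core.connection.ack.wait.timeout", "600")
            .set("spark.akka.timeout", "1000"))

    sc = SparkContext(conf=conf)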

thanks,
Antony.

