spark-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: remote Akka client disassociated - some timeout?
Date Sat, 17 Jan 2015 11:21:19 GMT
Antony:
Please check the HBase master log to see if anything noticeable happened during that period.

If the HBase cluster is not big, check the region server logs as well.

Cheers



> On Jan 16, 2015, at 10:00 AM, Antony Mayi <antonymayi@yahoo.com.INVALID> wrote:
> 
> Hi,
> 
> I believe this is some kind of timeout problem but can't figure out how to increase it.
> 
> I am running Spark 1.2.0 on YARN (all from CDH 5.3.0). I submit a Python task that first
> loads a big RDD from HBase. In the console output I can see all the executors start up, then
> there is no more logging output for the next two minutes, after which I get plenty of:
> 
>     15/01/16 17:35:16 ERROR cluster.YarnClientClusterScheduler: Lost executor 7 on node01: remote Akka client disassociated
>     15/01/16 17:35:16 INFO scheduler.TaskSetManager: Re-queueing tasks for 7 from TaskSet 1.0
>     15/01/16 17:35:16 WARN scheduler.TaskSetManager: Lost task 32.0 in stage 1.0 (TID 17, node01): ExecutorLostFailure (executor 7 lost)
>     15/01/16 17:35:16 WARN scheduler.TaskSetManager: Lost task 34.0 in stage 1.0 (TID 25, node01): ExecutorLostFailure (executor 7 lost)
> 
> Does this point to some timeout of ~120 seconds while the nodes are loading the big RDD?
> Any ideas how to get around it?
> 
> FYI, I already use the following options, without any success:
> 
>     spark.core.connection.ack.wait.timeout: 600
>     spark.akka.timeout: 1000
> 
> 
> thanks,
> Antony.
> 
> 
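For reference, here is a minimal sketch of how the two timeout properties quoted above could be passed to a job on the command line. It assumes the job is launched with `spark-submit` in `yarn-client` mode (which the `YarnClientClusterScheduler` log lines suggest). The script name `load_hbase.py` and the helper `build_submit_args` are hypothetical. The sketch only builds and prints the argument list and does not run Spark. Per the message, these settings did not help, so this shows how they are set, not a fix.

```python
import shlex

# Timeout properties quoted in the message above. Values are strings because
# spark-submit's --conf flag takes PROP=VALUE text; in Spark 1.2 both are seconds.
TIMEOUT_PROPS = {
    "spark.core.connection.ack.wait.timeout": "600",
    "spark.akka.timeout": "1000",
}


def build_submit_args(script, props, master="yarn-client"):
    """Return a spark-submit argument list with one --conf flag per property.

    `script` is a hypothetical PySpark script name; `master` defaults to
    yarn-client, matching the YarnClientClusterScheduler seen in the logs.
    """
    args = ["spark-submit", "--master", master]
    for key in sorted(props):  # sorted for a stable, readable order
        args += ["--conf", "%s=%s" % (key, props[key])]
    args.append(script)
    return args


if __name__ == "__main__":
    cmd = build_submit_args("load_hbase.py", TIMEOUT_PROPS)
    # Print a shell-safe command line; nothing is executed here.
    print(" ".join(shlex.quote(a) for a in cmd))
```

The same key/value pairs can instead go in `spark-defaults.conf`, one per line, or be set with `pyspark.SparkConf().set(key, value)` before the `SparkContext` is created.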
