spark-user mailing list archives

From "Ganelin, Ilya" <Ilya.Gane...@capitalone.com>
Subject RE: Spark executor lost
Date Wed, 03 Dec 2014 23:39:33 GMT
You want to look further up the stack (there are almost certainly other errors before this
happens); those earlier errors may give you a better idea of what is going on. Also, if you
are running on YARN, you can run "yarn logs -applicationId <yourAppId>" to get the logs
from the data nodes.
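
As a rough sketch of what "looking further up" could mean in practice, the Python snippet below
scans a log and, for each "Lost executor" line, collects the WARN/ERROR lines from the lines
just before it. It assumes the "[timestamp] LEVEL ..." layout of the job server log quoted
below, with one record per line; the function name, the window size and the log path passed on
the command line are all hypothetical, so adapt them to your setup.

```python
import re
import sys
from collections import deque

# Matches the "[timestamp] LEVEL" prefix seen in the quoted job server log.
# Assumption: each log record starts on its own line with this prefix.
PREFIX_RE = re.compile(r"^\[[^\]]+\]\s+(?P<level>[A-Z]+)\s")


def errors_before_executor_loss(lines, window=50):
    """For each 'Lost executor' line, return the WARN/ERROR lines
    found among the `window` lines that precede it."""
    recent = deque(maxlen=window)  # sliding window of earlier lines
    findings = []
    for raw in lines:
        line = raw.rstrip("\n")
        if "Lost executor" in line:
            suspicious = []
            for earlier in recent:
                m = PREFIX_RE.match(earlier)
                if m and m.group("level") in ("WARN", "ERROR"):
                    suspicious.append(earlier)
            findings.append((line, suspicious))
        recent.append(line)
    return findings


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical path): python scan_log.py /path/to/job-server.log
    with open(sys.argv[1], errors="replace") as fh:
        for lost_line, earlier in errors_before_executor_loss(fh):
            print(lost_line)
            for e in earlier:
                print("    earlier:", e)
```

This only narrows down where to read; stack traces that span several lines have no level
prefix and are skipped, so you still want to open the log around the reported lines.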



-----Original Message-----
From: S. Zhou [myxjtu@yahoo.com.INVALID]
Sent: Wednesday, December 03, 2014 06:30 PM Eastern Standard Time
To: user@spark.apache.org
Subject: Spark executor lost

We are using Spark job server to submit Spark jobs (our Spark version is 0.91). After running
the Spark job server for a while, we often see the following errors (executor lost) in the
Spark job server log. As a consequence, the Spark driver (allocated inside the Spark job server)
gradually loses executors, and eventually the Spark job server is no longer able to submit jobs.
We tried to google for solutions but have had no luck so far. Please help if you have any ideas. Thanks!

[2014-11-25 01:37:36,250] INFO  parkDeploySchedulerBackend [] [akka://JobServer/user/context-supervisor/next-staging] - Executor 6 disconnected, so removing it
[2014-11-25 01:37:36,252] ERROR cheduler.TaskSchedulerImpl [] [akka://JobServer/user/context-supervisor/next-staging] - Lost executor 6 on XXXX: remote Akka client disassociated
[2014-11-25 01:37:36,252] INFO  ark.scheduler.DAGScheduler [] [] - Executor lost: 6 (epoch 8)
[2014-11-25 01:37:36,252] INFO  ge.BlockManagerMasterActor [] [] - Trying to remove executor 6 from BlockManagerMaster.
[2014-11-25 01:37:36,252] INFO  storage.BlockManagerMaster [] [] - Removed 6 successfully in removeExecutor
[2014-11-25 01:37:36,286] INFO  ient.AppClient$ClientActor [] [akka://JobServer/user/context-supervisor/next-staging] - Executor updated: app-20141125002023-0037/6 is now FAILED (Command exited with code 143)
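
A note on the "code 143" in the last log line: by the usual Unix convention, an exit status
above 128 means the process was terminated by signal (status - 128), so 143 corresponds to
signal 15, SIGTERM. In other words, the executor was most likely asked to stop by something
else rather than crashing on its own; the log above does not say what sent the signal. A
minimal Python sketch of that arithmetic (the helper name is made up for illustration):

```python
import signal


def describe_exit_code(code):
    """Interpret an exit status using the 128 + signal-number convention."""
    if code > 128:
        try:
            # e.g. 143 - 128 = 15, which is SIGTERM
            return "killed by " + signal.Signals(code - 128).name
        except ValueError:
            return "exit code %d (no matching signal)" % code
    return "exited with status %d" % code


print(describe_exit_code(143))  # prints: killed by SIGTERM
```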

