spark-user mailing list archives

From <jan.zi...@centrum.cz>
Subject Spark on YARN, ExecutorLostFailure for long running computations in map
Date Sat, 08 Nov 2014 09:28:58 GMT
Hi,

I am getting ExecutorLostFailure when I run Spark on YARN and perform very long tasks
(a couple of hours each) inside map. The error log is below.

Do you know whether there is a setting that lets Spark run these very long-running
computations in map?

Thank you very much for any advice.

Best regards,
Jan 
 
Spark log:
4533,931: [GC 394578K->20882K(1472000K), 0,0226470 secs]
Traceback (most recent call last):
  File "/home/hadoop/spark_stuff/spark_lda.py", line 112, in <module>
    models.saveAsTextFile(sys.argv[1])
  File "/home/hadoop/spark/python/pyspark/rdd.py", line 1324, in saveAsTextFile
    keyed._jrdd.map(self.ctx._jvm.BytesToString()).saveAsTextFile(path)
  File "/home/hadoop/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
  File "/home/hadoop/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o36.saveAsTextFile.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 28 in stage 0.0 failed 4 times, most recent failure: Lost task 28.3 in stage 0.0 (TID 41, ip-172-16-1-90.us-west-2.compute.internal): ExecutorLostFailure (executor lost)
Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1173)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:688)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1391)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
        at akka.actor.ActorCell.invoke(ActorCell.scala:456)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
        at akka.dispatch.Mailbox.run(Mailbox.scala:219)
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) 
 
 
Yarn log:
14/11/08 08:20:34 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-172-16-1-152.us-west-2.compute.internal:41091
(size: 596.9 KB, free: 775.7 MB)
14/11/08 08:20:34 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-172-16-1-152.us-west-2.compute.internal:39160
(size: 596.9 KB, free: 775.7 MB)
14/11/08 08:20:34 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-172-16-1-152.us-west-2.compute.internal:45058
(size: 596.9 KB, free: 775.7 MB)
14/11/08 08:20:34 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-172-16-1-241.us-west-2.compute.internal:54111
(size: 596.9 KB, free: 775.7 MB)
14/11/08 08:20:34 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-172-16-1-238.us-west-2.compute.internal:45772
(size: 596.9 KB, free: 775.7 MB)
14/11/08 08:20:34 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-172-16-1-241.us-west-2.compute.internal:59509
(size: 596.9 KB, free: 775.7 MB)
14/11/08 08:20:34 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-172-16-1-238.us-west-2.compute.internal:35720
(size: 596.9 KB, free: 775.7 MB)
14/11/08 08:21:11 INFO network.ConnectionManager: Removing SendingConnection to ConnectionManagerId(ip-172-16-1-241.us-west-2.compute.internal,59509)
14/11/08 08:21:11 INFO network.ConnectionManager: Removing ReceivingConnection to ConnectionManagerId(ip-172-16-1-241.us-west-2.compute.internal,59509)
14/11/08 08:21:11 ERROR network.ConnectionManager: Corresponding SendingConnection to ConnectionManagerId(ip-172-16-1-241.us-west-2.compute.internal,59509)
not found
14/11/08 08:21:11 INFO cluster.YarnClientSchedulerBackend: Executor 10 disconnected, so removing
it
14/11/08 08:21:11 ERROR cluster.YarnClientClusterScheduler: Lost executor 10 on ip-172-16-1-241.us-west-2.compute.internal:
remote Akka client disassociated
14/11/08 08:21:11 INFO scheduler.TaskSetManager: Re-queueing tasks for 10 from TaskSet 0.0
14/11/08 08:21:11 WARN scheduler.TaskSetManager: Lost task 28.0 in stage 0.0 (TID 28, ip-172-16-1-241.us-west-2.compute.internal):
ExecutorLostFailure (executor lost)
14/11/08 08:21:11 INFO scheduler.DAGScheduler: Executor lost: 10 (epoch 0)
14/11/08 08:21:11 INFO storage.BlockManagerMasterActor: Trying to remove executor 10 from
BlockManagerMaster.
14/11/08 08:21:11 INFO storage.BlockManagerMaster: Removed 10 successfully in removeExecutor
14/11/08 08:21:20 INFO network.ConnectionManager: Removing ReceivingConnection to ConnectionManagerId(ip-172-16-1-194.us-west-2.compute.internal,45823)
14/11/08 08:21:20 INFO network.ConnectionManager: Removing SendingConnection to ConnectionManagerId(ip-172-16-1-194.us-west-2.compute.internal,45823)
14/11/08 08:21:20 INFO network.ConnectionManager: Removing SendingConnection to ConnectionManagerId(ip-172-16-1-194.us-west-2.compute.internal,45823)
14/11/08 08:21:20 INFO cluster.YarnClientSchedulerBackend: Executor 5 disconnected, so removing
it
14/11/08 08:21:20 ERROR cluster.YarnClientClusterScheduler: Lost executor 5 on ip-172-16-1-194.us-west-2.compute.internal:
remote Akka client disassociated
14/11/08 08:21:20 INFO scheduler.TaskSetManager: Re-queueing tasks for 5 from TaskSet 0.0
14/11/08 08:21:20 WARN scheduler.TaskSetManager: Lost task 21.0 in stage 0.0 (TID 21, ip-172-16-1-194.us-west-2.compute.internal):
ExecutorLostFailure (executor lost)
14/11/08 08:21:20 INFO scheduler.DAGScheduler: Executor lost: 5 (epoch 1)
14/11/08 08:21:20 INFO network.ConnectionManager: key already cancelled ? sun.nio.ch.SelectionKeyImpl@3bb633cd
java.nio.channels.CancelledKeyException
        at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
        at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
        at org.apache.spark.network.ConnectionManager.run(ConnectionManager.scala:289)
        at org.apache.spark.network.ConnectionManager$$anon$4.run(ConnectionManager.scala:139)
14/11/08 08:21:20 INFO storage.BlockManagerMasterActor: Trying to remove executor 5 from BlockManagerMaster.
14/11/08 08:21:20 INFO storage.BlockManagerMaster: Removed 5 successfully in removeExecutor
14/11/08 08:21:21 INFO network.ConnectionManager: Removing ReceivingConnection to ConnectionManagerId(ip-172-16-1-92.us-west-2.compute.internal,50928)
14/11/08 08:21:21 INFO network.ConnectionManager: Removing SendingConnection to ConnectionManagerId(ip-172-16-1-92.us-west-2.compute.internal,50928)
14/11/08 08:21:21 INFO network.ConnectionManager: Removing SendingConnection to ConnectionManagerId(ip-172-16-1-92.us-west-2.compute.internal,50928)
14/11/08 08:21:21 INFO cluster.YarnClientSchedulerBackend: Executor 27 disconnected, so removing
it
14/11/08 08:21:21 ERROR cluster.YarnClientClusterScheduler: Lost executor 27 on ip-172-16-1-92.us-west-2.compute.internal:
remote Akka client disassociated
14/11/08 08:21:21 INFO scheduler.TaskSetManager: Re-queueing tasks for 27 from TaskSet 0.0
14/11/08 08:21:21 WARN scheduler.TaskSetManager: Lost task 27.0 in stage 0.0 (TID 27, ip-172-16-1-92.us-west-2.compute.internal):
ExecutorLostFailure (executor lost)
14/11/08 08:21:21 INFO scheduler.DAGScheduler: Executor lost: 27 (epoch 2)
14/11/08 08:21:21 INFO storage.BlockManagerMasterActor: Trying to remove executor 27 from
BlockManagerMaster.
14/11/08 08:21:21 INFO storage.BlockManagerMaster: Removed 27 successfully in removeExecutor
14/11/08 08:21:21 INFO network.ConnectionManager: Removing ReceivingConnection to ConnectionManagerId(ip-172-16-1-152.us-west-2.compute.internal,41091)
14/11/08 08:21:21 INFO network.ConnectionManager: Removing SendingConnection to ConnectionManagerId(ip-172-16-1-152.us-west-2.compute.internal,41091)
14/11/08 08:21:21 INFO network.ConnectionManager: Removing SendingConnection to ConnectionManagerId(ip-172-16-1-152.us-west-2.compute.internal,41091)
14/11/08 08:21:21 INFO cluster.YarnClientSchedulerBackend: Executor 20 disconnected, so removing
it
14/11/08 08:21:21 ERROR cluster.YarnClientClusterScheduler: Lost executor 20 on ip-172-16-1-152.us-west-2.compute.internal:
remote Akka client disassociated
14/11/08 08:21:21 INFO scheduler.TaskSetManager: Re-queueing tasks for 20 from TaskSet 0.0
14/11/08 08:21:21 WARN scheduler.TaskSetManager: Lost task 29.0 in stage 0.0 (TID 29, ip-172-16-1-152.us-west-2.compute.internal):
ExecutorLostFailure (executor lost)
14/11/08 08:21:21 INFO scheduler.DAGScheduler: Executor lost: 20 (epoch 3)
14/11/08 08:21:21 INFO storage.BlockManagerMasterActor: Trying to remove executor 20 from
BlockManagerMaster.
14/11/08 08:21:21 INFO storage.BlockManagerMaster: Removed 20 successfully in removeExecutor
14/11/08 08:21:26 INFO network.ConnectionManager: Removing ReceivingConnection to ConnectionManagerId(ip-172-16-1-23.us-west-2.compute.internal,51269)
14/11/08 08:21:26 INFO network.ConnectionManager: Removing SendingConnection to ConnectionManagerId(ip-172-16-1-23.us-west-2.compute.internal,51269)
14/11/08 08:21:26 INFO network.ConnectionManager: Removing SendingConnection to ConnectionManagerId(ip-172-16-1-23.us-west-2.compute.internal,51269)
14/11/08 08:21:26 INFO cluster.YarnClientSchedulerBackend: Executor 6 disconnected, so removing
it
14/11/08 08:21:26 ERROR cluster.YarnClientClusterScheduler: Lost executor 6 on ip-172-16-1-23.us-west-2.compute.internal:
remote Akka client disassociated
14/11/08 08:21:26 INFO scheduler.TaskSetManager: Re-queueing tasks for 6 from TaskSet 0.0
14/11/08 08:21:26 WARN scheduler.TaskSetManager: Lost task 24.0 in stage 0.0 (TID 24, ip-172-16-1-23.us-west-2.compute.internal):
ExecutorLostFailure (executor lost)
14/11/08 08:21:26 INFO scheduler.DAGScheduler: Executor lost: 6 (epoch 4)
14/11/08 08:21:26 INFO storage.BlockManagerMasterActor: Trying to remove executor 6 from BlockManagerMaster.
14/11/08 08:21:26 INFO storage.BlockManagerMaster: Removed 6 successfully in removeExecutor
14/11/08 08:21:26 INFO network.ConnectionManager: Removing SendingConnection to ConnectionManagerId(ip-172-16-1-90.us-west-2.compute.internal,46792)
14/11/08 08:21:26 INFO network.ConnectionManager: Removing ReceivingConnection to ConnectionManagerId(ip-172-16-1-90.us-west-2.compute.internal,46792)
14/11/08 08:21:26 ERROR network.ConnectionManager: Corresponding SendingConnection to ConnectionManagerId(ip-172-16-1-90.us-west-2.compute.internal,46792)
not found
14/11/08 08:21:26 INFO cluster.YarnClientSchedulerBackend: Executor 21 disconnected, so removing
it
14/11/08 08:21:26 ERROR cluster.YarnClientClusterScheduler: Lost executor 21 on ip-172-16-1-90.us-west-2.compute.internal:
remote Akka client disassociated
14/11/08 08:21:26 INFO scheduler.TaskSetManager: Re-queueing tasks for 21 from TaskSet 0.0
14/11/08 08:21:26 WARN scheduler.TaskSetManager: Lost task 25.0 in stage 0.0 (TID 25, ip-172-16-1-90.us-west-2.compute.internal):
ExecutorLostFailure (executor lost)
14/11/08 08:21:26 INFO scheduler.DAGScheduler: Executor lost: 21 (epoch 5)
14/11/08 08:21:26 INFO storage.BlockManagerMasterActor: Trying to remove executor 21 from
BlockManagerMaster.
14/11/08 08:21:26 INFO storage.BlockManagerMaster: Removed 21 successfully in removeExecutor
14/11/08 08:21:29 INFO cluster.YarnClientSchedulerBackend: Executor 18 disconnected, so removing
it
14/11/08 08:21:29 INFO network.ConnectionManager: Removing ReceivingConnection to ConnectionManagerId(ip-172-16-1-222.us-west-2.compute.internal,43883)
14/11/08 08:21:29 INFO network.ConnectionManager: Removing SendingConnection to ConnectionManagerId(ip-172-16-1-222.us-west-2.compute.internal,43883)
14/11/08 08:21:29 INFO network.ConnectionManager: Removing SendingConnection to ConnectionManagerId(ip-172-16-1-222.us-west-2.compute.internal,43883)
14/11/08 08:21:29 ERROR cluster.YarnClientClusterScheduler: Lost executor 18 on ip-172-16-1-222.us-west-2.compute.internal:
remote Akka client disassociated
14/11/08 08:21:29 INFO scheduler.TaskSetManager: Re-queueing tasks for 18 from TaskSet 0.0
14/11/08 08:21:29 WARN scheduler.TaskSetManager: Lost task 26.0 in stage 0.0 (TID 26, ip-172-16-1-222.us-west-2.compute.internal):
ExecutorLostFailure (executor lost)
14/11/08 08:21:29 INFO scheduler.DAGScheduler: Executor lost: 18 (epoch 6)
14/11/08 08:21:29 INFO storage.BlockManagerMasterActor: Trying to remove executor 18 from
BlockManagerMaster.
14/11/08 08:21:29 INFO storage.BlockManagerMaster: Removed 18 successfully in removeExecutor
14/11/08 08:21:30 INFO cluster.YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@ip-172-16-1-194.us-west-2.compute.internal:50858/user/Executor#935992941]
with ID 31
14/11/08 08:21:30 INFO scheduler.TaskSetManager: Starting task 26.1 in stage 0.0 (TID 30,
ip-172-16-1-194.us-west-2.compute.internal, PROCESS_LOCAL, 1122 bytes)
14/11/08 08:21:30 INFO storage.BlockManagerMasterActor: Registering block manager ip-172-16-1-194.us-west-2.compute.internal:44263
with 776.3 MB RAM
14/11/08 08:21:31 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-172-16-1-194.us-west-2.compute.internal:44263
(size: 596.9 KB, free: 775.7 MB)
14/11/08 08:21:33 INFO network.ConnectionManager: Removing SendingConnection to ConnectionManagerId(ip-172-16-1-222.us-west-2.compute.internal,40102)
14/11/08 08:21:33 INFO network.ConnectionManager: Removing ReceivingConnection to ConnectionManagerId(ip-172-16-1-222.us-west-2.compute.internal,40102)
14/11/08 08:21:33 ERROR network.ConnectionManager: Corresponding SendingConnection to ConnectionManagerId(ip-172-16-1-222.us-west-2.compute.internal,40102)
not found
14/11/08 08:21:33 INFO cluster.YarnClientSchedulerBackend: Executor 26 disconnected, so removing
it
14/11/08 08:21:33 ERROR cluster.YarnClientClusterScheduler: Lost executor 26 on ip-172-16-1-222.us-west-2.compute.internal:
remote Akka client disassociated
14/11/08 08:21:33 INFO scheduler.TaskSetManager: Re-queueing tasks for 26 from TaskSet 0.0
14/11/08 08:21:33 WARN scheduler.TaskSetManager: Lost task 23.0 in stage 0.0 (TID 23, ip-172-16-1-222.us-west-2.compute.internal):
ExecutorLostFailure (executor lost)
14/11/08 08:21:33 INFO scheduler.DAGScheduler: Executor lost: 26 (epoch 7)
14/11/08 08:21:33 INFO storage.BlockManagerMasterActor: Trying to remove executor 26 from
BlockManagerMaster.
14/11/08 08:21:33 INFO storage.BlockManagerMaster: Removed 26 successfully in removeExecutor
14/11/08 08:21:36 INFO network.ConnectionManager: Removing ReceivingConnection to ConnectionManagerId(ip-172-16-1-241.us-west-2.compute.internal,43310)
14/11/08 08:21:36 INFO network.ConnectionManager: Removing SendingConnection to ConnectionManagerId(ip-172-16-1-241.us-west-2.compute.internal,43310)
14/11/08 08:21:36 INFO cluster.YarnClientSchedulerBackend: Executor 1 disconnected, so removing
it
14/11/08 08:21:36 ERROR cluster.YarnClientClusterScheduler: Lost executor 1 on ip-172-16-1-241.us-west-2.compute.internal:
remote Akka client disassociated
14/11/08 08:21:36 INFO scheduler.TaskSetManager: Re-queueing tasks for 1 from TaskSet 0.0
14/11/08 08:21:36 WARN scheduler.TaskSetManager: Lost task 22.0 in stage 0.0 (TID 22, ip-172-16-1-241.us-west-2.compute.internal):
ExecutorLostFailure (executor lost)
14/11/08 08:21:36 ERROR network.SendingConnection: Exception while reading SendingConnection
to ConnectionManagerId(ip-172-16-1-241.us-west-2.compute.internal,43310)
java.nio.channels.ClosedChannelException
        at sun.nio.ch.SocketChannelImpl.ensureReadOpen(SocketChannelImpl.java:252)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:295)
        at org.apache.spark.network.SendingConnection.read(Connection.scala:390)
        at org.apache.spark.network.ConnectionManager$$anon$7.run(ConnectionManager.scala:199)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
14/11/08 08:21:36 INFO scheduler.DAGScheduler: Executor lost: 1 (epoch 8)
14/11/08 08:21:36 INFO storage.BlockManagerMasterActor: Trying to remove executor 1 from BlockManagerMaster.
14/11/08 08:21:36 INFO storage.BlockManagerMaster: Removed 1 successfully in removeExecutor
14/11/08 08:21:36 INFO network.ConnectionManager: Handling connection error on connection
to ConnectionManagerId(ip-172-16-1-241.us-west-2.compute.internal,43310)
14/11/08 08:21:36 INFO network.ConnectionManager: Removing SendingConnection to ConnectionManagerId(ip-172-16-1-241.us-west-2.compute.internal,43310)
14/11/08 08:21:36 INFO network.ConnectionManager: Removing SendingConnection to ConnectionManagerId(ip-172-16-1-241.us-west-2.compute.internal,43310)
14/11/08 08:21:40 INFO cluster.YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@ip-172-16-1-194.us-west-2.compute.internal:58099/user/Executor#-112835629]
with ID 34
14/11/08 08:21:40 INFO scheduler.TaskSetManager: Starting task 22.1 in stage 0.0 (TID 31,
ip-172-16-1-194.us-west-2.compute.internal, PROCESS_LOCAL, 1122 bytes)
14/11/08 08:21:41 INFO storage.BlockManagerMasterActor: Registering block manager ip-172-16-1-194.us-west-2.compute.internal:41093
with 776.3 MB RAM
14/11/08 08:21:41 INFO cluster.YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@ip-172-16-1-228.us-west-2.compute.internal:36136/user/Executor#318736262]
with ID 32
14/11/08 08:21:41 INFO scheduler.TaskSetManager: Starting task 23.1 in stage 0.0 (TID 32,
ip-172-16-1-228.us-west-2.compute.internal, PROCESS_LOCAL, 1122 bytes)
14/11/08 08:21:41 INFO cluster.YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@ip-172-16-1-90.us-west-2.compute.internal:33130/user/Executor#1744030597]
with ID 33
14/11/08 08:21:41 INFO scheduler.TaskSetManager: Starting task 25.1 in stage 0.0 (TID 33,
ip-172-16-1-90.us-west-2.compute.internal, PROCESS_LOCAL, 1122 bytes)
14/11/08 08:21:41 INFO cluster.YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@ip-172-16-1-92.us-west-2.compute.internal:55503/user/Executor#574084779]
with ID 35
14/11/08 08:21:41 INFO scheduler.TaskSetManager: Starting task 24.1 in stage 0.0 (TID 34,
ip-172-16-1-92.us-west-2.compute.internal, PROCESS_LOCAL, 1122 bytes)
14/11/08 08:21:42 INFO storage.BlockManagerMasterActor: Registering block manager ip-172-16-1-228.us-west-2.compute.internal:40128
with 776.3 MB RAM
14/11/08 08:21:42 INFO storage.BlockManagerMasterActor: Registering block manager ip-172-16-1-90.us-west-2.compute.internal:32839
with 776.3 MB RAM
14/11/08 08:21:42 INFO storage.BlockManagerMasterActor: Registering block manager ip-172-16-1-92.us-west-2.compute.internal:58081
with 776.3 MB RAM
14/11/08 08:21:42 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-172-16-1-194.us-west-2.compute.internal:41093
(size: 596.9 KB, free: 775.7 MB)
14/11/08 08:21:43 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-172-16-1-228.us-west-2.compute.internal:40128
(size: 596.9 KB, free: 775.7 MB)
14/11/08 08:21:43 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-172-16-1-92.us-west-2.compute.internal:58081
(size: 596.9 KB, free: 775.7 MB)
14/11/08 08:21:43 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-172-16-1-90.us-west-2.compute.internal:32839
(size: 596.9 KB, free: 775.7 MB)
14/11/08 08:21:43 INFO cluster.YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@ip-172-16-1-152.us-west-2.compute.internal:34268/user/Executor#-937582169]
with ID 36
14/11/08 08:21:43 INFO scheduler.TaskSetManager: Starting task 29.1 in stage 0.0 (TID 35,
ip-172-16-1-152.us-west-2.compute.internal, PROCESS_LOCAL, 1122 bytes)
14/11/08 08:21:44 INFO storage.BlockManagerMasterActor: Registering block manager ip-172-16-1-152.us-west-2.compute.internal:52550
with 776.3 MB RAM
14/11/08 08:21:45 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-172-16-1-152.us-west-2.compute.internal:52550
(size: 596.9 KB, free: 775.7 MB)
14/11/08 08:21:46 INFO cluster.YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@ip-172-16-1-90.us-west-2.compute.internal:34555/user/Executor#-94727554]
with ID 37
14/11/08 08:21:46 INFO scheduler.TaskSetManager: Starting task 27.1 in stage 0.0 (TID 36,
ip-172-16-1-90.us-west-2.compute.internal, PROCESS_LOCAL, 1122 bytes)
14/11/08 08:21:46 INFO cluster.YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@ip-172-16-1-228.us-west-2.compute.internal:34471/user/Executor#1412546630]
with ID 38
14/11/08 08:21:46 INFO scheduler.TaskSetManager: Starting task 21.1 in stage 0.0 (TID 37,
ip-172-16-1-228.us-west-2.compute.internal, PROCESS_LOCAL, 1122 bytes)
14/11/08 08:21:47 INFO storage.BlockManagerMasterActor: Registering block manager ip-172-16-1-90.us-west-2.compute.internal:46194
with 776.3 MB RAM
14/11/08 08:21:47 INFO storage.BlockManagerMasterActor: Registering block manager ip-172-16-1-228.us-west-2.compute.internal:42275
with 776.3 MB RAM
14/11/08 08:21:48 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-172-16-1-90.us-west-2.compute.internal:46194
(size: 596.9 KB, free: 775.7 MB)
14/11/08 08:21:48 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-172-16-1-228.us-west-2.compute.internal:42275
(size: 596.9 KB, free: 775.7 MB)
14/11/08 08:21:50 INFO cluster.YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@ip-172-16-1-23.us-west-2.compute.internal:37122/user/Executor#1404320204]
with ID 39
14/11/08 08:21:51 INFO scheduler.TaskSetManager: Starting task 28.1 in stage 0.0 (TID 38,
ip-172-16-1-23.us-west-2.compute.internal, PROCESS_LOCAL, 1122 bytes)
14/11/08 08:21:51 INFO storage.BlockManagerMasterActor: Registering block manager ip-172-16-1-23.us-west-2.compute.internal:33106
with 776.3 MB RAM
14/11/08 08:21:52 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-172-16-1-23.us-west-2.compute.internal:33106
(size: 596.9 KB, free: 775.7 MB)
14/11/08 08:22:36 INFO cluster.YarnClientSchedulerBackend: Executor 39 disconnected, so removing
it
14/11/08 08:22:36 ERROR cluster.YarnClientClusterScheduler: Lost executor 39 on ip-172-16-1-23.us-west-2.compute.internal:
remote Akka client disassociated
14/11/08 08:22:36 INFO scheduler.TaskSetManager: Re-queueing tasks for 39 from TaskSet 0.0
14/11/08 08:22:36 WARN scheduler.TaskSetManager: Lost task 28.1 in stage 0.0 (TID 38, ip-172-16-1-23.us-west-2.compute.internal):
ExecutorLostFailure (executor lost)
14/11/08 08:22:36 INFO scheduler.DAGScheduler: Executor lost: 39 (epoch 9)
14/11/08 08:22:36 INFO storage.BlockManagerMasterActor: Trying to remove executor 39 from
BlockManagerMaster.
14/11/08 08:22:36 INFO storage.BlockManagerMaster: Removed 39 successfully in removeExecutor
14/11/08 08:22:57 INFO cluster.YarnClientSchedulerBackend: Executor 36 disconnected, so removing
it
14/11/08 08:22:57 ERROR cluster.YarnClientClusterScheduler: Lost executor 36 on ip-172-16-1-152.us-west-2.compute.internal:
remote Akka client disassociated
14/11/08 08:22:57 INFO scheduler.TaskSetManager: Re-queueing tasks for 36 from TaskSet 0.0
14/11/08 08:22:57 WARN scheduler.TaskSetManager: Lost task 29.1 in stage 0.0 (TID 35, ip-172-16-1-152.us-west-2.compute.internal):
ExecutorLostFailure (executor lost)
14/11/08 08:22:57 INFO scheduler.DAGScheduler: Executor lost: 36 (epoch 10)
14/11/08 08:22:57 INFO storage.BlockManagerMasterActor: Trying to remove executor 36 from
BlockManagerMaster.
14/11/08 08:22:57 INFO storage.BlockManagerMaster: Removed 36 successfully in removeExecutor
14/11/08 08:23:00 INFO cluster.YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@ip-172-16-1-90.us-west-2.compute.internal:48033/user/Executor#-1088273404]
with ID 40
14/11/08 08:23:00 INFO scheduler.TaskSetManager: Starting task 29.2 in stage 0.0 (TID 39,
ip-172-16-1-90.us-west-2.compute.internal, PROCESS_LOCAL, 1122 bytes)
14/11/08 08:23:01 INFO storage.BlockManagerMasterActor: Registering block manager ip-172-16-1-90.us-west-2.compute.internal:39067
with 776.3 MB RAM
14/11/08 08:23:03 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-172-16-1-90.us-west-2.compute.internal:39067
(size: 596.9 KB, free: 775.7 MB)
14/11/08 08:23:15 INFO cluster.YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@ip-172-16-1-23.us-west-2.compute.internal:48860/user/Executor#-369895446]
with ID 41
14/11/08 08:23:15 INFO scheduler.TaskSetManager: Starting task 28.2 in stage 0.0 (TID 40,
ip-172-16-1-23.us-west-2.compute.internal, PROCESS_LOCAL, 1122 bytes)
14/11/08 08:23:16 INFO storage.BlockManagerMasterActor: Registering block manager ip-172-16-1-23.us-west-2.compute.internal:38093
with 776.3 MB RAM
14/11/08 08:23:17 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-172-16-1-23.us-west-2.compute.internal:38093
(size: 596.9 KB, free: 775.7 MB)
14/11/08 08:23:32 INFO cluster.YarnClientSchedulerBackend: Executor 34 disconnected, so removing
it
14/11/08 08:23:32 ERROR cluster.YarnClientClusterScheduler: Lost executor 34 on ip-172-16-1-194.us-west-2.compute.internal:
remote Akka client disassociated
14/11/08 08:23:32 INFO scheduler.TaskSetManager: Re-queueing tasks for 34 from TaskSet 0.0
14/11/08 08:23:32 WARN scheduler.TaskSetManager: Lost task 22.1 in stage 0.0 (TID 31, ip-172-16-1-194.us-west-2.compute.internal):
ExecutorLostFailure (executor lost)
14/11/08 08:23:32 INFO scheduler.DAGScheduler: Executor lost: 34 (epoch 11)
14/11/08 08:23:32 INFO storage.BlockManagerMasterActor: Trying to remove executor 34 from
BlockManagerMaster.
14/11/08 08:23:32 INFO storage.BlockManagerMaster: Removed 34 successfully in removeExecutor
14/11/08 08:23:53 INFO cluster.YarnClientSchedulerBackend: Executor 41 disconnected, so removing
it
14/11/08 08:23:53 ERROR cluster.YarnClientClusterScheduler: Lost executor 41 on ip-172-16-1-23.us-west-2.compute.internal:
remote Akka client disassociated
14/11/08 08:23:53 INFO scheduler.TaskSetManager: Re-queueing tasks for 41 from TaskSet 0.0
14/11/08 08:23:53 WARN scheduler.TaskSetManager: Lost task 28.2 in stage 0.0 (TID 40, ip-172-16-1-23.us-west-2.compute.internal):
ExecutorLostFailure (executor lost)
14/11/08 08:23:53 INFO scheduler.DAGScheduler: Executor lost: 41 (epoch 12)
14/11/08 08:23:53 INFO storage.BlockManagerMasterActor: Trying to remove executor 41 from
BlockManagerMaster.
14/11/08 08:23:53 INFO storage.BlockManagerMaster: Removed 41 successfully in removeExecutor
14/11/08 08:23:57 INFO cluster.YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@ip-172-16-1-90.us-west-2.compute.internal:58017/user/Executor#2094507560]
with ID 42
14/11/08 08:23:57 INFO scheduler.TaskSetManager: Starting task 28.3 in stage 0.0 (TID 41,
ip-172-16-1-90.us-west-2.compute.internal, PROCESS_LOCAL, 1122 bytes)
14/11/08 08:23:58 INFO storage.BlockManagerMasterActor: Registering block manager ip-172-16-1-90.us-west-2.compute.internal:41182
with 776.3 MB RAM
14/11/08 08:24:00 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-172-16-1-90.us-west-2.compute.internal:41182
(size: 596.9 KB, free: 775.7 MB)
14/11/08 08:24:04 INFO cluster.YarnClientSchedulerBackend: Executor 35 disconnected, so removing
it
14/11/08 08:24:04 ERROR cluster.YarnClientClusterScheduler: Lost executor 35 on ip-172-16-1-92.us-west-2.compute.internal:
remote Akka client disassociated
14/11/08 08:24:04 INFO scheduler.TaskSetManager: Re-queueing tasks for 35 from TaskSet 0.0
14/11/08 08:24:04 WARN scheduler.TaskSetManager: Lost task 24.1 in stage 0.0 (TID 34, ip-172-16-1-92.us-west-2.compute.internal):
ExecutorLostFailure (executor lost)
14/11/08 08:24:04 INFO scheduler.DAGScheduler: Executor lost: 35 (epoch 13)
14/11/08 08:24:04 INFO storage.BlockManagerMasterActor: Trying to remove executor 35 from
BlockManagerMaster.
14/11/08 08:24:04 INFO storage.BlockManagerMaster: Removed 35 successfully in removeExecutor
14/11/08 08:24:17 INFO cluster.YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@ip-172-16-1-90.us-west-2.compute.internal:36395/user/Executor#-1907878650]
with ID 43
14/11/08 08:24:17 INFO scheduler.TaskSetManager: Starting task 24.2 in stage 0.0 (TID 42,
ip-172-16-1-90.us-west-2.compute.internal, PROCESS_LOCAL, 1122 bytes)
14/11/08 08:24:18 INFO storage.BlockManagerMasterActor: Registering block manager ip-172-16-1-90.us-west-2.compute.internal:46948
with 776.3 MB RAM
14/11/08 08:24:20 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-172-16-1-90.us-west-2.compute.internal:46948
(size: 596.9 KB, free: 775.7 MB)
14/11/08 08:24:21 INFO cluster.YarnClientSchedulerBackend: Executor 40 disconnected, so removing
it
14/11/08 08:24:21 ERROR cluster.YarnClientClusterScheduler: Lost executor 40 on ip-172-16-1-90.us-west-2.compute.internal:
remote Akka client disassociated
14/11/08 08:24:21 INFO scheduler.TaskSetManager: Re-queueing tasks for 40 from TaskSet 0.0
14/11/08 08:24:21 WARN scheduler.TaskSetManager: Lost task 29.2 in stage 0.0 (TID 39, ip-172-16-1-90.us-west-2.compute.internal):
ExecutorLostFailure (executor lost)
14/11/08 08:24:21 INFO scheduler.DAGScheduler: Executor lost: 40 (epoch 14)
14/11/08 08:24:21 INFO storage.BlockManagerMasterActor: Trying to remove executor 40 from
BlockManagerMaster.
14/11/08 08:24:21 INFO storage.BlockManagerMaster: Removed 40 successfully in removeExecutor
14/11/08 08:24:31 INFO cluster.YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@ip-172-16-1-90.us-west-2.compute.internal:34467/user/Executor#-1100688472]
with ID 44
14/11/08 08:24:31 INFO scheduler.TaskSetManager: Starting task 29.3 in stage 0.0 (TID 43,
ip-172-16-1-90.us-west-2.compute.internal, PROCESS_LOCAL, 1122 bytes)
14/11/08 08:24:32 INFO storage.BlockManagerMasterActor: Registering block manager ip-172-16-1-90.us-west-2.compute.internal:40126
with 776.3 MB RAM
14/11/08 08:24:34 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-172-16-1-90.us-west-2.compute.internal:40126
(size: 596.9 KB, free: 775.7 MB)
14/11/08 08:24:48 INFO cluster.YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@ip-172-16-1-90.us-west-2.compute.internal:53257/user/Executor#-745380917]
with ID 45
14/11/08 08:24:48 INFO scheduler.TaskSetManager: Starting task 22.2 in stage 0.0 (TID 44,
ip-172-16-1-90.us-west-2.compute.internal, PROCESS_LOCAL, 1122 bytes)
14/11/08 08:24:49 INFO storage.BlockManagerMasterActor: Registering block manager ip-172-16-1-90.us-west-2.compute.internal:46252
with 776.3 MB RAM
14/11/08 08:24:52 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-172-16-1-90.us-west-2.compute.internal:46252
(size: 596.9 KB, free: 775.7 MB)
14/11/08 08:25:16 INFO cluster.YarnClientSchedulerBackend: Executor 38 disconnected, so removing
it
14/11/08 08:25:16 ERROR cluster.YarnClientClusterScheduler: Lost executor 38 on ip-172-16-1-228.us-west-2.compute.internal:
remote Akka client disassociated
14/11/08 08:25:16 INFO scheduler.TaskSetManager: Re-queueing tasks for 38 from TaskSet 0.0
14/11/08 08:25:16 WARN scheduler.TaskSetManager: Lost task 21.1 in stage 0.0 (TID 37, ip-172-16-1-228.us-west-2.compute.internal):
ExecutorLostFailure (executor lost)
14/11/08 08:25:16 INFO scheduler.DAGScheduler: Executor lost: 38 (epoch 15)
14/11/08 08:25:16 INFO storage.BlockManagerMasterActor: Trying to remove executor 38 from
BlockManagerMaster.
14/11/08 08:25:16 INFO storage.BlockManagerMaster: Removed 38 successfully in removeExecutor
14/11/08 08:25:37 INFO cluster.YarnClientSchedulerBackend: Executor 42 disconnected, so removing
it
14/11/08 08:25:37 ERROR cluster.YarnClientClusterScheduler: Lost executor 42 on ip-172-16-1-90.us-west-2.compute.internal:
remote Akka client disassociated
14/11/08 08:25:37 INFO scheduler.TaskSetManager: Re-queueing tasks for 42 from TaskSet 0.0
14/11/08 08:25:37 WARN scheduler.TaskSetManager: Lost task 28.3 in stage 0.0 (TID 41, ip-172-16-1-90.us-west-2.compute.internal):
ExecutorLostFailure (executor lost)
14/11/08 08:25:37 ERROR scheduler.TaskSetManager: Task 28 in stage 0.0 failed 4 times; aborting
job
14/11/08 08:25:37 INFO cluster.YarnClientClusterScheduler: Cancelling stage 0
14/11/08 08:25:37 INFO cluster.YarnClientClusterScheduler: Stage 0 was cancelled
14/11/08 08:25:37 INFO scheduler.DAGScheduler: Failed to run saveAsTextFile at NativeMethodAccessorImpl.java:-2
14/11/08 08:25:37 INFO scheduler.DAGScheduler: Executor lost: 42 (epoch 16)
14/11/08 08:25:37 INFO storage.BlockManagerMasterActor: Trying to remove executor 42 from
BlockManagerMaster.
14/11/08 08:25:37 INFO storage.BlockManagerMaster: Removed 42 successfully in removeExecutor
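
A log of this size is easier to read as a tally. The following is only a minimal sketch, not part of Spark: the function name `summarize` and both regular expressions are assumptions based on the message formats visible in the log above ("Lost executor N on HOST" and "Lost task T.A in stage ..."). It counts lost executors per host and records the highest attempt number seen for each task.

```python
import re
from collections import Counter

# Assumed line shapes, taken from the driver log above:
#   "... Lost executor 10 on ip-172-16-1-241.us-west-2.compute.internal: ..."
#   "... Lost task 28.3 in stage 0.0 (TID 41, ...): ..."
LOST_EXECUTOR = re.compile(r"Lost executor (\d+) on ([^\s:]+)")
LOST_TASK = re.compile(r"Lost task (\d+)\.(\d+) in stage")


def summarize(log_text):
    """Return (lost executors per host, highest lost attempt per task index)."""
    lost_per_host = Counter()
    max_attempt = {}
    for line in log_text.splitlines():
        executor_match = LOST_EXECUTOR.search(line)
        if executor_match:
            lost_per_host[executor_match.group(2)] += 1
        task_match = LOST_TASK.search(line)
        if task_match:
            task = int(task_match.group(1))
            attempt = int(task_match.group(2))
            # Attempts are numbered from 0, so attempt 3 is the fourth try.
            max_attempt[task] = max(max_attempt.get(task, -1), attempt)
    return lost_per_host, max_attempt
```

Calling `summarize(open(path).read())` on a saved copy of the driver output (where `path` is wherever that copy was stored) should show which hosts lose executors most often and which task first reached its fourth attempt.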
