You can check the worker logs for more accurate information (in standalone mode they live under the work directory inside the Spark directory, e.g. $SPARK_HOME/work/<app-id>/<executor-id>/stderr). I used to hit this issue with:

- Too many open files: increasing the ulimit on the workers solved it (see the sketch after the config example below)
- Akka connection timeout / frame size: setting the following while creating the SparkContext solved it:

 .set("spark.rdd.compress","true")
      .set("spark.storage.memoryFraction","1")
      .set("spark.core.connection.ack.wait.timeout","600")
      .set("spark.akka.frameSize","50")



Thanks
Best Regards

On Sat, Nov 1, 2014 at 12:28 AM, <jan.zikes@centrum.cz> wrote:

Hi,

I am running my Spark job and I am getting ExecutorLostFailure (executor lost) using PySpark. I don't get any error from my own code, just this, so I would like to ask what could possibly be wrong. From the log it looks like some kind of internal problem in Spark.

Thank you in advance for any suggestions and help.

Traceback (most recent call last):
  File "/home/hadoop/preprocessor.py", line 69, in <module>
    cleanedData.saveAsTextFile(sys.argv[3])
  File "/home/hadoop/spark/python/pyspark/rdd.py", line 1324, in saveAsTextFile
    keyed._jrdd.map(self.ctx._jvm.BytesToString()).saveAsTextFile(path)
  File "/home/hadoop/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
  File "/home/hadoop/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o47.saveAsTextFile.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 20 in stage 0.0 failed 4 times, most recent failure: Lost task 20.3 in stage 0.0 (TID 110, ip-172-31-26-147.us-west-2.compute.internal): ExecutorLostFailure (executor lost)
Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1173)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:688)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1391)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
    at akka.actor.ActorCell.invoke(ActorCell.scala:456)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
    at akka.dispatch.Mailbox.run(Mailbox.scala:219)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)


