spark-user mailing list archives

From Akhil Das <ak...@sigmoidanalytics.com>
Subject Re: ExecutorLostFailure (executor lost)
Date Sun, 02 Nov 2014 16:38:09 GMT
You can check the worker logs for more detailed information (they are
found under the work directory inside your Spark directory). I used to hit
this issue because of:

- Too many open files: increasing the ulimit would solve this issue.
- Akka connection timeout/frame size: setting the following when creating
  the SparkContext would solve it:

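    from pyspark import SparkConf, SparkContext
    conf = (SparkConf()
            .set("spark.rdd.compress", "true")
            .set("spark.storage.memoryFraction", "1")
            .set("spark.core.connection.ack.wait.timeout", "600")  # seconds
            .set("spark.akka.frameSize", "50"))  # MB
    sc = SparkContext(conf=conf)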
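Following up on the worker-log tip: on a standalone cluster each executor's output typically lands in work/<app-id>/<executor-id>/stderr under the Spark directory. The sketch below walks such a directory and flags lines that look like either of the two causes above. The helper name scan_worker_logs, the assumed directory layout and the regex patterns are illustrative assumptions, not a Spark tool; on YARN the executor logs live elsewhere.

```python
import os
import re

# Illustrative patterns for the two causes above; these are assumptions,
# not an exhaustive list of the messages Spark or Akka may log.
PATTERNS = {
    "open_files": re.compile(r"Too many open files"),
    "timeout_or_frame_size": re.compile(r"(?i)frame\s*size|timed out|timeout"),
}


def scan_worker_logs(work_dir):
    """Walk a standalone worker's work directory (assumed layout:
    work/<app-id>/<executor-id>/stderr) and return sorted
    (label, path, line_number, line) tuples for matching lines."""
    hits = []
    for root, _dirs, files in os.walk(work_dir):
        for name in files:
            if name not in ("stderr", "stdout"):
                continue  # only executor output files are of interest here
            path = os.path.join(root, name)
            with open(path, errors="replace") as fh:
                for line_number, line in enumerate(fh, start=1):
                    for label, pattern in PATTERNS.items():
                        if pattern.search(line):
                            hits.append((label, path, line_number, line.rstrip()))
    return sorted(hits)
```

For example, scan_worker_logs("/home/hadoop/spark/work") could be run on a worker node; that path is only a guess based on the Spark home visible in the traceback.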




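For the open-files case, the current limits can be inspected from Python with the standard-library resource module (Unix only). This is a minimal sketch: the helper name raise_soft_limit_to_hard is hypothetical, and it only changes the limit of the calling process and its children. Executors inherit their limits from whatever launches the worker daemons, so the lasting fix is still raising ulimit -n in that environment on every worker node.

```python
import resource


def raise_soft_limit_to_hard():
    """Try to raise this process's soft RLIMIT_NOFILE (max open files)
    to its hard limit and return the (soft, hard) pair afterwards."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft != hard:
        try:
            # An unprivileged process may raise its soft limit up to the hard limit.
            resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
        except (ValueError, OSError):
            # Some platforms reject an unlimited soft value for open files;
            # in that case the limits are simply left as they were.
            pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)


soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open-file limit before: soft={soft}, hard={hard}")
soft, hard = raise_soft_limit_to_hard()
print(f"open-file limit after:  soft={soft}, hard={hard}")
```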
Thanks
Best Regards

On Sat, Nov 1, 2014 at 12:28 AM, <jan.zikes@centrum.cz> wrote:

> Hi,
>
> I am running my Spark job and I am getting ExecutorLostFailure (executor
> lost) using PySpark. I don't get any error in my code, just this. So I
> would like to ask what could possibly be wrong. From the log it seems like
> some kind of internal problem in Spark.
>
> Thank you in advance for any suggestions and help.
>
> 2014-10-31 18:13:11,423 : INFO : spark:track_progress:300 : Traceback (most recent call last):
>   File "/home/hadoop/preprocessor.py", line 69, in <module>
>     cleanedData.saveAsTextFile(sys.argv[3])
>   File "/home/hadoop/spark/python/pyspark/rdd.py", line 1324, in saveAsTextFile
>     keyed._jrdd.map(self.ctx._jvm.BytesToString()).saveAsTextFile(path)
>   File "/home/hadoop/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
>   File "/home/hadoop/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling o47.saveAsTextFile.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 20 in stage 0.0 failed 4 times, most recent failure: Lost task 20.3 in stage 0.0 (TID 110, ip-172-31-26-147.us-west-2.compute.internal): ExecutorLostFailure (executor lost)
> Driver stacktrace:
>   at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)
>   at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174)
>   at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173)
>   at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>   at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1173)
>   at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
>   at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
>   at scala.Option.foreach(Option.scala:236)
>   at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:688)
>   at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1391)
>   at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>   at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>   at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>   at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>   at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>   at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>   at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>   at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
