spark-user mailing list archives

From Anup Sawant <anupsatishsaw...@gmail.com>
Subject Executor Lost Failure
Date Tue, 29 Sep 2015 16:02:29 GMT
Hi all,
Any idea why I am getting 'Executor heartbeat timed out'? I am fairly new
to Spark, so I know little about its internals. The job had been
running for about a day on 102 GB of data with 40 workers.
-Best,
Anup.

15/09/29 06:32:03 ERROR TaskSchedulerImpl: Lost executor driver on
localhost: Executor heartbeat timed out after 395987 ms
15/09/29 06:32:03 WARN MemoryStore: Failed to reserve initial memory
threshold of 1024.0 KB for computing block rdd_2_1813 in memory.
15/09/29 06:32:03 WARN MemoryStore: Not enough space to cache rdd_2_1813 in
memory! (computed 840.0 B so far)
15/09/29 06:32:03 WARN TaskSetManager: Lost task 1782.0 in stage 2713.0
(TID 9101184, localhost): ExecutorLostFailure (executor driver lost)
15/09/29 06:32:03 ERROR TaskSetManager: Task 1782 in stage 2713.0 failed 1
times; aborting job
15/09/29 06:32:03 WARN TaskSetManager: Lost task 1791.0 in stage 2713.0
(TID 9101193, localhost): ExecutorLostFailure (executor driver lost)
15/09/29 06:32:03 WARN TaskSetManager: Lost task 1800.0 in stage 2713.0
(TID 9101202, localhost): ExecutorLostFailure (executor driver lost)
15/09/29 06:32:03 WARN TaskSetManager: Lost task 1764.0 in stage 2713.0
(TID 9101166, localhost): ExecutorLostFailure (executor driver lost)
15/09/29 06:32:03 WARN TaskSetManager: Lost task 1773.0 in stage 2713.0
(TID 9101175, localhost): ExecutorLostFailure (executor driver lost)
15/09/29 06:32:03 WARN TaskSetManager: Lost task 1809.0 in stage 2713.0
(TID 9101211, localhost): ExecutorLostFailure (executor driver lost)
15/09/29 06:32:03 WARN TaskSetManager: Lost task 1794.0 in stage 2713.0
(TID 9101196, localhost): ExecutorLostFailure (executor driver lost)
15/09/29 06:32:03 WARN TaskSetManager: Lost task 1740.0 in stage 2713.0
(TID 9101142, localhost): ExecutorLostFailure (executor driver lost)
15/09/29 06:32:03 WARN TaskSetManager: Lost task 1803.0 in stage 2713.0
(TID 9101205, localhost): ExecutorLostFailure (executor driver lost)
15/09/29 06:32:03 WARN TaskSetManager: Lost task 1812.0 in stage 2713.0
(TID 9101214, localhost): ExecutorLostFailure (executor driver lost)
15/09/29 06:32:03 WARN TaskSetManager: Lost task 1785.0 in stage 2713.0
(TID 9101187, localhost): ExecutorLostFailure (executor driver lost)
15/09/29 06:32:03 WARN TaskSetManager: Lost task 1767.0 in stage 2713.0
(TID 9101169, localhost): ExecutorLostFailure (executor driver lost)
15/09/29 06:32:03 WARN TaskSetManager: Lost task 1776.0 in stage 2713.0
(TID 9101178, localhost): ExecutorLostFailure (executor driver lost)
15/09/29 06:32:03 WARN TaskSetManager: Lost task 1797.0 in stage 2713.0
(TID 9101199, localhost): ExecutorLostFailure (executor driver lost)
15/09/29 06:32:03 WARN TaskSetManager: Lost task 1779.0 in stage 2713.0
(TID 9101181, localhost): ExecutorLostFailure (executor driver lost)
15/09/29 06:32:03 WARN TaskSetManager: Lost task 1806.0 in stage 2713.0
(TID 9101208, localhost): ExecutorLostFailure (executor driver lost)
15/09/29 06:32:03 WARN TaskSetManager: Lost task 1788.0 in stage 2713.0
(TID 9101190, localhost): ExecutorLostFailure (executor driver lost)
15/09/29 06:32:03 WARN TaskSetManager: Lost task 1761.0 in stage 2713.0
(TID 9101163, localhost): ExecutorLostFailure (executor driver lost)
15/09/29 06:32:03 WARN TaskSetManager: Lost task 1755.0 in stage 2713.0
(TID 9101157, localhost): ExecutorLostFailure (executor driver lost)
15/09/29 06:32:03 WARN TaskSetManager: Lost task 1796.0 in stage 2713.0
(TID 9101198, localhost): ExecutorLostFailure (executor driver lost)
15/09/29 06:32:03 WARN TaskSetManager: Lost task 1778.0 in stage 2713.0
(TID 9101180, localhost): ExecutorLostFailure (executor driver lost)
15/09/29 06:32:03 WARN TaskSetManager: Lost task 1787.0 in stage 2713.0
(TID 9101189, localhost): ExecutorLostFailure (executor driver lost)
15/09/29 06:32:03 WARN TaskSetManager: Lost task 1805.0 in stage 2713.0
(TID 9101207, localhost): ExecutorLostFailure (executor driver lost)
15/09/29 06:32:03 WARN TaskSetManager: Lost task 1790.0 in stage 2713.0
(TID 9101192, localhost): ExecutorLostFailure (executor driver lost)
15/09/29 06:32:03 WARN TaskSetManager: Lost task 1781.0 in stage 2713.0
(TID 9101183, localhost): ExecutorLostFailure (executor driver lost)
15/09/29 06:32:03 WARN TaskSetManager: Lost task 1808.0 in stage 2713.0
(TID 9101210, localhost): ExecutorLostFailure (executor driver lost)
15/09/29 06:32:03 WARN TaskSetManager: Lost task 1799.0 in stage 2713.0
(TID 9101201, localhost): ExecutorLostFailure (executor driver lost)
15/09/29 06:32:03 WARN TaskSetManager: Lost task 1772.0 in stage 2713.0
(TID 9101174, localhost): ExecutorLostFailure (executor driver lost)
15/09/29 06:32:03 WARN TaskSetManager: Lost task 1763.0 in stage 2713.0
(TID 9101165, localhost): ExecutorLostFailure (executor driver lost)
15/09/29 06:32:03 WARN TaskSetManager: Lost task 1802.0 in stage 2713.0
(TID 9101204, localhost): ExecutorLostFailure (executor driver lost)
15/09/29 06:32:03 WARN TaskSetManager: Lost task 1748.0 in stage 2713.0
(TID 9101150, localhost): ExecutorLostFailure (executor driver lost)
15/09/29 06:32:03 WARN TaskSetManager: Lost task 1775.0 in stage 2713.0
(TID 9101177, localhost): ExecutorLostFailure (executor driver lost)
15/09/29 06:32:03 WARN TaskSetManager: Lost task 1766.0 in stage 2713.0
(TID 9101168, localhost): ExecutorLostFailure (executor driver lost)
15/09/29 06:32:03 WARN TaskSetManager: Lost task 1811.0 in stage 2713.0
(TID 9101213, localhost): ExecutorLostFailure (executor driver lost)
15/09/29 06:32:03 WARN TaskSetManager: Lost task 1793.0 in stage 2713.0
(TID 9101195, localhost): ExecutorLostFailure (executor driver lost)
15/09/29 06:32:03 WARN TaskSetManager: Lost task 1769.0 in stage 2713.0
(TID 9101171, localhost): ExecutorLostFailure (executor driver lost)
15/09/29 06:32:03 WARN TaskSetManager: Lost task 1810.0 in stage 2713.0
(TID 9101212, localhost): ExecutorLostFailure (executor driver lost)
15/09/29 06:32:03 WARN TaskSetManager: Lost task 1801.0 in stage 2713.0
(TID 9101203, localhost): ExecutorLostFailure (executor driver lost)
15/09/29 06:32:03 WARN TaskSetManager: Lost task 1795.0 in stage 2713.0
(TID 9101197, localhost): ExecutorLostFailure (executor driver lost)
15/09/29 06:32:03 WARN TaskSetManager: Lost task 1777.0 in stage 2713.0
(TID 9101179, localhost): ExecutorLostFailure (executor driver lost)
15/09/29 06:32:03 WARN TaskSetManager: Lost task 1786.0 in stage 2713.0
(TID 9101188, localhost): ExecutorLostFailure (executor driver lost)
15/09/29 06:32:03 WARN TaskSetManager: Lost task 1804.0 in stage 2713.0
(TID 9101206, localhost): ExecutorLostFailure (executor driver lost)
15/09/29 06:32:03 WARN TaskSetManager: Lost task 1813.0 in stage 2713.0
(TID 9101215, localhost): ExecutorLostFailure (executor driver lost)
15/09/29 06:32:03 WARN TaskSetManager: Lost task 1807.0 in stage 2713.0
(TID 9101209, localhost): ExecutorLostFailure (executor driver lost)
15/09/29 06:32:03 WARN TaskSetManager: Lost task 1789.0 in stage 2713.0
(TID 9101191, localhost): ExecutorLostFailure (executor driver lost)
15/09/29 06:32:03 WARN TaskSetManager: Lost task 1780.0 in stage 2713.0
(TID 9101182, localhost): ExecutorLostFailure (executor driver lost)
15/09/29 06:32:03 WARN TaskSetManager: Lost task 1798.0 in stage 2713.0
(TID 9101200, localhost): ExecutorLostFailure (executor driver lost)
15/09/29 06:32:03 WARN TaskSetManager: Lost task 1792.0 in stage 2713.0
(TID 9101194, localhost): ExecutorLostFailure (executor driver lost)
15/09/29 06:32:03 WARN TaskSetManager: Lost task 1765.0 in stage 2713.0
(TID 9101167, localhost): ExecutorLostFailure (executor driver lost)
15/09/29 06:32:03 WARN TaskSetManager: Lost task 1774.0 in stage 2713.0
(TID 9101176, localhost): ExecutorLostFailure (executor driver lost)
15/09/29 06:32:03 WARN TaskSetManager: Lost task 1783.0 in stage 2713.0
(TID 9101185, localhost): ExecutorLostFailure (executor driver lost)
15/09/29 06:32:03 WARN TaskSetManager: Lost task 1756.0 in stage 2713.0
(TID 9101158, localhost): ExecutorLostFailure (executor driver lost)
[Stage 2713:=========================>                       (1762 + 51) /
3354]15/09/29 06:32:03 WARN SparkContext: Killing executors is only
supported in coarse-grained mode
15/09/29 06:32:04 ERROR BlockManager: Failed to report rdd_2_3032 to
master; giving up.
Traceback (most recent call last):
  File "/data/home/as198/sdword2vec.py", line 139, in <module>
    main()
  File "/data/home/as198/sdword2vec.py", line 136, in main
    tryGensim()
  File "/data/home/as198/sdword2vec.py", line 114, in tryGensim
    model_dm.build_vocab(articles)
  File "/usr/lib/python2.7/site-packages/gensim-0.12.2-py2.7-linux-x86_64.egg/gensim/models/word2vec.py", line 495, in build_vocab
    self.scan_vocab(sentences, trim_rule=trim_rule)  # initial survey
  File "/usr/lib/python2.7/site-packages/gensim-0.12.2-py2.7-linux-x86_64.egg/gensim/models/doc2vec.py", line 620, in scan_vocab
    for document_no, document in enumerate(documents):
  File "/data/home/as198/sdword2vec.py", line 97, in __iter__
    for article in labeled_rdd.collect():
  File "/usr/local/spark-1.5.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/rdd.py", line 773, in collect
  File "/usr/local/spark-1.5.0-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
  File "/usr/local/spark-1.5.0-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1782 in stage 2713.0 failed 1 times, most recent failure: Lost task 1782.0 in stage 2713.0 (TID 9101184, localhost): ExecutorLostFailure (executor driver lost)
Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1280)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1268)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1267)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1267)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:697)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1493)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1455)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1444)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
        at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:567)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1813)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1826)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1839)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1910)
        at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:905)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:306)
        at org.apache.spark.rdd.RDD.collect(RDD.scala:904)
        at org.apache.spark.api.python.PythonRDD$.collectAndServe(PythonRDD.scala:373)
        at org.apache.spark.api.python.PythonRDD.collectAndServe(PythonRDD.scala)
        at sun.reflect.GeneratedMethodAccessor62.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
        at py4j.Gateway.invoke(Gateway.java:259)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:207)
        at java.lang.Thread.run(Thread.java:745)
