spark-issues mailing list archives

From "Hong Shen (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SPARK-4909) "Error communicating with MapOutputTracker" when run a big spark job
Date Sat, 20 Dec 2014 09:01:13 GMT
Hong Shen created SPARK-4909:
--------------------------------

             Summary: "Error communicating with MapOutputTracker" when run a big spark job
                 Key: SPARK-4909
                 URL: https://issues.apache.org/jira/browse/SPARK-4909
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 1.1.0
            Reporter: Hong Shen


When I run a big Spark job with 38788 map tasks and 997 reduce tasks, the job fails. Here is the log.
14/12/20 15:11:18 ERROR spark.MapOutputTrackerWorker: Error communicating with MapOutputTracker
java.util.concurrent.TimeoutException: Futures timed out after [120 seconds]
        at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
        at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
        at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
        at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
        at scala.concurrent.Await$.result(package.scala:107)
        at org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:109)
        at org.apache.spark.MapOutputTracker.getServerStatuses(MapOutputTracker.scala:162)
        at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$.fetch(BlockStoreShuffleFetcher.scala:43)
        at org.apache.spark.shuffle.hash.HashShuffleReader.read(HashShuffleReader.scala:41)
        at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:117)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:293)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:260)
        at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:293)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:260)
        at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:114)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:293)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:260)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:293)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:260)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:293)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:260)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:293)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:260)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
        at org.apache.spark.scheduler.Task.run(Task.scala:54)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)

The serialized mapOutputStatus is more than 15 MB, and more than 500 executors ask the driver for the map output locations of the shuffle. Because the driver must send the full map output locations to every executor, the executors' requests predictably time out.
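A back-of-envelope estimate makes the bottleneck concrete. This is a sketch using the task counts from this report, and it assumes each MapStatus carries roughly one compressed-size byte per reduce partition (ignoring BlockManagerId and serialization overhead), so the absolute numbers are only indicative:

```python
# Rough size model for the serialized map output status table.
# Assumption: ~1 byte of compressed block size per reduce partition
# in each MapStatus; per-status overhead is ignored.
NUM_MAP_TASKS = 38788
NUM_REDUCE_TASKS = 997
NUM_EXECUTORS = 500

status_table_bytes = NUM_MAP_TASKS * NUM_REDUCE_TASKS        # ~38.7 MB uncompressed
driver_outbound_bytes = status_table_bytes * NUM_EXECUTORS   # if every executor asks directly

print(f"status table: {status_table_bytes / 1e6:.1f} MB")
print(f"driver outbound: {driver_outbound_bytes / 1e9:.1f} GB")
```

Even at the reported 15 MB compressed, the driver would still push several gigabytes over its single link, which easily exceeds the 120-second ask timeout seen in the stack trace.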
We could optimize this so that the driver does not send the map output locations to every executor directly, for example by distributing them through a broadcast variable.
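A toy model (not Spark's actual broadcast implementation) shows why this helps: in a tree-style broadcast, every executor that already holds the data re-serves it to others, so the driver transmits only a constant number of copies per round instead of one full copy per executor. The fan-out of 2 below is an arbitrary illustrative choice:

```python
# Toy model: direct point-to-point sends vs. a tree broadcast,
# counting only the bytes sent by the driver itself.
PAYLOAD_MB = 15.0
NUM_EXECUTORS = 500

def direct_driver_mb():
    # Direct sends: the driver ships the full payload to every executor.
    return PAYLOAD_MB * NUM_EXECUTORS

def tree_rounds(fanout=2):
    # Each round, every node that holds the data serves `fanout` new nodes,
    # so the number of holders multiplies by (fanout + 1) per round.
    holders, rounds = 1, 0
    while holders < NUM_EXECUTORS + 1:
        holders *= fanout + 1
        rounds += 1
    return rounds

def tree_driver_mb(fanout=2):
    # The driver itself only sends `fanout` copies per round.
    return PAYLOAD_MB * fanout * tree_rounds(fanout)

print(f"direct: driver sends {direct_driver_mb():.0f} MB")
print(f"tree broadcast: driver sends {tree_driver_mb():.0f} MB "
      f"in {tree_rounds()} rounds")
```

Under this model the driver's own outbound traffic drops from 7500 MB to 180 MB, since the bulk of the fan-out is carried by the executors themselves.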



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

