But sometimes you might have skew, so almost all of the result data ends up in one or a few tasks.

On Friday, February 26, 2016, Jeff Zhang <zjffdu@gmail.com> wrote:

My job gets this exception easily even when I set a large value for spark.driver.maxResultSize. After checking the Spark code, I found that spark.driver.maxResultSize is also used on the executor side to decide whether a DirectTaskResult or an IndirectTaskResult is sent. This doesn't make sense to me; using spark.driver.maxResultSize / taskNum might be more appropriate. For example, if spark.driver.maxResultSize is 1g and we have 10 tasks each with 200m of output, then since each task's output is less than spark.driver.maxResultSize, a DirectTaskResult will be sent to the driver, but the total result size is 2g, which causes an exception on the driver side.
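The per-task check described above can be sketched roughly as follows. This is a minimal standalone illustration with hypothetical names, not the actual code from Executor.scala; it only shows why a per-task comparison against the full limit lets the aggregate blow past it:

```scala
// Sketch of the executor-side decision: a task result is sent directly
// to the driver if it alone fits under spark.driver.maxResultSize.
// (Names here are illustrative; the real logic lives in Spark's Executor.)
object ResultSizeCheck {
  def sendDirect(taskResultBytes: Long, maxResultSize: Long): Boolean =
    taskResultBytes <= maxResultSize

  def main(args: Array[String]): Unit = {
    val maxResultSize = 1024L * 1024 * 1024 // 1g limit
    val perTask       = 200L * 1024 * 1024  // 200m result per task
    val numTasks      = 10

    // Each task individually passes the check, so every result is
    // sent directly to the driver...
    println(sendDirect(perTask, maxResultSize))     // prints true

    // ...but the driver-side total (2g) still exceeds the 1g limit,
    // which is exactly when the SparkException is thrown.
    println(numTasks * perTask > maxResultSize)     // prints true
  }
}
```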


16/02/26 10:10:49 INFO DAGScheduler: Job 4 failed: treeAggregate at LogisticRegression.scala:283, took 33.796379 s

Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Total size of serialized results of 1 tasks (1085.0 MB) is bigger than spark.driver.maxResultSize (1024.0 MB)


--
Best Regards

Jeff Zhang