spark-dev mailing list archives

From Jeff Zhang <zjf...@gmail.com>
Subject Re: Is spark.driver.maxResultSize used correctly ?
Date Tue, 01 Mar 2016 09:24:53 GMT
I checked the code again. It looks like the task result is currently loaded
into memory regardless of whether it is a DirectTaskResult or an
IndirectTaskResult. Previously I thought an IndirectTaskResult could be
loaded into memory later, which would save memory; RDD#collectAsIterator is
what I thought might save memory.
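
To make the arithmetic from the original message quoted below concrete, here is a minimal sketch in plain Python. It is not Spark code: the function names and the skew_factor parameter are hypothetical, and the model assumes only that the executor compares each task's result against the limit while the driver compares a running total against the same limit.

```python
MB = 1024 * 1024


def per_task_check(result_sizes, limit):
    """Return True if every individual task result fits under `limit` bytes."""
    return all(size <= limit for size in result_sizes)


def driver_accepts_all(result_sizes, max_result_size):
    """Model a driver that sums result sizes as they arrive and gives up
    as soon as the running total exceeds `max_result_size`."""
    total = 0
    for size in result_sizes:
        total += size
        if total > max_result_size:
            return False
    return True


def proposed_per_task_limit(max_result_size, num_tasks, skew_factor=1.0):
    """Hypothetical per-task budget: maxResultSize / taskNum, scaled by a
    tunable factor (an assumption here) to leave room for skewed tasks."""
    return int(max_result_size / num_tasks * skew_factor)


# Scenario from the thread: a 1g limit and 10 tasks with 200m of output each.
max_result_size = 1024 * MB
sizes = [200 * MB] * 10

# Each result is individually below the limit, so the per-task check passes.
print(per_task_check(sizes, max_result_size))

# The running total on the driver exceeds the limit, so the job still fails.
print(driver_accepts_all(sizes, max_result_size))

# With the proposed per-task budget, the same tasks are flagged up front.
limit = proposed_per_task_limit(max_result_size, len(sizes))
print(per_task_check(sizes, limit))
```

In this model the per-task check alone never sees the problem, which is the mismatch described in the original message. A skew_factor above 1.0 would loosen the per-task budget for jobs where a few tasks carry most of the result data.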

On Tue, Mar 1, 2016 at 5:00 PM, Reynold Xin <rxin@databricks.com> wrote:

> How big of a deal is this though? If I am reading your email correctly,
> either way this job will fail. You simply want it to fail earlier, on the
> executor side, rather than collecting the results and failing on the
> driver side?
>
>
> On Sunday, February 28, 2016, Jeff Zhang <zjffdu@gmail.com> wrote:
>
>> Data skew is possible, but it is not the common case. I think we should
>> design for the common case; for the skewed case, we could add a fraction
>> parameter that lets users tune it.
>>
>> On Sat, Feb 27, 2016 at 4:51 PM, Reynold Xin <rxin@databricks.com> wrote:
>>
>>> But sometimes you might have skew, with almost all the result data in
>>> one or a few tasks.
>>>
>>>
>>> On Friday, February 26, 2016, Jeff Zhang <zjffdu@gmail.com> wrote:
>>>
>>>>
>>>> My job hits this exception very easily, even when I set a large value for
>>>> spark.driver.maxResultSize. After checking the Spark code, I found that
>>>> spark.driver.maxResultSize is also used on the executor side to decide
>>>> whether a DirectTaskResult or an IndirectTaskResult is sent. This doesn't
>>>> make sense to me. Using spark.driver.maxResultSize / taskNum might be more
>>>> appropriate. For example, if spark.driver.maxResultSize is 1g and we have
>>>> 10 tasks, each with 200m of output, then the output of each task is less
>>>> than spark.driver.maxResultSize, so a DirectTaskResult will be sent to the
>>>> driver, but the total result size is about 2g, which will cause an
>>>> exception on the driver side.
>>>>
>>>>
>>>> 16/02/26 10:10:49 INFO DAGScheduler: Job 4 failed: treeAggregate at
>>>> LogisticRegression.scala:283, took 33.796379 s
>>>>
>>>> Exception in thread "main" org.apache.spark.SparkException: Job aborted
>>>> due to stage failure: Total size of serialized results of 1 tasks (1085.0
>>>> MB) is bigger than spark.driver.maxResultSize (1024.0 MB)
>>>>
>>>>
>>>> --
>>>> Best Regards
>>>>
>>>> Jeff Zhang
>>>>
>>>
>>
>>
>> --
>> Best Regards
>>
>> Jeff Zhang
>>
>


-- 
Best Regards

Jeff Zhang
