spark-user mailing list archives

From Tom Vacek <minnesota...@gmail.com>
Subject Re: Stage failures
Date Mon, 28 Oct 2013 15:29:29 GMT
Yes, I looked at the log, and the serialized tasks were about 2k bytes as
well.  Is there anything I can do to move this along?


On Thu, Oct 24, 2013 at 2:05 PM, Josh Rosen <rosenville@gmail.com> wrote:

> Maybe this is a bug in the ClosureCleaner.  If you look at the
>
> 13/10/23 14:16:39 INFO cluster.ClusterTaskSetManager: Serialized task
>> 0.0:263 as 39625334 bytes in 55 ms
>
>
> line in your log, this corresponds to the driver serializing a ~38
> megabyte task.  This suggests that the broadcasted data is accidentally
> being included in the serialized task.
>
> When I tried this locally with val lb = sc.broadcast((1 to 5000000).toSet),
> I saw a serialized task size of 1814 bytes.
>
>
> On Thu, Oct 24, 2013 at 10:59 AM, Tom Vacek <minnesota.cs@gmail.com> wrote:
>
>> I've figured out what the problem is, but I don't understand why.  I'm
>> hoping somebody can explain this:
>>
>> (in the spark shell)
>> val lb = sc.broadcast((1 to 10000000).toSet)
>> val breakMe = sc.parallelize(1 to 250).mapPartitions(it => { val serializedSet = lb.value.toString; Array(0).iterator }).count  // works great
>>
>> val ll = (1 to 10000000).toSet
>> val lb = sc.broadcast(ll)
>> val breakMe = sc.parallelize(1 to 250).mapPartitions(it => { val serializedSet = lb.value.toString; Array(0).iterator }).count  // crashes ignominiously
>>
>>
>>
>
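A quick way to confirm that reading of the log line is to Java-serialize the objects involved and compare the sizes directly in the shell. This is only a rough diagnostic sketch (serializedSize is a made-up helper, and Spark's own task serialization differs in detail, so treat the numbers as order-of-magnitude only):

import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// Rough Java-serialized size of an object, for diagnosis only.
def serializedSize(obj: AnyRef): Long = {
  val buffer = new ByteArrayOutputStream()
  val out = new ObjectOutputStream(buffer)
  out.writeObject(obj)
  out.close()
  buffer.size().toLong
}

val ll = (1 to 10000000).toSet
val lb = sc.broadcast(ll)
serializedSize(ll)  // many megabytes: the payload itself
serializedSize(lb)  // expected to be small if only the broadcast handle is shipped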

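If the root cause is what the pair of snippets suggests, namely that the shell keeps ll as a field on its line-wrapper object and that wrapper gets dragged into the task closure along with lb, then one interim workaround is to never bind the raw set to a shell-level val at all, mirroring the inline version that already works. A minimal sketch under that assumption (makeSet is just an illustrative helper, not from the original thread):

// Build the set inside a function so that no shell-level val holds the raw
// data; the task closure then only references the Broadcast handle.
def makeSet(n: Int): Set[Int] = (1 to n).toSet

val lb = sc.broadcast(makeSet(10000000))
val counted = sc.parallelize(1 to 250).mapPartitions { it =>
  val serializedSet = lb.value.toString  // force the broadcast value on the executor
  Array(0).iterator
}.count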