spark-user mailing list archives

From Josh Rosen <rosenvi...@gmail.com>
Subject Re: Stage failures
Date Thu, 24 Oct 2013 19:05:18 GMT
Maybe this is a bug in the ClosureCleaner.  If you look at this line in your log:

> 13/10/23 14:16:39 INFO cluster.ClusterTaskSetManager: Serialized task 0.0:263 as 39625334 bytes in 55 ms


That line shows the driver serializing a task of roughly 38 megabytes.  This
suggests that the broadcast data is accidentally being included in the
serialized task.

When I tried this locally with val lb = sc.broadcast((1 to 5000000).toSet), I
saw a serialized task size of 1814 bytes.
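
Here is a rough, made-up model of the kind of capture I suspect (this is not
the actual code the REPL generates; LineA, LineB, and the collection size are
invented for illustration).  Each shell line becomes a wrapper object that also
references the previous lines' wrappers, so a closure that only touches the
later line can still drag the earlier line's big Set along when it is
serialized, unless something like the ClosureCleaner nulls the unused
reference out first:

import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// Stand-in for the shell line `val ll = (1 to 10000000).toSet`
// (smaller here so the sketch runs quickly).
class LineA extends Serializable {
  val ll: Set[Int] = (1 to 1000000).toSet
}

// Stand-in for the next shell line; it only holds a small value, but it also
// keeps a reference back to the previous line's wrapper.
class LineB(val prev: LineA) extends Serializable {
  val lb: String = "small-broadcast-handle"
}

object ReplCaptureSketch {
  // Measure how many bytes Java serialization produces for an object graph.
  def serializedSize(obj: AnyRef): Int = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(obj)
    out.close()
    bytes.size
  }

  def main(args: Array[String]): Unit = {
    val lineB = new LineB(new LineA)
    // The closure only uses lineB.lb, but serializing it pulls in lineB and,
    // through `prev`, the whole Set held by LineA.
    val f: () => String = () => lineB.lb
    println(s"serialized closure size: ${serializedSize(f)} bytes")
  }
}

Running that should report a serialized size dominated by the Set, even though
the function body never touches it.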


On Thu, Oct 24, 2013 at 10:59 AM, Tom Vacek <minnesota.cs@gmail.com> wrote:

> I've figured out what the problem is, but I don't understand why.  I'm
> hoping somebody can explain this:
>
> (in the spark shell)
> val lb = sc.broadcast((1 to 10000000).toSet)
> val breakMe = sc.parallelize(1 to 250).mapPartitions { it =>
>   val serializedSet = lb.value.toString; Array(0).iterator
> }.count  // works great
>
> val ll = (1 to 10000000).toSet
> val lb = sc.broadcast(ll)
> val breakMe = sc.parallelize(1 to 250).mapPartitions { it =>
>   val serializedSet = lb.value.toString; Array(0).iterator
> }.count  // crashes ignominiously
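
If that is what is going on in the second example above, one workaround that
might be worth trying (just a sketch, I have not verified that it helps here)
is to mark the shell-level val @transient so the big Set is skipped when the
wrapper object is serialized; only the Broadcast handle should need to ship
with the task:

@transient val ll = (1 to 10000000).toSet
val lb = sc.broadcast(ll)
sc.parallelize(1 to 250).mapPartitions { it =>
  val serializedSet = lb.value.toString
  Iterator(0)
}.count()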
