spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Tobias Pfeiffer <...@preferred.jp>
Subject Re: closure serialization behavior driving me crazy
Date Tue, 11 Nov 2014 01:21:00 GMT
Sandy,

On Mon, Nov 10, 2014 at 6:01 PM, Sandy Ryza <sandy.ryza@cloudera.com> wrote:
>
> The result array is 1867 x 5. It serialized is 80k bytes, which seems
> about right:
>   scala> SparkEnv.get.closureSerializer.newInstance().serialize(arr)
>   res17: java.nio.ByteBuffer = java.nio.HeapByteBuffer[pos=0 lim=80027
> cap=80027]
>
> If I reference it from a simple function:
>   scala> def func(x: Long) => arr.length
>   scala> SparkEnv.get.closureSerializer.newInstance().serialize(func)
> I get a NotSerializableException.
>

(Are you in fact doing this from the Scala shell? Maybe try it from
compiled code.)

I may be wrong on that, but my understanding is that a "def" is a lot
different from a "val" (holding a function) in terms of serialization.  A
"def" (such as yours defined above) that is a method of a class does not
have a class file of its own, so it must be serialized with the whole
object that it belongs to (which can fail easily if you reference a
SparkContext etc.).
A "val" (such as `val func: Long => Long = x => arr.length`) will be
compiled into a class file of its own, so only that (pretty small) function
object will have to be serialized.  That may explain why some functions are
serializable while others with the same content aren't, and also the
difference in size.

Tobias

Mime
View raw message