spark-user mailing list archives

From Aaron Davidson <ilike...@gmail.com>
Subject Re: Serializability: for vs. while loops
Date Thu, 15 Jan 2015 08:05:50 GMT
Scala for-loops are desugared into a call to foreach with a closure, and that
closure is implemented as an anonymous inner class which is instantiated once
and invoked once per iteration. This means that the code inside the loop is
actually sitting inside a class, which confuses Spark's Closure Cleaner, whose
job is to remove unused references from closures so that otherwise
unserializable closures become serializable.
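A minimal plain-Scala sketch of the desugaring described above (no Spark involved). The object and method names are hypothetical. Rewriting a for-loop into foreach is standard Scala behavior; whether the closure compiles to an anonymous inner class (Scala 2.10/2.11) or to an invokedynamic lambda (2.12 and later) depends on the compiler version.

```scala
// Hypothetical example: three equivalent ways to sum 0 until n.
object ForDesugar {
  // The compiler rewrites this for-loop as (0 until n).foreach(i => sum += i).
  def withFor(n: Int): Int = {
    var sum = 0
    for (i <- 0 until n) { sum += i }
    sum
  }

  // The explicit form: one closure object is created, then applied once per element.
  def withForeach(n: Int): Int = {
    var sum = 0
    val body: Int => Unit = i => sum += i // the loop body lives in this closure
    (0 until n).foreach(body)
    sum
  }

  // A while loop creates no closure; the body stays in the enclosing method.
  def withWhile(n: Int): Int = {
    var sum = 0
    var i = 0
    while (i < n) {
      sum += i
      i += 1
    }
    sum
  }
}
```

All three return the same value; only the first two move the loop body into a separate function object.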

My understanding is, in particular, that the closure cleaner will null out
unused fields in the closure, but cannot go past the first level of depth
(i.e., it will not follow field references and null out *their* unused, and
possibly unserializable, references), because this could end up mutating
state outside of the closure itself. Thus, the extra level of depth
introduced by the anonymous class (where presumably the "outer this" pointer
is considered "used" by the closure cleaner) is sufficient to make the
closure unserializable.
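The "outer this" effect can be reproduced with plain Java serialization, independently of Spark's closure cleaner. This is a sketch assuming Scala 2.12 or later, where function literals are serializable and capture only what they reference; the class Driver and the helper isSerializable are hypothetical names. A closure that reads a field of its enclosing class must capture that instance, so serializing the closure drags the non-serializable Driver along, while copying the field into a local val first avoids the capture.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical driver-side class; note that it is NOT Serializable.
class Driver {
  val factor = 3

  // Reads this.factor, so the closure captures the enclosing Driver instance.
  def capturingClosure(): Int => Int = x => x * factor

  // Copies the field into a local first, so only an Int is captured.
  def localCopyClosure(): Int => Int = {
    val f = factor
    x => x * f
  }
}

object ClosureCapture {
  // Returns true if Java serialization of obj succeeds.
  def isSerializable(obj: AnyRef): Boolean =
    try {
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      out.writeObject(obj)
      out.close()
      true
    } catch {
      case _: NotSerializableException => false
    }
}
```

The local-copy pattern is a common way to keep a non-serializable enclosing object out of a closure that Spark has to ship to executors.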

While loops, on the other hand, involve none of this trickery, and everyone
is happy.

On Wed, Jan 14, 2015 at 11:37 PM, Tobias Pfeiffer <tgp@preferred.jp> wrote:

> Hi,
>
> sorry, I don't like questions about serializability myself, but still...
>
> Can anyone give me a hint why
>
>   for (i <- 0 to (maxId - 1)) {  ...  }
>
> throws a NotSerializableException in the loop body while
>
>   var i = 0
>   while (i < maxId) {
>     // same code as in the for loop
>     i += 1
>   }
>
> works fine? I guess there is something fundamentally different in the way
> Scala realizes for loops?
>
> Thanks
> Tobias
>
