spark-user mailing list archives

From Sean Owen <so...@cloudera.com>
Subject Re: Task not serializable
Date Sat, 06 Sep 2014 11:02:30 GMT
I disagree that the generally right change is to try to make the
classes serializable. Usually, classes that are not serializable are
not supposed to be serialized. You're using them in a way that's
causing them to be serialized, and that's probably not desired.

For example, this is wrong:

val foo: SomeUnserializableManagerClass = ...
rdd.map(d => foo.bar(d))

This is right:

rdd.map { d =>
  val foo: SomeUnserializableManagerClass = ...
  foo.bar(d)
}

In the first instance, you create the object on the driver and try to
serialize and copy it to workers. In the second, you're creating
SomeUnserializableManagerClass in the function and therefore on the
worker.

mapPartitions is better if this creation is expensive.
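For example, roughly (ExpensiveParser here is just a hypothetical stand-in
for whatever non-serializable, expensive-to-build helper you have):

rdd.mapPartitions { iter =>
  // build the helper once per partition, on the worker, not once per record
  val foo = new ExpensiveParser()
  iter.map(d => foo.bar(d))
}

The object is still created on the worker and never serialized, but only
once per partition instead of once per record.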

On Fri, Sep 5, 2014 at 3:06 PM, Sarath Chandra
<sarathchandra.josyam@algofusiontech.com> wrote:
> Hi,
>
> I'm trying to migrate a map-reduce program to work with Spark. I migrated
> the program from Java to Scala. The map-reduce program basically loads an
> HDFS file and for each line in the file it applies several transformation
> functions available in various external libraries.
>
> When I execute this over Spark, it throws "Task not serializable"
> exceptions for each and every class being used from these external
> libraries. I added serialization to a few classes which are in my scope,
> but there are several other classes which are out of my scope, like
> org.apache.hadoop.io.Text.
>
> How to overcome these exceptions?
>
> ~Sarath.
