spark-user mailing list archives

From Sarath Chandra <sarathchandra.jos...@algofusiontech.com>
Subject Re: Task not serializable
Date Sat, 06 Sep 2014 12:29:43 GMT
Thanks Alok, Sean.

As suggested by Sean, I tried a sample program. I wrote a function that
referenced a class from a third-party library that is not serializable, and
passed that function to my map function. On executing it, I got the same
exception.
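
A minimal sketch along those lines, with a hypothetical
ThirdPartyTransformer standing in for the actual third-party class:

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical stand-in for a non-serializable third-party class.
class ThirdPartyTransformer {
  def transform(line: String): String = line.toUpperCase
}

object Repro {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("Repro"))
    val transformer = new ThirdPartyTransformer // created on the driver

    // `process` references `transformer`, so the closure passed to map
    // captures it; Spark tries to serialize it and throws
    // org.apache.spark.SparkException: Task not serializable.
    def process(line: String): String = transformer.transform(line)

    sc.textFile("hdfs:///path/to/input").map(process).count()
  }
}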

Then I modified the program: I removed the function and wrote its contents as
an anonymous function inside the map function. This time the execution
succeeded.
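
The version that succeeded was presumably along these lines, with the object
constructed inside the closure so that it is created on the worker rather
than shipped from the driver:

// Nothing non-serializable is captured by the closure; each task
// constructs its own instance on the executor.
sc.textFile("hdfs:///path/to/input").map { line =>
  val transformer = new ThirdPartyTransformer
  transformer.transform(line)
}.count()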

I understood Sean's explanation, but I would appreciate references to a more
detailed explanation, and examples of writing efficient Spark programs that
avoid such pitfalls.

~Sarath
 On 06-Sep-2014 4:32 pm, "Sean Owen" <sowen@cloudera.com> wrote:

> I disagree that the generally right change is to try to make the
> classes serializable. Usually, classes that are not serializable are
> not supposed to be serialized. You're using them in a way that's
> causing them to be serialized, and that's probably not desired.
>
> For example, this is wrong:
>
> val foo: SomeUnserializableManagerClass = ...
> rdd.map(d => foo.bar(d))
>
> This is right:
>
> rdd.map { d =>
>   val foo: SomeUnserializableManagerClass = ...
>   foo.bar(d)
> }
>
> In the first instance, you create the object on the driver and try to
> serialize and copy it to workers. In the second, you're creating
> SomeUnserializableManagerClass in the function and therefore on the
> worker.
>
> mapPartitions is better if this creation is expensive.
>
> On Fri, Sep 5, 2014 at 3:06 PM, Sarath Chandra
> <sarathchandra.josyam@algofusiontech.com> wrote:
> > Hi,
> >
> > I'm trying to migrate a map-reduce program to work with Spark. I migrated
> > the program from Java to Scala. The map-reduce program basically loads an
> > HDFS file and, for each line in the file, applies several transformation
> > functions available in various external libraries.
> >
> > When I execute this over Spark, it throws "Task not serializable"
> > exceptions for each and every class being used from these external
> > libraries. I added serialization to a few classes which are in my scope,
> > but there are several other classes which are out of my scope, like
> > org.apache.hadoop.io.Text.
> >
> > How can I overcome these exceptions?
> >
> > ~Sarath.
>
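
For reference, a minimal sketch of the mapPartitions variant Sean mentions,
reusing the hypothetical SomeUnserializableManagerClass from his example
(assumed here to have a no-argument constructor):

// One instance per partition rather than one per element: the object is
// still constructed on the worker, but its (expensive) construction cost
// is paid once per partition instead of once per record.
rdd.mapPartitions { iter =>
  val foo = new SomeUnserializableManagerClass
  iter.map(d => foo.bar(d))
}

For classes outside your control, like org.apache.hadoop.io.Text from the
original question, the usual alternative is to convert such objects to a
serializable type as early in the pipeline as possible (for Text, call
.toString on it), rather than trying to make the class itself serializable.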
