spark-dev mailing list archives

From "Ulanov, Alexander" <alexander.ula...@hp.com>
Subject Loading previously serialized object to Spark
Date Fri, 06 Mar 2015 20:55:18 GMT
Hi,

I've implemented a class MyClass in MLlib that performs an operation on LabeledPoint. MyClass
extends Serializable, so I can map this operation over an RDD[LabeledPoint], e.g. data.map(lp
=> myClass.operate(lp)). I write an instance of this class to a file with ObjectOutputStream.writeObject.
Then I stop and restart Spark and load the instance back with ObjectInputStream.readObject().asInstanceOf[MyClass].
When I then try to map the same operation over an RDD, Spark throws a not-serializable exception:
org.apache.spark.SparkException: Task not serializable
        at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:166)
        at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:158)
        at org.apache.spark.SparkContext.clean(SparkContext.scala:1453)
        at org.apache.spark.rdd.RDD.map(RDD.scala:273)
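For reference, this is roughly the sequence I'm running (a minimal sketch; MyClass, its operate method, and the file path are placeholders for my actual code):

import java.io._
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

// Placeholder for the real class; operate stands in for the actual method.
class MyClass extends Serializable {
  def operate(lp: LabeledPoint): Double = lp.label
}

object SerializeRoundTrip {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("roundtrip").setMaster("local[*]"))

    // First session: write an instance to disk.
    val out = new ObjectOutputStream(new FileOutputStream("/tmp/myclass.bin"))
    out.writeObject(new MyClass)
    out.close()

    // Later session: read the instance back ...
    val in = new ObjectInputStream(new FileInputStream("/tmp/myclass.bin"))
    val restored = in.readObject().asInstanceOf[MyClass]
    in.close()

    // ... and map its operation over an RDD. This is where the
    // Task not serializable exception is thrown.
    val data = sc.parallelize(Seq(LabeledPoint(1.0, Vectors.dense(0.5))))
    println(data.map(lp => restored.operate(lp)).collect().mkString(","))
    sc.stop()
  }
}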

Could you suggest why this exception is thrown even though MyClass is serializable by definition?

Best regards, Alexander
