spark-user mailing list archives

From Matei Zaharia <matei.zaha...@gmail.com>
Subject Re: Un-serializable 3rd-party classes (Spark, Java)
Date Wed, 18 Jun 2014 05:14:14 GMT
There are a few options:

- Kryo might be able to serialize these objects out of the box, depending on what’s inside
them. Try turning it on as described at http://spark.apache.org/docs/latest/tuning.html.

- If that doesn’t work, you can create your own “wrapper” objects that implement Serializable,
or even a subclass of FlexCompRowMatrix. No need to change the original library.

- If the library has its own serialization functions, you could also use those inside a wrapper
object. Take a look at https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SerializableWritable.scala
for an example where we make Hadoop’s Writables serializable.
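
To make the wrapper idea concrete, here is a minimal sketch in plain Java. ThirdPartyMatrix is a hypothetical stand-in for the library class (it is not the real FlexCompRowMatrix API), and SerializableMatrix and SerializationCheck are illustrative names only. The wrapper marks the matrix field transient and copies its contents by hand in writeObject/readObject, which is the same general pattern SerializableWritable uses for Writables.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for a third-party class that does NOT implement
// Serializable. The real library class will have a different API.
final class ThirdPartyMatrix {
    private final int rows;
    private final int cols;
    private final Map<Long, Double> cells = new HashMap<>();

    ThirdPartyMatrix(int rows, int cols) { this.rows = rows; this.cols = cols; }
    int numRows() { return rows; }
    int numColumns() { return cols; }
    void set(int r, int c, double v) { cells.put((long) r * cols + c, v); }
    double get(int r, int c) { return cells.getOrDefault((long) r * cols + c, 0.0); }
}

// Serializable wrapper. The wrapped field is transient, so default
// serialization skips it; writeObject/readObject move the data explicitly.
final class SerializableMatrix implements Serializable {
    private static final long serialVersionUID = 1L;
    private transient ThirdPartyMatrix matrix;

    SerializableMatrix(ThirdPartyMatrix matrix) { this.matrix = Objects.requireNonNull(matrix); }
    ThirdPartyMatrix get() { return matrix; }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
        int rows = matrix.numRows();
        int cols = matrix.numColumns();
        // First pass: count non-zero entries so the reader knows how many follow.
        // If the real class offers an iterator over stored entries, prefer that.
        int count = 0;
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++)
                if (matrix.get(r, c) != 0.0) count++;
        out.writeInt(rows);
        out.writeInt(cols);
        out.writeInt(count);
        // Second pass: write each non-zero entry as (row, column, value).
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++) {
                double v = matrix.get(r, c);
                if (v != 0.0) { out.writeInt(r); out.writeInt(c); out.writeDouble(v); }
            }
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        int rows = in.readInt();
        int cols = in.readInt();
        int count = in.readInt();
        matrix = new ThirdPartyMatrix(rows, cols);
        for (int i = 0; i < count; i++) {
            int r = in.readInt();
            int c = in.readInt();
            matrix.set(r, c, in.readDouble());
        }
    }
}

// Helper for checking the wrapper locally with plain Java serialization.
final class SerializationCheck {
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }
}
```

In your case the Dataframe class would hold a SerializableMatrix field instead of the raw matrix and implement Serializable itself; code running inside the RDD operation would call get() to reach the underlying matrix. Running the wrapper through SerializationCheck.roundTrip locally is a quick way to confirm it serializes before submitting a job.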

Matei

On Jun 17, 2014, at 10:11 PM, Daedalus <tushar.nagarajan@gmail.com> wrote:

> I'm trying to use matrix-toolkit-java
> <https://github.com/fommil/matrix-toolkits-java/> for an application of
> mine, particularly the FlexCompRowMatrix class (used to store sparse
> matrices).
> 
> I have a class Dataframe, which contains an int array, two double values,
> and one FlexCompRowMatrix.
> 
> When I try to run a simple Spark .foreach() on a JavaRDD created from a
> list of the above-mentioned Dataframes, I get the following error:
> 
> Exception in thread "main" org.apache.spark.SparkException: Job aborted due
> to stage failure: Task not serializable: java.io.NotSerializableException:
> no.uib.cipr.matrix.sparse.FlexCompRowMatrix
>        at
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1033)
> 
> The FlexCompRowMatrix doesn't seem to implement Serializable. This class
> suits my purpose very well, and I would prefer not to switch over from it.
> 
> Other than writing code to make the class serializable, and then recompiling
> the matrix-toolkit-java source, what options do I have?
> 
> Is there any workaround for this issue?
> 
> 
> 
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Un-serializable-3rd-party-classes-Spark-Java-tp7815.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.

