spark-issues mailing list archives

From "John Chen (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SPARK-9523) Receiver for Spark Streaming does not naturally support kryo serializer
Date Sat, 01 Aug 2015 10:03:04 GMT
John Chen created SPARK-9523:
--------------------------------

             Summary: Receiver for Spark Streaming does not naturally support kryo serializer
                 Key: SPARK-9523
                 URL: https://issues.apache.org/jira/browse/SPARK-9523
             Project: Spark
          Issue Type: Improvement
          Components: Streaming
    Affects Versions: 1.3.0
         Environment: Windows 7 local mode
            Reporter: John Chen
             Fix For: 1.3.2, 1.4.2


In some cases, a class has attributes that are not serializable but that you still need after
the whole object has been serialized and deserialized, so you have to customize its serialization
code. For example, you can declare those attributes as transient, which makes serialization skip
them, and then reassign their values during deserialization.

If you're using Java serialization, you have to implement Serializable and put that code in the
readObject() and writeObject() methods. If you're using Kryo serialization, you have to implement
KryoSerializable and put that code in the read() and write() methods.
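
As a minimal sketch of the Java half of this, the example below marks a non-serializable attribute
as transient and rebuilds it in readObject(). The TransientFieldDemo, Connection, and Source classes
and the roundTrip() helper are hypothetical names used only for illustration; they are not part of
Spark, and a real Receiver would rebuild whatever resource it actually holds.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical demo class; none of these names come from Spark.
class TransientFieldDemo {

    // Stand-in for a resource that does not implement Serializable.
    static class Connection {
        final String host;

        Connection(String host) {
            this.host = host;
        }
    }

    static class Source implements Serializable {
        private static final long serialVersionUID = 1L;

        private final String host;
        // Skipped by Java serialization; rebuilt in readObject().
        private transient Connection connection;

        Source(String host) {
            this.host = host;
            this.connection = new Connection(host);
        }

        Connection connection() {
            return connection;
        }

        // Writes the non-transient fields (here only host).
        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject();
        }

        // Restores host, then reassigns the transient attribute from it.
        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            this.connection = new Connection(host);
        }
    }

    // Serializes the object to a byte array and reads it back.
    static Source roundTrip(Source source) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(source);
        }
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Source) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Source copy = roundTrip(new Source("localhost"));
        System.out.println(copy.connection().host);
    }
}
```

A Kryo version would need the same rebuild logic again in KryoSerializable's read() and write()
methods, which is the duplication this issue is about.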

In Spark and Spark Streaming, you can set Kryo as the serializer to speed things up. However,
the functions passed to RDD or DStream operations are still serialized with Java serialization,
which means you only need to write the custom serialization code in the readObject() and
writeObject() methods.

But with Spark Streaming's Receiver, things are different. To implement a custom InputDStream,
you must extend Receiver. However, it turns out that the Receiver is serialized by Kryo if you
set the Kryo serializer in SparkConf, and falls back to Java serialization if you don't.

So here is the problem: if you want to switch the serializer through configuration and make
sure the Receiver works with both Java and Kryo, you have to write all 4 methods above. First,
this is redundant, since you write the serialization/deserialization code almost twice. Second,
nothing in the docs or in the code tells users to implement the KryoSerializable interface.

Since all other function parameters are serialized by Java only, I suggest doing the same for
the Receiver. It may be slower, but since the serialization is only executed once per interval,
the cost is tolerable. More importantly, it would cause less trouble.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

