spark-issues mailing list archives

From "John Chen (JIRA)" <>
Subject [jira] [Commented] (SPARK-9523) Receiver for Spark Streaming does not naturally support kryo serializer
Date Sun, 30 Aug 2015 02:36:45 GMT


John Chen commented on SPARK-9523:

The problem here is not about warn/error/etc.; it's that if you want to do something special
for transient attributes, you have to write your own code. However, the code you write differs
between Java and Kryo serialization.

For Java, you write that code in the readObject() and writeObject() methods, whereas for Kryo
you write it in another pair of methods: read() and write(). So if you want to support both
Java and Kryo serialization with transient attributes and customized serialization logic,
you need to implement all four methods in your class.

For other DStream functions you do not need to care about Kryo, as they seem to support only
Java serialization: even if you set KryoSerializer in SparkConf, the serialization is still
done by Java. However, the Receiver in Spark Streaming will be serialized by Kryo if you
configure it so, and the real issue here is that THE RECEIVER AND OTHER FUNCTIONS DO NOT ACT
THE SAME, which can be confusing for new developers.
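For context, the serializer switch being discussed is the spark.serializer setting, which can be supplied through SparkConf or, for example, spark-defaults.conf:

```
# spark-defaults.conf: switch Spark's internal serializer to Kryo
spark.serializer  org.apache.spark.serializer.KryoSerializer
```

With this set, closures passed to RDD/DStream operations are still Java-serialized, while (per this issue) the Receiver is serialized with Kryo — the inconsistency described above.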

> Receiver for Spark Streaming does not naturally support kryo serializer
> -----------------------------------------------------------------------
>                 Key: SPARK-9523
>                 URL:
>             Project: Spark
>          Issue Type: Improvement
>          Components: Streaming
>    Affects Versions: 1.3.1
>         Environment: Windows 7 local mode
>            Reporter: John Chen
>            Priority: Minor
>              Labels: kryo, serialization
>   Original Estimate: 120h
>  Remaining Estimate: 120h
> In some cases, some attributes in a class are not serializable but you still want to
use them after the whole object is serialized; then you have to customize your serialization code.
For example, you can declare those attributes as transient, which makes them ignored during
serialization, and then reassign their values during deserialization.
> Now, if you're using Java serialization, you have to implement Serializable and write
that code in the readObject() and writeObject() methods; and if you're using Kryo serialization,
you have to implement KryoSerializable and write it in the read() and write() methods.
> In Spark and Spark Streaming, you can set Kryo as the serializer for speed. However,
the functions passed to RDD or DStream operations are still serialized by Java serialization,
which means you only need to write the custom serialization code in readObject() and writeObject().
> But when it comes to Spark Streaming's Receiver, things are different. When you wish
to customize an InputDStream, you must extend Receiver. However, it turns out the Receiver
will be serialized by Kryo if you set the Kryo serializer in SparkConf, and will fall back to
Java serialization if you didn't.
> So here come the problems: if you want to change the serializer by configuration and
make sure the Receiver runs correctly under both Java and Kryo, you have to write all
four methods above. First, this is redundant, since you have to write the serialization/deserialization
code almost twice; second, there is nothing in the docs or in the code to tell users to
implement the KryoSerializable interface.
> Since all other function parameters are serialized by Java only, I suggest you do the
same for the Receiver. It may be slower, but since the serialization is only executed
once per interval, that is tolerable. More importantly, it would cause less trouble.

This message was sent by Atlassian JIRA
