spark-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sean Owen (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-9523) Receiver for Spark Streaming does not naturally support kryo serializer
Date Sun, 02 Aug 2015 08:26:04 GMT

     [ https://issues.apache.org/jira/browse/SPARK-9523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Sean Owen updated SPARK-9523:
-----------------------------
    Target Version/s:   (was: 1.3.1)
            Priority: Minor  (was: Major)
       Fix Version/s:     (was: 1.4.2)
                          (was: 1.3.2)

[~fish748] Please read https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark
 The fields on this JIRA can't be right... 1.3.1 was released. Fix version doesn't apply to
unresolved JIRAs. etc.

> Receiver for Spark Streaming does not naturally support kryo serializer
> -----------------------------------------------------------------------
>
>                 Key: SPARK-9523
>                 URL: https://issues.apache.org/jira/browse/SPARK-9523
>             Project: Spark
>          Issue Type: Improvement
>          Components: Streaming
>    Affects Versions: 1.3.1
>         Environment: Windows 7 local mode
>            Reporter: John Chen
>            Priority: Minor
>              Labels: kryo, serialization
>   Original Estimate: 120h
>  Remaining Estimate: 120h
>
> In some cases, some attributes in a class is not serializable, which you still want to
use after serialization of the whole object, you'll have to customize your serialization codes.
For example, you can declare those attributes as transient, which makes them ignored during
serialization, and then you can reassign their values during deserialization.
> Now, if you're using Java serialization, you'll have to implement Serializable, and write
those codes in readObject() and writeObejct() methods; And if you're using kryo serialization,
you'll have to implement KryoSerializable, and write these codes in read() and write() methods.
> In Spark and Spark Streaming, you can set kryo as the serializer for speeding up. However,
the functions taken by RDD or DStream operations are still serialized by Java serialization,
which means you only need to write those custom serialization codes in readObject() and writeObejct()
methods.
> But when it comes to Spark Streaming's Receiver, things are different. When you wish
to customize an InputDStream, you must extend the Receiver. However, it turns out, the Receiver
will be serialized by kryo if you set kryo serializer in SparkConf, and will fall back to
Java serialization if you didn't.
> So here's comes the problems, if you want to change the serializer by configuration and
make sure the Receiver runs perfectly for both Java and kryo, you'll have to write all the
4 methods above. First, it is redundant, since you'll have to write serialization/deserialization
code almost twice; Secondly, there's nothing in the doc or in the code to inform users to
implement the KryoSerializable interface. 
> Since all other function parameters are serialized by Java only, I suggest you also make
it so for the Receiver. It may be slower, but since the serialization will only be executed
for each interval, it's durable. More importantly, it can cause fewer trouble



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org


Mime
View raw message