Hi Sean,

Thanks for the reply. I had already forced registration of classes with the Kryo serializer, and the timings I reported were measured with that setting in place. To give a sense of the data: the records are serialized using Thrift and read from the Kinesis stream. They are deserialized only inside rdd.foreach(), so Spark only transfers Array[Byte], which is a common Kryo-serializable type.
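The setup described above (Kryo as the data serializer, registration enforced, and records kept as raw Thrift bytes until rdd.foreach()) might look roughly like the minimal Scala sketch below. It assumes Spark is on the classpath; the application name is hypothetical, and the set of classes a real job must register depends on what it actually shuffles or caches. With spark.kryo.registrationRequired set to true, Kryo throws an error on any unregistered class instead of silently writing its class name, which is what makes the flag useful as a check.

```scala
import org.apache.spark.SparkConf

// Minimal sketch: Kryo as the data serializer, with registration enforced.
val conf = new SparkConf()
  .setAppName("kinesis-kryo-sketch") // hypothetical application name
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  // Fail fast on unregistered classes rather than writing their
  // fully qualified class names alongside each serialized object.
  .set("spark.kryo.registrationRequired", "true")
  // The Thrift payload stays as raw bytes until rdd.foreach(),
  // so Array[Byte] is the type Spark actually moves between tasks.
  .registerKryoClasses(Array(classOf[Array[Byte]]))
```

Note that registerKryoClasses also sets spark.serializer to the Kryo serializer, so the explicit set call is redundant but makes the intent visible.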
It depends a lot on your data. If it's mostly custom types, Kryo doesn't have much of an advantage. You do want to make sure to register all your classes with Kryo (and consider setting the flag that requires Kryo registration, to enforce it), because that lets Kryo avoid writing a bunch of class names, which Java serialization always would.

On Thu, Oct 6, 2016 at 2:47 PM Rajkiran Rajkumar <firstname.lastname@example.org> wrote:

Hi,

I am running a Spark Streaming application that reads from a Kinesis stream and processes the data. The application runs on EMR. Recently, we tried moving from Java's built-in serializer to the Kryo serializer. To quantify the performance improvement, I pumped 30,000 input records into the application over a period of 5 minutes. Based on the task deserialization time, I have the following data:

Using Java serializer: median 3 ms, mean 8.21 ms
Using Kryo serializer: median 4 ms, mean 9.64 ms
Here, we see that the Kryo serializer is slower than the Java serializer. I'm looking for advice on anything I might have missed taking into account. Please let me know if more information is needed.

Thanks,
Rajkiran