spark-user mailing list archives

From Iulian Dragoș <>
Subject Re: Spark Streaming: Doing operation in Receiver vs RDD
Date Thu, 08 Oct 2015 10:14:35 GMT
You can have a look at the docs for details on Receiver reliability. If you go
the receiver way, you'll need to enable Write-Ahead Logs to ensure no data
loss. With the Kafka direct approach you don't have this problem.
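If you do stay on the receiver path, here is a minimal sketch of what enabling the WAL involves (property name as in the Spark 1.x streaming configuration; pair it with a fault-tolerant checkpoint directory of your own):

```
# Receiver-based path only: persist received blocks to a write-ahead log
# before acknowledging them, so data from a failed receiver can be replayed.
spark.streaming.receiver.writeAheadLog.enable  true
```

The log is written under the streaming checkpoint directory, so you must also call `StreamingContext.checkpoint(...)` with a reliable (e.g. HDFS) path. The direct approach needs neither: it tracks Kafka offsets and simply re-reads the data from Kafka on failure.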

Regarding where to apply decryption, I'd lean towards doing it as RDD
transformations, for the reasons you mentioned. Also, in case only some
fields are encrypted, this way you can delay decryption until it's really
needed (assuming some records would be filtered out, etc.).
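To illustrate the "delay decryption" point: the sketch below uses a plain Scala collection in place of an RDD (the filter/map chain has the same shape on either), with Base64 standing in for the real cipher and a hypothetical `Message` shape with a plaintext routing field:

```scala
import java.util.Base64

// Hypothetical record: the payload is encrypted, but a routing key is plaintext.
case class Message(topicKey: String, encryptedPayload: String)

// Stand-in for the real decryption routine (Base64 decode, for illustration only).
def decrypt(payload: String): String =
  new String(Base64.getDecoder.decode(payload), "UTF-8")

val messages = List(
  Message("orders", Base64.getEncoder.encodeToString("order-1".getBytes("UTF-8"))),
  Message("logs",   Base64.getEncoder.encodeToString("log-1".getBytes("UTF-8")))
)

// Filter on the cheap plaintext field first, so the expensive decrypt
// only runs on the records that survive the filter.
val decryptedOrders = messages
  .filter(_.topicKey == "orders")
  .map(m => decrypt(m.encryptedPayload))
// decryptedOrders == List("order-1")
```

The same ordering applies to DStream/RDD operations: putting decryption in the deserializer would pay its cost for every record, including ones you later drop.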


On Wed, Oct 7, 2015 at 9:55 PM, emiretsk <> wrote:

> Hi,
> I have a Spark Streaming program that is consuming message from Kafka and
> has to decrypt and deserialize each message. I can implement it either as
> Kafka deserializer (that will run in a receiver or the new receiver-less
> Kafka consumer)  or as RDD operations. What are the pros/cons of each?
> As I see it, doing the operations on RDDs has the following implications:
> - Better load balancing and fault tolerance (though I'm not quite sure what
> happens when a receiver fails). Also, I'm not sure if this still applies
> with the new Kafka receiver-less consumer, as it creates an RDD partition
> for each Kafka partition.
> - All functions applied to RDDs need to be either static or part of
> serializable objects, which makes using standard/3rd-party Java libraries
> harder.
> Cheers,
> Eugene
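On the serialization point in the quoted message: a common workaround is to hold the non-serializable helper in a Scala `object`, so each executor JVM constructs its own instance lazily instead of having it shipped from the driver inside the closure. A sketch, with a hypothetical `LegacyDecryptor` standing in for the third-party library, and a plain collection standing in for the RDD:

```scala
// Hypothetical non-serializable third-party class (reverse = fake "decryption").
class LegacyDecryptor {
  def decrypt(s: String): String = s.reverse
}

// Objects are initialized once per JVM on first use, so the instance is
// created locally on each executor rather than serialized from the driver.
object DecryptorHolder {
  lazy val decryptor = new LegacyDecryptor
}

// In Spark this would be: rdd.map(s => DecryptorHolder.decryptor.decrypt(s))
val out = List("abc", "xyz").map(s => DecryptorHolder.decryptor.decrypt(s))
// out == List("cba", "zyx")
```

For per-partition setup cost (connections, cipher contexts), `mapPartitions` achieves the same effect by constructing the helper once inside the partition function.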


Iulian Dragos

Reactive Apps on the JVM
