spark-user mailing list archives

From "Shao, Saisai" <>
Subject RE: Low Level Kafka Consumer for Spark
Date Wed, 03 Dec 2014 01:34:24 GMT
Hi Rod,

The purpose of introducing the WAL mechanism in Spark Streaming as a general solution is to let
all receivers benefit from it.
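The general WAL idea can be sketched roughly as follows. This is a minimal illustration, not Spark's actual WAL code: a receiver durably appends each received block to a log before acknowledging it, so any receiver type can replay unacknowledged data after a failure. All names here are hypothetical.

```scala
// Minimal sketch of the generic WAL idea (not Spark's implementation):
// write each received block to a durable log first, then acknowledge.
object WalSketch {
  // Hypothetical durable log; a real WAL would write to HDFS or similar.
  final class WriteAheadLog {
    private val entries = scala.collection.mutable.ArrayBuffer.empty[String]
    // Append a block before it is acknowledged; returns a handle.
    def append(block: String): Int = { entries += block; entries.size - 1 }
    // On restart, replay everything that was logged.
    def replay(): Seq[String] = entries.toSeq
  }

  def receiveAndAck(log: WriteAheadLog, block: String): Boolean = {
    log.append(block) // write first...
    true              // ...then acknowledge to the source
  }
}
```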

As you said, external sources like Kafka have their own checkpoint mechanism, so instead
of storing data in the WAL, we could store only metadata in the WAL and recover from the last
committed offsets. But that requires a sophisticated Kafka receiver design using the low-level API,
and we would also need to handle rebalance and fault tolerance ourselves. So for now, instead
of implementing a whole new receiver, we chose to implement a simple one: though its
performance is not as good, it is much easier to understand and maintain.
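The metadata-only approach described above can be sketched like this. This is an illustrative toy, not Spark's receiver API: only the last committed Kafka offset per partition is persisted, and on recovery the receiver re-fetches from that offset rather than restoring record payloads from the WAL. All names are hypothetical.

```scala
// Sketch of the metadata-WAL idea: persist only (topic, partition) -> offset,
// and on recovery replay the Kafka stream from the last committed offset.
object OffsetCheckpointSketch {
  // Hypothetical durable store for committed offsets.
  final class OffsetLog {
    private var committed = Map.empty[(String, Int), Long]
    def commit(topic: String, partition: Int, offset: Long): Unit =
      committed += ((topic, partition) -> offset)
    def lastCommitted(topic: String, partition: Int): Long =
      committed.getOrElse((topic, partition), 0L)
  }

  // On recovery, a receiver would ask Kafka to re-fetch from this offset
  // instead of restoring record data from the WAL.
  def recoveryStart(log: OffsetLog, topic: String, partition: Int): Long =
    log.lastCommitted(topic, partition)
}
```

This keeps the WAL small (offsets only), but it is exactly the design that needs the low-level consumer API plus manual rebalance and fault-tolerance handling mentioned above.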

The design purpose and implementation of the reliable Kafka receiver can be found in (
Improving the reliable Kafka receiver in the way you mentioned is on our schedule for the future.


-----Original Message-----
From: RodrigoB [] 
Sent: Wednesday, December 3, 2014 5:44 AM
Subject: Re: Low Level Kafka Consumer for Spark


Just to make sure I am not misunderstood: my concerns refer to the upcoming Spark
solution, not yours. I would like to gather the perspective of someone who implemented recovery
with Kafka in a different way.



