spark-dev mailing list archives

From Patrick Wendell <pwend...@gmail.com>
Subject Re: Low Level Kafka Consumer for Spark
Date Mon, 04 Aug 2014 03:59:03 GMT
I'll let TD chime in on this one, but I'm guessing this would be a welcome
addition. It's great to see community effort on adding new
streams/receivers; adding a Java API for receivers was something we did
specifically to allow this :)

- Patrick


On Sat, Aug 2, 2014 at 10:09 AM, Dibyendu Bhattacharya <
dibyendu.bhattachary@gmail.com> wrote:

> Hi,
>
> I have implemented a low-level Kafka consumer for Spark Streaming using the
> Kafka SimpleConsumer API. This API gives better control over Kafka offset
> management and recovery from failures. Since the present Spark KafkaUtils
> uses the high-level Kafka consumer API, I wanted better control over offset
> management, which is not possible with the high-level consumer.
>
> This project is available in the repo below:
>
> https://github.com/dibbhatt/kafka-spark-consumer
>
>
> I have implemented a custom receiver, consumer.kafka.client.KafkaReceiver.
> The KafkaReceiver uses the low-level Kafka consumer API (implemented in the
> consumer.kafka package) to fetch messages from Kafka and 'store' them in
> Spark.
>
> The logic detects the number of partitions for a topic and spawns that many
> threads (individual consumer instances). The Kafka consumer uses ZooKeeper
> to store the latest offset for each partition, which helps recovery in case
> of failure. The consumer logic is tolerant to ZK failures, changes of the
> Kafka partition leader, Kafka broker failures, recovery from offset errors,
> and other fail-over aspects.
>
> consumer.kafka.client.Consumer is a sample consumer that uses these Kafka
> receivers to generate DStreams from Kafka and applies an output operation
> to every message of the RDD.
>
> We are planning to use this Kafka Spark consumer to perform near-real-time
> indexing of Kafka messages into a target search cluster, and also
> near-real-time aggregation into target NoSQL storage.
>
> Kindly let me know your views. Also, if this looks good, may I contribute
> it to the Spark Streaming project?
>
> Regards,
> Dibyendu
>
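The per-partition consumer pattern described above (one thread per Kafka partition, with the last committed offset kept in ZooKeeper so a restarted consumer resumes where it left off) can be sketched generically. This is an illustrative stand-in only, not code from the kafka-spark-consumer project: `OffsetStore` plays the role of ZooKeeper, and the partition data is simulated with plain lists.

```python
# Illustrative sketch of the per-partition consumer + committed-offset pattern.
# OffsetStore, consume_partition, and the in-memory "partitions" are all
# hypothetical stand-ins, not the actual project's API.
import threading

class OffsetStore:
    """Stand-in for ZooKeeper: remembers the last committed offset per partition."""
    def __init__(self):
        self._offsets = {}
        self._lock = threading.Lock()

    def commit(self, partition, offset):
        with self._lock:
            self._offsets[partition] = offset

    def last(self, partition):
        with self._lock:
            return self._offsets.get(partition, 0)

def consume_partition(partition_id, messages, store, out):
    # Resume from the last committed offset, as the receiver would after a failure.
    start = store.last(partition_id)
    for offset in range(start, len(messages)):
        out.append((partition_id, messages[offset]))
        # Commit only after the message is safely 'stored' downstream.
        store.commit(partition_id, offset + 1)

# Detect the partitions and spawn one consumer thread per partition.
partitions = {0: ["a0", "a1"], 1: ["b0", "b1", "b2"]}
store, received = OffsetStore(), []
threads = [threading.Thread(target=consume_partition, args=(p, msgs, store, received))
           for p, msgs in partitions.items()]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(received))
print(store.last(1))  # next offset to read for partition 1 after recovery
```

Committing the offset only after a successful store is the choice that makes recovery safe here: a crash between store and commit re-delivers a message (at-least-once) rather than silently dropping it.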
