spark-user mailing list archives

From Neelesh <neele...@gmail.com>
Subject Re: Latest enhancement in Low Level Receiver based Kafka Consumer
Date Wed, 01 Apr 2015 17:37:45 GMT
Hi Dibyendu,
   Thanks for your work on this project. Spark 1.3 now has direct Kafka
streams, but they still do not provide enough control over partitions and
topics. For example, the streams are fairly statically configured:
RDD.getPartitions() is computed only once, which makes them difficult to use
in a SaaS environment where topics are created and deactivated on the fly
(one topic per customer, for example). But it's easy to build a wrapper
around your receivers.
Maybe there is a way to combine direct streams with your
receivers, but I don't fully understand how the 1.3 direct streams
work yet.
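
To make the wrapper idea concrete, here is a minimal sketch of the
bookkeeping such a wrapper might do on each refresh: compare the topics
that should currently be consumed against the topics that already have
receivers, and work out which receivers to start and stop. The function
name and arguments are hypothetical, and the sketch makes no Spark or
Kafka calls; how receivers are actually started or stopped is left out.

```python
def reconcile_topics(active_topics, running_topics):
    """Return (to_start, to_stop) so that running receivers match active topics.

    Hypothetical helper: both arguments are plain iterables of topic names.
    """
    active = set(active_topics)
    running = set(running_topics)
    # Topics that became active but have no receiver yet.
    to_start = sorted(active - running)
    # Topics that still have a receiver but were deactivated.
    to_stop = sorted(running - active)
    return to_start, to_stop


if __name__ == "__main__":
    start, stop = reconcile_topics(["cust-a", "cust-b", "cust-c"],
                                   ["cust-b", "cust-d"])
    print("start:", start)
    print("stop:", stop)
```

Running this periodically against the current topic list is one way to
follow topics that come and go, assuming the topic list can be fetched
from somewhere.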

On another note: Kafka 0.8.2 supports non-ZK offset management, which I
think is more scalable than bombarding ZK. I'm working on supporting the
new offset management strategy for Kafka in kafka-spark-consumer.

Thanks!
-neelesh

On Wed, Apr 1, 2015 at 9:49 AM, Dibyendu Bhattacharya <
dibyendu.bhattachary@gmail.com> wrote:

> Hi,
>
> Just to let you know, I have made some enhancements to the Low Level
> Reliable Receiver based Kafka Consumer (
> http://spark-packages.org/package/dibbhatt/kafka-spark-consumer).
>
> The earlier version used as many Receiver tasks as there are partitions
> in your Kafka topic. Now you can configure the desired number of Receiver
> tasks, and every Receiver can handle a subset of the topic partitions.
>
> There were some use cases where the consumer needed to handle gigantic
> topics (100+ partitions). My receiver created that many Receiver tasks,
> so that many CPU cores were needed just for the Receivers. That was an
> issue.
>
>
> In the latest code, I have changed that behavior. The maximum number of
> Receivers is still the number of partitions, but if you specify fewer
> Receiver tasks, every Receiver will handle a subset of partitions and
> consume using the Kafka Low Level consumer API.
>
> Every Receiver will manage the offsets of its partition(s) in ZK in the usual way.
>
>
> You can see the latest consumer here:
> http://spark-packages.org/package/dibbhatt/kafka-spark-consumer
>
>
>
> Regards,
> Dibyendu
>
>
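
The quoted message says that each Receiver handles a subset of the
topic's partitions and that the number of Receivers is capped at the
number of partitions, but it does not say how partitions are divided.
The sketch below assumes a simple round-robin split purely for
illustration; the function name is hypothetical, and this is not taken
from the kafka-spark-consumer code.

```python
def assign_partitions(num_partitions, num_receivers):
    """Split partition ids 0..num_partitions-1 across receivers.

    Assumption: round-robin assignment. The receiver count is capped at the
    partition count, as described in the message, and floored at 1.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    receivers = max(1, min(num_receivers, num_partitions))
    assignment = [[] for _ in range(receivers)]
    for partition in range(num_partitions):
        # Partition p goes to receiver p mod receivers.
        assignment[partition % receivers].append(partition)
    return assignment


if __name__ == "__main__":
    # 100 partitions handled by 4 receivers: 25 partitions each.
    for receiver_id, partitions in enumerate(assign_partitions(100, 4)):
        print(receiver_id, len(partitions))
```

With this split, a 100-partition topic needs only as many Receiver
cores as the configured Receiver count rather than 100.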
