spark-dev mailing list archives

From Cody Koeninger <c...@koeninger.org>
Subject Re: Spark Streaming Kafka - DirectKafkaInputDStream: Using the new Kafka Consumer API
Date Sun, 27 Dec 2015 18:00:49 GMT
Should probably get everyone on the same page at

https://issues.apache.org/jira/browse/SPARK-12177

On Mon, Dec 21, 2015 at 5:33 AM, Mario Ds Briggs <mario.briggs@in.ibm.com>
wrote:

> Hi Cody,
>
>  I took a shot and here's what it looks like
>
> https://github.com/mariobriggs/spark/tree/kafka0.9-streaming/external/kafka-newconsumer/src/main/scala/org/apache/spark/streaming/kafka
>
> A few points of note
>    - I took the liberty of moving only the DirectStream & KafkaRDD part. I
> assume that's the right priority.
>
>   - In the KafkaUtils APIs, I didn't include the Deserializers in the
> signature. I expect the user to set these in the input KafkaParams
> map. If that is a bad choice, I'm open to putting them back.
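A sketch of what setting the deserializers through kafkaParams might look
like (the config keys are the new consumer's standard ones; the
createDirectStream signature shown in the comment follows the linked
branch's README and is an assumption):

```scala
// Sketch: deserializers supplied via the params map instead of the
// KafkaUtils method signature. Keys are the new consumer's config names.
val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "broker1:9092,broker2:9092",
  "group.id" -> "my-streaming-job",
  "key.deserializer" ->
    "org.apache.kafka.common.serialization.StringDeserializer",
  "value.deserializer" ->
    "org.apache.kafka.common.serialization.StringDeserializer"
)
// The stream is then typed by the deserializers' output types, e.g.:
// KafkaUtils.createDirectStream[String, String](ssc, kafkaParams, topics)
```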
>
>   - To support getPreferredLocations on KafkaRDD, I pull in the
> partition leader from the Consumer in KafkaRDD::getPartitions(). I could
> have kept it as before, but that meant the KafkaRDD constructors take a
> 'leaders' argument.
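For illustration, the leader lookup inside getPartitions() could be
sketched as follows (the topic name is made up; partitionsFor is the 0.9
consumer API):

```scala
import scala.collection.JavaConverters._

// Sketch: derive a partition -> leader-host map from the new consumer,
// rather than passing a 'leaders' argument to the KafkaRDD constructor.
val leaders: Map[Int, String] =
  consumer.partitionsFor("mytopic").asScala
    .map(pi => pi.partition() -> pi.leader().host())
    .toMap
```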
>
> Quick look at API is here -
> https://github.com/mariobriggs/spark/blob/kafka0.9-streaming/external/kafka-newconsumer/README.md
>
> After doing this, I realized that since the signatures of
> createDirectStream & createRDD change (Decoders removed or replaced),
> these can be added to KafkaUtils without conflict and remain separate from
> the existing implementation (older Kafka API).
>
> thanks
> Mario
>
>
>
> ----- Original message -----
> From: Mario Ds Briggs/India/IBM
> To: Cody Koeninger <cody@koeninger.org>
> Cc: "dev@spark.apache.org" <dev@spark.apache.org>
> Subject: Re: Spark Streaming Kafka - DirectKafkaInputDStream: Using the
> new Kafka Consumer API
> Date: Mon, Dec 7, 2015 3:58 PM
>
>
> sounds sane for a first cut.
>
> Since all creation methods take a KafkaParams, I was thinking along the
> lines of maybe a temporary property in there that triggers usage of the
> new consumer.
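For example, something along these lines (the property name here is made
up, purely for illustration):

```scala
// Hypothetical: a temporary flag in kafkaParams selecting the new consumer.
val kafkaParams = Map(
  "bootstrap.servers" -> "broker1:9092",
  "spark.streaming.kafka.useNewConsumer" -> "true" // made-up property name
)
```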
>
> thanks
> Mario
>
>
> From: Cody Koeninger <cody@koeninger.org>
> To: Mario Ds Briggs/India/IBM@IBMIN
> Cc: "dev@spark.apache.org" <dev@spark.apache.org>
> Date: 04/12/2015 08:45 pm
> Subject: Re: Spark Streaming Kafka - DirectKafkaInputDStream: Using the
> new Kafka Consumer API
> ------------------------------
>
>
>
> Brute force way to do it might be to just have a separate
> streaming-kafka-new-consumer subproject, or something along those lines.
>
> On Fri, Dec 4, 2015 at 3:12 AM, Mario Ds Briggs <mario.briggs@in.ibm.com>
> wrote:
>
>    - >> forcing people on kafka 8.x to upgrade their brokers is
>    questionable. <<
>
>    I agree, and I was more thinking maybe there is a way to support both
>    for a period of time (which of course means some more code to
>    maintain :-)).
>
>
>    thanks
>    Mario
>
>
>    From: Cody Koeninger <cody@koeninger.org>
>    To: Mario Ds Briggs/India/IBM@IBMIN
>    Cc: "dev@spark.apache.org" <dev@spark.apache.org>
>    Date: 04/12/2015 12:15 am
>    Subject: Re: Spark Streaming Kafka - DirectKafkaInputDStream: Using
>    the new Kafka Consumer API
>    ------------------------------
>
>
>
>    Honestly my feeling on any new API is to wait for a point release
>    before taking it seriously :)
>
>    Auth and encryption seem like the only compelling reason to move, but
>    forcing people on kafka 8.x to upgrade their brokers is questionable.
>
>    On Thu, Dec 3, 2015 at 11:30 AM, Mario Ds Briggs
>    <mario.briggs@in.ibm.com> wrote:
>       - Hi,
>
>       Wanted to pick Cody's mind on what he thinks about
>       DirectKafkaInputDStream/KafkaRDD internally using the new Kafka
>       consumer API. I know the latter is documented as beta quality, but I
>       still wanted to know if he sees any blockers as to why we shouldn't
>       go there shortly. On my side the consideration is that Kafka 0.9.0.0
>       introduced Authentication and Encryption (beta again) between clients
>       & brokers, but these are available only in the newer Consumer API and
>       not in the older Low-level/High-level APIs.
>
>       From briefly studying the implementation of
>       DirectKafkaInputDStream/KafkaRDD and the new Consumer API, my
>       thinking is that it is possible to support the exact current
>       implementation you have using the new APIs.
>       One area that isn't so straightforward: the ctor of KafkaRDD fixes
>       the offsetRange (I did read about the deterministic behavior you were
>       after), and I couldn't find a direct method in the new Consumer API
>       to get the current 'latest' offset. However, one can do a
>       consumer.seekToEnd() and then call consumer.position().
>       One other benefit is that the new Consumer API abstracts away having
>       to deal with finding the leader for a partition, so we can get rid of
>       that code.
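The seek-then-position approach described above might look like this
against the 0.9 consumer (a sketch; the topic and partition are made up):

```scala
import org.apache.kafka.common.TopicPartition

// Sketch: find the current 'latest' offset for a partition, since the
// new consumer has no direct "get latest offset" call.
val tp = new TopicPartition("mytopic", 0)
consumer.assign(java.util.Arrays.asList(tp))
consumer.seekToEnd(tp)             // move to the log end...
val latest = consumer.position(tp) // ...then read back the offset
```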
>
>       Would be great to get your thoughts.
>
>       thanks in advance
>       Mario
>
