spark-dev mailing list archives

From "Mario Ds Briggs" <mario.bri...@in.ibm.com>
Subject Re: Spark streaming Kafka receiver WriteAheadLog question
Date Tue, 26 Apr 2016 11:09:32 GMT
That was my initial thought as well. But then I was wondering whether this
approach could help remove:
 a - the small extra latency overhead of the Direct approach
(compared to the Receiver approach), and
 b - the data-duplication inefficiency (replication to the WAL) and the lack
of a single version of the truth for the offsets processed (under some
failures) in the Receiver approach.

thanks
Mario

----- Message from Cody Koeninger <cody@koeninger.org> on Mon, 25 Apr 2016 09:23:32 -0500 -----
      To: Renyi Xiong <renyixiong0@gmail.com>
      cc: dev <dev@spark.apache.org>
 Subject: Re: Spark streaming Kafka receiver WriteAheadLog question

If you want to refer back to Kafka based on offset ranges, why not use
createDirectStream?

On Fri, Apr 22, 2016 at 11:49 PM, Renyi Xiong <renyixiong0@gmail.com> wrote:
> Hi,
>
> Is it possible for the WriteAheadLogBackedBlockRDD generated by the Kafka
> receiver to hold the corresponding Kafka offset ranges, so that during
> recovery the RDD can refer back to the Kafka queue instead of paying the
> cost of the write-ahead log?
>
> I guess there must be a reason for this. Could anyone please help me
> understand?
>
> Thanks,
> Renyi.
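The trade-off discussed in this thread (persisting a copy of received data in a WAL versus persisting only offset ranges and re-reading from Kafka, as createDirectStream does) can be illustrated with a toy model. This is a minimal Python sketch, not Spark code: `KafkaLog`, `OffsetRange`, `read_range`, `checkpoint_with_wal`, `checkpoint_with_offsets` and `recover` are hypothetical names chosen for illustration, and `OffsetRange` only mirrors the idea of a (topic, partition, from, until) range. It assumes the mock log never loses records; on a real broker, retention could delete data before recovery runs, in which case re-reading would fail.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical stand-in for a Kafka cluster: (topic, partition) -> append-only log.
# Assumption: records are never deleted (no retention), unlike a real broker.
KafkaLog = Dict[Tuple[str, int], List[str]]


@dataclass(frozen=True)
class OffsetRange:
    """Half-open range [from_offset, until_offset) of one topic partition."""
    topic: str
    partition: int
    from_offset: int
    until_offset: int


def read_range(log: KafkaLog, r: OffsetRange) -> List[str]:
    """Fetch exactly the records covered by r from the mock log."""
    records = log[(r.topic, r.partition)]
    if not 0 <= r.from_offset <= r.until_offset <= len(records):
        raise ValueError("offset range is not available in the log")
    return records[r.from_offset:r.until_offset]


def checkpoint_with_wal(block: List[str]) -> List[str]:
    """WAL style: persist a full copy of the received data (duplicated storage)."""
    return list(block)


def checkpoint_with_offsets(r: OffsetRange) -> OffsetRange:
    """Offset style: persist only the four fields that identify the data."""
    return r


def recover(log: KafkaLog, saved: OffsetRange) -> List[str]:
    """After a failure, rebuild the block by re-reading the source log."""
    return read_range(log, saved)


if __name__ == "__main__":
    log: KafkaLog = {("events", 0): ["a", "b", "c", "d", "e"]}
    r = OffsetRange("events", 0, 1, 4)
    print(recover(log, checkpoint_with_offsets(r)))  # ['b', 'c', 'd']
```

Both checkpoints let the block be rebuilt, but the offset version stores a fixed-size record instead of the data itself; the cost is that recovery depends on the source still holding those offsets.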
