spark-dev mailing list archives

From "Mario Ds Briggs" <>
Subject Re: Spark streaming Kafka receiver WriteAheadLog question
Date Tue, 26 Apr 2016 11:09:32 GMT
That was my initial thought as well. But then I was wondering if this
approach could help remove
 a - the small extra latency overhead we have with the direct approach
(compared to the receiver approach), and
 b - the data duplication inefficiency (replication to the WAL) and the
lack of a single version of the truth for the offsets processed (under
some failures) in the receiver approach.


----- Message from Cody Koeninger <> on Mon, 25 Apr 2016
09:23:32 -0500 -----
      To: Renyi Xiong <>            
      cc: dev <>                     
 Subject: Re: Spark streaming Kafka receiver             
          WriteAheadLog question                         

If you want to refer back to Kafka based on offset ranges, why not use

On Fri, Apr 22, 2016 at 11:49 PM, Renyi Xiong <>
> Hi,
> Is it possible for the WriteAheadLogBackedBlockRDD generated by the Kafka
> receiver to hold the corresponding Kafka offset range, so that during
> recovery the RDD can refer back to the Kafka queue instead of paying the
> cost of the write ahead log?
> I guess there must be a reason here. Could anyone please help me
> Thanks,
> Renyi.
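
The idea raised in this thread, a batch that records only its Kafka offset ranges and re-reads them on recovery instead of keeping a second copy in a write ahead log, can be sketched in a few lines. This is a minimal illustration in plain Python, not Spark code. The OffsetRange fields loosely mirror the topic / partition / from-offset / until-offset naming used by Spark's Kafka integration, while FakeLog and recompute are hypothetical names standing in for a Kafka partition and for RDD recomputation.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class OffsetRange:
    """Half-open range [from_offset, until_offset) of one topic partition."""
    topic: str
    partition: int
    from_offset: int
    until_offset: int

    def count(self) -> int:
        return self.until_offset - self.from_offset


class FakeLog:
    """Hypothetical in-memory stand-in for a replayable log such as Kafka."""

    def __init__(self) -> None:
        self._data: Dict[Tuple[str, int], List[str]] = {}

    def append(self, topic: str, partition: int, message: str) -> int:
        messages = self._data.setdefault((topic, partition), [])
        messages.append(message)
        return len(messages) - 1  # offset of the message just appended

    def read(self, r: OffsetRange) -> List[str]:
        messages = self._data.get((r.topic, r.partition), [])
        valid = 0 <= r.from_offset <= r.until_offset <= len(messages)
        if not valid:
            raise ValueError(f"range not available in log: {r}")
        return messages[r.from_offset:r.until_offset]


def recompute(log: FakeLog, ranges: List[OffsetRange]) -> List[str]:
    """Rebuild a batch from the log using only its recorded offset ranges."""
    out: List[str] = []
    for r in ranges:
        out.extend(log.read(r))
    return out


if __name__ == "__main__":
    log = FakeLog()
    for i in range(5):
        log.append("events", 0, f"m{i}")

    batch = [OffsetRange("events", 0, 1, 4)]
    first = recompute(log, batch)
    again = recompute(log, batch)  # simulated recovery: same ranges, same data
    print(first == again, first)
```

Because the batch is defined by its ranges alone, recovery needs no second copy of the data, provided the log still retains those offsets.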
