kafka-users mailing list archives

From Prabhjot Bharaj <prabhbha...@gmail.com>
Subject Re: Notable failure scenarios in high-level/new consumer
Date Thu, 26 Nov 2015 17:58:01 GMT
Hi,

Requesting your expertise on the questions below.

Thanks,
Prabhjot

On Thu, Nov 26, 2015 at 12:09 PM, Prabhjot Bharaj <prabhbharaj@gmail.com>
wrote:

> Hello Folks,
>
> I am trying to build fault tolerance on the consumer side, so as to make
> sure that all failure scenarios are handled.
> On the data integrity side, there are two primary requirements:
>
> 1. No data loss
> 2. No data duplication
>
> I'm particularly interested in data duplication. For example, the
> following steps happen, in order, on the consumer during each consume
> cycle:
>
> 1. connect
> 2. consume
> 3. write the offset back to zookeeper (0.8) or kafka (0.9)
> 4. process the message (which will be done by other code, not the
> consumer API)
>
> Please correct the above steps if I'm wrong; a sketch of this loop
> follows below.
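>
> For concreteness, here is a minimal sketch of that cycle, assuming the
> 0.9 new-consumer Java API with manual commits. The broker address, group
> id, topic name, and the process() helper are made-up names for
> illustration:
>
>   import java.util.Arrays;
>   import java.util.Properties;
>   import org.apache.kafka.clients.consumer.ConsumerRecord;
>   import org.apache.kafka.clients.consumer.ConsumerRecords;
>   import org.apache.kafka.clients.consumer.KafkaConsumer;
>
>   Properties props = new Properties();
>   props.put("bootstrap.servers", "localhost:9092");  // assumed broker
>   props.put("group.id", "example-group");            // made-up group id
>   props.put("enable.auto.commit", "false");          // commit offsets ourselves
>   props.put("key.deserializer",
>       "org.apache.kafka.common.serialization.StringDeserializer");
>   props.put("value.deserializer",
>       "org.apache.kafka.common.serialization.StringDeserializer");
>
>   // step 1: connect
>   KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
>   consumer.subscribe(Arrays.asList("example-topic")); // made-up topic
>   try {
>       while (true) {
>           // step 2: consume a batch of messages
>           ConsumerRecords<String, String> records = consumer.poll(1000);
>           // step 3: write the offsets back (to kafka in 0.9)
>           consumer.commitSync();
>           // step 4: hand the messages to the processing code
>           for (ConsumerRecord<String, String> record : records) {
>               process(record);
>           }
>       }
>   } finally {
>       consumer.close();
>   }
>
> With this ordering (commit before process), a crash between poll() and
> commitSync() re-delivers the batch on restart, while a crash after
> commitSync() but before process() loses it, i.e. at-most-once delivery.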
>
> Now, failures (machine down, process down, unhandled exceptions or bugs)
> can occur at each of the above steps.
> In particular, if a failure occurs after consuming a message but before
> writing its offset back to zookeeper/kafka, then on restart the consumer
> will re-consume the same message. If the 4th step is asynchronous and
> has already run, that re-consumption becomes a duplicate.
> For example, if processing the message happens before writing back the
> offset, a consumer restart can cause data duplication! A sketch of this
> ordering follows below.
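>
> Here is the opposite ordering (step 4 before step 3), again just a
> sketch with the same made-up names:
>
>   ConsumerRecords<String, String> records = consumer.poll(1000);
>   for (ConsumerRecord<String, String> record : records) {
>       process(record);   // step 4 first
>   }
>   consumer.commitSync(); // step 3 only after processing succeeds
>   // A crash between process() and commitSync() makes the restarted
>   // consumer re-consume and re-process the batch => at-least-once,
>   // i.e. possible duplicates but no loss.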
>
> Is this a valid scenario? Also, are there any other scenarios that need
> to be taken into consideration when consuming?
>
>
> Thanks,
> Prabhjot
>



-- 
---------------------------------------------------------
"There are only 10 types of people in the world: Those who understand
binary, and those who don't"
