storm-user mailing list archives

From Ashok Gupta <gupta.ashok2...@gmail.com>
Subject Re: Question about OpaqueTridentKafkaSpout
Date Wed, 07 May 2014 00:19:03 GMT
I think it can. That is where the coordinator comes into the picture: the
coordinator defines the parameters of a batch, and the emitters do the job of
emitting the per-partition portions of that batch.
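
Roughly, the split looks like the following. This is only a simplified sketch
to show the idea; the interface and method names are made up for illustration
and are not Storm's actual IOpaquePartitionedTridentSpout API:

// Hypothetical, simplified interfaces illustrating the coordinator/emitter
// split. The coordinator decides what a batch covers; the emitters produce
// the per-partition slices of that batch.
interface BatchCoordinator<PartitionsT> {
    // Runs on a single node once per batch and returns the partition
    // assignment (e.g. the current set of Kafka partitions) that defines
    // the batch.
    PartitionsT getPartitionsForBatch();

    // Whether a batch for this transaction id may be emitted yet.
    boolean isReady(long txid);
}

interface PartitionEmitter<PartitionT, MetaT> {
    // Runs for each partition, emits that partition's portion of the batch
    // starting from the metadata (e.g. the Kafka offset) recorded for the
    // previous batch, and returns new metadata to persist for this txid.
    MetaT emitPartitionBatch(long txid, PartitionT partition, MetaT lastMeta);
}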





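For the scenario in the quoted message below, it may also help to keep in mind
how an opaque state rolls a value forward per txId. The sketch below is
simplified to a single counter, and the class and field names are illustrative
rather than Storm's actual OpaqueValue API; the point is that keeping the
previous value around protects state updates against replays of the same txId:

// A minimal sketch (simplified to a long counter) of the opaque-state
// update rule. Names here are illustrative only.
class OpaqueCount {
    long currTxid;   // txid of the batch that produced "val"
    long val;        // value after applying batch "currTxid"
    long prevVal;    // value before that batch was applied

    // Apply a batch's partial count for the given transaction id.
    void applyBatch(long txid, long delta) {
        if (txid == currTxid) {
            // Same txid replayed: the earlier attempt may have written a
            // partial result, so recompute from the value that was current
            // before this txid, ignoring whatever the failed attempt wrote.
            val = prevVal + delta;
        } else {
            // New txid: roll forward; the current value becomes the
            // rollback point for replays of this txid.
            prevVal = val;
            val = val + delta;
            currTxid = txid;
        }
    }
}
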
On Mon, May 5, 2014 at 12:50 PM, Abhishek Bhattacharjee <
abhishek.bhattacharjee11@gmail.com> wrote:

> Are you sure that a batch can consist of tuples from different partitions?
> I am just asking because I am not sure. If it can, then your question seems
> to be valid; otherwise it is not valid anymore :-)
>
>
> On Fri, May 2, 2014 at 7:42 AM, Ashok Gupta <gupta.ashok2051@gmail.com> wrote:
>
>>
>> Hi,
>>
>>  I have a theoretical question about the guarantees
>> OpaqueTridentKafkaSpout provides. I would like to use an example to
>> illustrate the question I have.
>>
>>  Suppose a batch with txId 10 has tuples t1, t2, t3, t4, which
>> respectively come from the Kafka partitions p1, p2, p3, p4. When this batch
>> is played for the very first time, it fails processing; however, the commit
>> happens for tuple t3 in the database while it does not happen for tuples
>> t1, t2, t4. Since the batch failed, it is expected that the metadata in
>> ZooKeeper is not going to be updated, i.e. the offsets for p1, p2, p3, p4
>> will not be treated as committed. It is expected that the batch will be
>> replayed; however, suppose that before it gets replayed, the Kafka partition
>> p3 goes down. What happens now? I understand that another batch with the
>> same transaction id containing t1, t2, t4 may be replayed, but since p3 is
>> down, t3 won't be replayed again. Since t3 is not replayed, even if the
>> batch succeeds on replay, the offsets for p3 don't get updated in
>> ZooKeeper. That is all fine as far as fault tolerance and opaque behavior
>> are concerned.
>>
>>  My concern is more about what happens when partition p3 comes back up
>> and the spout starts reading data from the last offset it committed
>> successfully. Since tuple t3 is going to be read from partition p3 again,
>> and it will certainly be in a batch with some txId > 10 (say 19), it is
>> going to be applied to the state again. This apparently violates the
>> exactly-once semantics.
>>
>>  Is the concern genuine or am I missing something?
>> Regards
>> --
>> Ashok Gupta,
>> (+1) 361-522-2172
>> San Jose, CA
>>
>
>
>
> --
> *Abhishek Bhattacharjee*
> *Pune Institute of Computer Technology*
>



-- 
Ashok Gupta,
(+1) 361-522-2172
San Jose, CA
