kafka-users mailing list archives

From Jesse Hodges <hodges.je...@gmail.com>
Subject Re: Processing time series data in order
Date Thu, 22 Dec 2016 02:09:52 GMT
Depending on the expected maximum out-of-order window, why not order them in memory? Then you
don't need to reread from Cassandra, and in case of a problem you can reread the data from Kafka.
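
A rough sketch of the in-memory ordering idea, in plain Java with no Kafka or Cassandra calls. The names Event, ReorderBuffer, and maxOutOfOrderMs are made up for illustration, and the sketch assumes timestamps are epoch milliseconds and that no event arrives later than the chosen window:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical event type: a timestamp in epoch milliseconds plus a payload. */
final class Event {
    final long timestampMs;
    final String payload;

    Event(long timestampMs, String payload) {
        this.timestampMs = timestampMs;
        this.payload = payload;
    }
}

/** Hypothetical in-memory buffer that releases events in timestamp order. */
final class ReorderBuffer {
    // Assumed upper bound on how far out of order an event may arrive.
    private final long maxOutOfOrderMs;
    // Min-heap keyed on timestamp, so the oldest pending event is always at the head.
    private final PriorityQueue<Event> pending =
            new PriorityQueue<>(Comparator.comparingLong((Event e) -> e.timestampMs));
    private long maxSeenMs = Long.MIN_VALUE;

    ReorderBuffer(long maxOutOfOrderMs) {
        this.maxOutOfOrderMs = maxOutOfOrderMs;
    }

    /** Buffers one event and returns, in timestamp order, the events now safe to process. */
    List<Event> add(Event event) {
        pending.add(event);
        maxSeenMs = Math.max(maxSeenMs, event.timestampMs);
        // Anything at or below this mark can no longer be overtaken, given the assumed window.
        long watermark = maxSeenMs - maxOutOfOrderMs;
        List<Event> ready = new ArrayList<>();
        while (!pending.isEmpty() && pending.peek().timestampMs <= watermark) {
            ready.add(pending.poll());
        }
        return ready;
    }

    /** Drains everything still buffered, in timestamp order. */
    List<Event> flush() {
        List<Event> rest = new ArrayList<>();
        while (!pending.isEmpty()) {
            rest.add(pending.poll());
        }
        return rest;
    }
}
```

Each consumed message would go through add(), and flush() would be called once the known number of messages for the batch has arrived. An event that shows up later than the window would still come out after newer ones, so the window has to be chosen generously.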

-Jesse 

> On Dec 21, 2016, at 7:24 PM, Ali Akhtar <ali.rac200@gmail.com> wrote:
> 
> - I'm receiving a batch of messages to a Kafka topic.
> 
> Each message has a timestamp, however the messages can arrive / get processed out of
> order. I.e., event 1's timestamp could've been a few seconds before event 2, and event 2 could
> still get processed before event 1.
> 
> - I know the number of messages that are sent per batch.
> 
> - I need to process the messages in order. The messages are basically providing the history
> of an item. I need to be able to track the history accurately (i.e., if an event occurred 3
> times, I need to accurately log the dates of the first, 2nd, and 3rd time it occurred).
> 
> The approach I'm considering is:
> 
> - Creating a Cassandra table which is ordered by the timestamp of the messages.
> 
> - Once a batch of messages has arrived, writing them all to Cassandra, counting on them
> being ordered by the timestamp even if they are processed out of order.
> 
> - Then iterating over the messages in the Cassandra table, to process them in order.
> 
> However, I'm concerned about Cassandra's eventual consistency. Could it be that even
> though I wrote the messages, they are not there when I try to read them (which would be almost
> immediately after they are written)?
> 
> Should I enforce consistency = ALL to make sure the messages will be available immediately
> after being written?
> 
> Is there a better way to handle this through either Kafka Streams or Cassandra?
