kafka-users mailing list archives

From Eno Thereska <eno.there...@gmail.com>
Subject Re: Understanding the topology of high level kafka stream
Date Wed, 09 Nov 2016 14:16:24 GMT
Hi Sachin,

Kafka Streams is built on top of standard Kafka consumers. For every topic it consumes
from (whether a changelog topic or a source topic, it doesn't matter), the consumer stores the
offset it last consumed. Upon restart, by default it starts consuming from where it left
off in each of the topics. So you can think of it this way: a restart should be no different
than if you had left the application running (i.e., no restart).
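
To make the resume-from-stored-offset idea concrete, here is a minimal toy model in Python. It is only a sketch and does not use Kafka's API: `ToyBroker`, `ToyConsumer`, and the group name `my-app` are hypothetical names invented for illustration, and starting at offset 0 when nothing has been committed is an assumption of the toy (in real Kafka that behaviour is configurable).

```python
class ToyBroker:
    """Toy stand-in for a broker: per-topic logs plus committed offsets."""

    def __init__(self):
        self.logs = {}       # topic -> list of records
        self.committed = {}  # (group, topic) -> next offset to read

    def append(self, topic, record):
        self.logs.setdefault(topic, []).append(record)


class ToyConsumer:
    """Toy consumer that resumes each topic from the group's committed offset."""

    def __init__(self, broker, group, topics):
        self.broker = broker
        self.group = group
        # On (re)start, pick up the committed offset; assume 0 if none exists.
        self.position = {t: broker.committed.get((group, t), 0) for t in topics}

    def poll(self):
        """Return every record not yet read, as (topic, record) pairs."""
        out = []
        for topic, pos in self.position.items():
            log = self.broker.logs.get(topic, [])
            out.extend((topic, record) for record in log[pos:])
            self.position[topic] = len(log)
        return out

    def commit(self):
        """Store the current positions so that a later restart can resume."""
        for topic, pos in self.position.items():
            self.broker.committed[(self.group, topic)] = pos


broker = ToyBroker()
for record in ("a", "b"):
    broker.append("source-topic", record)

first = ToyConsumer(broker, "my-app", ["source-topic"])
seen_before = first.poll()   # reads "a" and "b"
first.commit()               # offset 2 is now stored for the group

broker.append("source-topic", "c")  # arrives while the application is down

restarted = ToyConsumer(broker, "my-app", ["source-topic"])
seen_after = restarted.poll()       # reads only "c"
```

The restarted consumer sees only the record that arrived after the last commit, which is the same record the first instance would have read next had it kept running.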


> On 9 Nov 2016, at 13:59, Sachin Mittal <sjmittal@gmail.com> wrote:
> Hi,
> I had some basic questions on sequence of tasks for streaming application
> restart in case of failure or otherwise.
> Say my stream is structured this way
> source-topic
>   branched into 2 kstreams
>    source-topic-1
>    source-topic-2
>   each mapped to 2 new kstreams (new key,value pairs) backed by 2 kafka
> topics
>       source-topic-1-new
>       source-topic-2-new
>       each aggregated to new ktable backed by internal changelog topics
>       source-topic-1-new-table (scource-topic-1-new-changelog)
>       source-topic-2-new-table (scource-topic-2-new-changelog)
>       table1 left join table2 -> to final stream
> Results of final stream are then persisted into another data storage
> So if you see I have following physical topics or state stores
> source-topic
> source-topic-1-new
> source-topic-2-new
> scource-topic-1-new-changelog
> scource-topic-2-new-changelog
> Now at a given point if the streaming application is stopped there is some
> data in all these topics.
> Barring the source-topic, all other topics have data inserted by the streaming
> application.
> Also I suppose the streaming application stores the offset for each of the
> topics as where it was last.
> So when I restart the application how does the processing start again?
> Will it pick the data from last left changelog topics and process them
> first and then process the source topic data from the offset last left?
> Or it will start from source topic. I really don't want it to maintain
> offset to changelog tables because any old key's value can be modified as
> part of aggregation again.
> Bit confused here, any light would help a lot.
> Thanks
> Sachin
