kafka-users mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Parthasarathy, Mohan" <mpart...@hpe.com>
Subject Re: Kafka streams in Kubernetes
Date Sun, 09 Jun 2019 21:59:14 GMT
Matt,

Thanks for your response. I agree with you that there is no easy way to answer this. I was
trying to see what others experience is which could simply be "Don't bother, in practice stateful
set is better".

Could you explain as to why there has to be more state than the window size ? In a running
application, as the data is being processed from a topic, there is state being created depending
on the stateful primitives. As the window is closed, this state is not needed and I can see
why grace period has to be taken into account ?  So, when would you  need it for the whole
store retention time ? Could you clarify ?

Thanks
Mohan

´╗┐On 6/8/19, 11:18 PM, "Matthias J. Sax" <matthias@confluent.io> wrote:

    If depends how much state you need to restore and how much restore-time
    you can accept in your application.
    
    The amount of data that needs to be restored, does not depend on the
    window-size, but the store retention time (default 1 day, configurable
    via `Materialized#withRetention()`). The window size (and grace period,
    if case you use one) is a lower bound for the configurable retention
    time though, ie, retention time >= grace-period >= window-size.
    
    What you also need to take into account is, how often topics are
    compacted, and how large the segment size is, because the active segment
    is not subject to compaction.
    
    It's always hard to answer a question like this. I would recommend to do
    some testing and benchmark fail over, by manually killing some instances
    to simulate a crash. This should give the best insight -- tuning the
    above parameters, you can see, what works for your application.
    
    
    
    -Matthias
    
    On 6/8/19 4:28 PM, Pavel Sapozhnikov wrote:
    > I suggest take a look at Strimzi project https://strimzi.io/
    > 
    > Kafka operator deployed in Kubernetes environment.
    > 
    > On Sat, Jun 8, 2019, 6:09 PM Parthasarathy, Mohan <mparthas@hpe.com> wrote:
    > 
    >> Hi,
    >>
    >> I have read several articles about this topic. We are soon going to deploy
    >> our streaming apps inside k8s. My understanding from reading these articles
    >> is that stateful set in k8s is not mandatory as the application can rebuild
    >> its state if the state store is not present. Can people share their
    >> experience or recommendation when it comes to deploying the streaming apps
    >> on k8s ?
    >>
    >> Also, let us say the application is using a tumbling window of 5 mts. When
    >> an application restarts, is it correct to say that it has to re-build the
    >> state only for that 5 minute window for the partitions that it was handling
    >> before. I had an instance of such a restart where it was running a long
    >> time in REBALANCE which makes me think that my understanding is incorrect.
    >> In this case, the state store was available during the restart. Can someone
    >> clarify ?
    >>
    >> Thanks
    >> Mohan
    >>
    >>
    > 
    
    

Mime
View raw message