kafka-users mailing list archives

From R Krishna <krishna...@gmail.com>
Subject Re: Kafka rebalancing message lost
Date Tue, 18 Dec 2018 22:55:42 GMT
For a very large number of consumers, you can manage the offsets manually
and/or assign partitions to each consumer yourself to avoid rebalancing.
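
A minimal sketch of the "assign partitions yourself" idea, assuming a fixed
number of consumers that each know their own index. The function name and the
round-robin split are illustrative, not part of any Kafka API. In the Java
client, the resulting list would be passed to KafkaConsumer.assign() instead
of calling subscribe(), which bypasses group management and so never triggers
a rebalance.

```python
from typing import List


def static_assignment(num_partitions: int, num_consumers: int,
                      consumer_index: int) -> List[int]:
    """Return the partition ids owned by one consumer under a fixed split.

    Hypothetical helper: partitions are dealt out round-robin, so every
    partition has exactly one owner and ownership never changes at runtime.
    """
    if num_consumers <= 0:
        raise ValueError("num_consumers must be positive")
    if not 0 <= consumer_index < num_consumers:
        raise ValueError("consumer_index out of range")
    return [p for p in range(num_partitions)
            if p % num_consumers == consumer_index]


if __name__ == "__main__":
    # Consumer 0 of 5, topic with 12 partitions.
    print(static_assignment(12, 5, 0))
```

The trade-off is that nothing takes over a partition automatically if its
consumer dies, so restarts and failover have to be handled outside Kafka.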


On Dec 18, 2018 9:58 AM, "Ryanne Dolan" <ryannedolan@gmail.com> wrote:

> Parth, I am skeptical that you actually need 500+ consumers. A well-tuned
> consumer can process hundreds of thousands of records per second.
>
> Some notes to consider:
>
> - You'll need at least 500 partitions if you have 500 consumers in one
> group; consumers beyond the partition count sit idle.
> - You almost never need more consumers than you have brokers in your
> cluster. If you can store N bps to disk on a broker, you can usually
> process at least N bps in a consumer.
> - If your consumers can't process fast enough, add parallelism within each
> consumer, e.g. process records asynchronously. You don't necessarily need
> more consumers.
> - It might not make sense to auto-scale consumers, since scaling up
> triggers a rebalance, which can cause even more consumer lag on bursty
> streams.
> - Unless you have a real-time use case, you can generally under-provision
> your consumers and let them catch up with bursts over time.
>
> For example, I've processed 2 TB of records with 10 consumers in about 15
> minutes in stress tests, and I generally have provisioned one 64GB server
> for every 20K records/s sustained.
>
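The figures above can be turned into back-of-the-envelope arithmetic. This is
only a sketch: it assumes 1 TB = 10^12 bytes and that load spreads evenly
across consumers, and the function names are made up for illustration.

```python
import math


def per_consumer_throughput_mb_s(total_bytes: float, consumers: int,
                                 seconds: float) -> float:
    """Average MB/s each consumer handled, assuming an even split."""
    return total_bytes / consumers / seconds / 1e6


def servers_needed(records_per_second: float,
                   records_per_server: float = 20_000) -> int:
    """Servers required at the quoted rule of thumb of 20K records/s each."""
    return math.ceil(records_per_second / records_per_server)


if __name__ == "__main__":
    # 2 TB in about 15 minutes across 10 consumers.
    print(round(per_consumer_throughput_mb_s(2e12, 10, 15 * 60), 1))
    # Sustained 500K records/s at 20K records/s per server.
    print(servers_needed(500_000))
```

Under those assumptions the stress test works out to roughly 222 MB/s per
consumer, and 500K records/s sustained would need 25 servers.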
> This of course varies wildly depending on your use case, but I just want to
> call out that you don't necessarily need a lot of consumers to process huge
> amounts of data.
>
> Ryanne
>
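One way to read "add parallelism within each consumer" is to fan a polled
batch out to a worker pool and commit offsets only after the whole batch has
finished. The sketch below assumes the handler is thread-safe and that
per-partition ordering inside a batch does not matter; process_batch is a
hypothetical helper, not a Kafka API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

R = TypeVar("R")
T = TypeVar("T")


def process_batch(records: Iterable[R], handler: Callable[[R], T],
                  max_workers: int = 8) -> List[T]:
    """Run handler over one polled batch in parallel and wait for all of it.

    Results come back in input order. Any exception raised by the handler
    propagates, so the caller can skip the offset commit and retry the batch.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(handler, records))


if __name__ == "__main__":
    batch = ["a", "b", "c"]           # stand-in for records from one poll
    print(process_batch(batch, str.upper))
    # Only at this point would the consumer commit the batch's offsets.
```

Keeping the batch small enough to finish well within the poll timeout matters
here, since a slow batch still delays the next poll.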
> On Dec 18, 2018 10:25 AM, "Manoj Khangaonkar" <khangaonkar@gmail.com>
> wrote:
>
> Rebalancing partitions among consumers does not necessarily mean that
> messages are lost.
>
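Why a rebalance need not lose messages can be shown with a toy model, assuming
auto-commit is disabled and offsets are committed only after processing (in
the Java client, for example, via commitSync() after the records are handled).
PartitionLog and consume are invented for this sketch; they only imitate the
committed-offset bookkeeping.

```python
from typing import Callable, List


class PartitionLog:
    """Toy stand-in for one partition plus its committed offset."""

    def __init__(self, records: List[str]) -> None:
        self.records = list(records)
        self.committed = 0  # next offset a new owner will start from


def consume(log: PartitionLog, handler: Callable[[str], None],
            max_records: int, lose_partition_before_commit: bool = False) -> None:
    """Process up to max_records from the committed offset, then commit."""
    start = log.committed
    batch = log.records[start:start + max_records]
    for record in batch:
        handler(record)
    if not lose_partition_before_commit:
        # Commit only after processing, so unprocessed data is never skipped.
        log.committed = start + len(batch)


if __name__ == "__main__":
    seen: List[str] = []
    log = PartitionLog(["a", "b", "c", "d", "e"])
    # First owner processes three records, then a rebalance hits before commit.
    consume(log, seen.append, 3, lose_partition_before_commit=True)
    # The new owner resumes from the last committed offset.
    consume(log, seen.append, 5)
    print(seen)
```

The new owner reprocesses the uncommitted records, so the cost of a rebalance
in this model is duplicates rather than loss; handlers should be idempotent.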
> But I understand it can be annoying.
>
> If Kafka is rebalancing between consumers frequently, it means your
> consumer code is not polling within the expected timeout, as a result of
> which Kafka thinks the consumer is gone. You should tune your consumer
> implementation to keep the polling loop duration reasonable. See the
> heartbeat.interval.ms and session.timeout.ms configuration parameters in
> the documentation.
>
> regards
>
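A small sanity check for the timeout settings mentioned above, as a sketch.
It also looks at max.poll.interval.ms, which is not named in the message but
is the consumer setting that bounds the time between poll() calls. The numeric
values are illustrative only, and check_consumer_timeouts is a hypothetical
helper; the one-third rule follows the guidance in the Kafka consumer
configuration docs.

```python
from typing import Dict, List


def check_consumer_timeouts(cfg: Dict[str, int],
                            worst_case_batch_ms: int) -> List[str]:
    """Return warnings for settings likely to cause avoidable rebalances."""
    warnings = []
    if cfg["heartbeat.interval.ms"] * 3 > cfg["session.timeout.ms"]:
        warnings.append(
            "heartbeat.interval.ms should be at most 1/3 of session.timeout.ms")
    if worst_case_batch_ms >= cfg["max.poll.interval.ms"]:
        warnings.append(
            "processing one batch can exceed max.poll.interval.ms")
    return warnings


if __name__ == "__main__":
    cfg = {
        "heartbeat.interval.ms": 3_000,    # illustrative values only
        "session.timeout.ms": 10_000,
        "max.poll.interval.ms": 300_000,
    }
    print(check_consumer_timeouts(cfg, worst_case_batch_ms=60_000))
```

If the second warning fires, the usual options are smaller batches, faster
processing, or a larger max.poll.interval.ms.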
>
>
> On Tue, Dec 18, 2018 at 3:34 AM Parth Gandhi <
> parth.gandhi@excellenceinfonet.com> wrote:
>
> > Team,
> > We want to build a scalable Kafka system for pub/sub messaging and want to
> > run 500+ consumers on Docker. We want the system to scale up the consumers
> > based on message inflow. However, in Kafka this triggers a rebalance,
> > and we fear losing messages.
> > What is the best practice for achieving this with no, or minimal, message
> > loss?
> >
>
>
> --
> http://khangaonkar.blogspot.com/
>
