kafka-users mailing list archives

From Mathias Söderberg <mathias.soederb...@gmail.com>
Subject Re: Monitoring of consumer group lag
Date Tue, 17 Mar 2015 09:36:49 GMT
Hi Lance,

I tried Kafka Offset Monitor a while back, but it didn't play especially
nice with a lot of topics / partitions (we currently have around 1400
topics and 4000 partitions in total). Might be possible to make it work a
bit better, but not sure it would be the best way to do alerting.

Thanks for the tip though :).

Best regards,
Mathias


On Mon, 16 Mar 2015 at 21:02 Lance Laursen <llaursen@rubiconproject.com>
wrote:

> Hey Mathias,
>
> Kafka Offset Monitor will give you a general idea of where your consumer
> group(s) are at:
>
> http://quantifind.com/KafkaOffsetMonitor/
>
> However, I'm not sure how useful it will be with "a large number of topics"
> / turning its output into a script that alerts upon a threshold. Could take
> a look and see what they're doing though.
>
> On Mon, Mar 16, 2015 at 8:31 AM, Mathias Söderberg <
> mathias.soederberg@gmail.com> wrote:
>
> > Good day,
> >
> > I'm looking into using SimpleConsumer#getOffsetsBefore and offsets
> > committed in ZooKeeper for monitoring the lag of a consumer group.
> >
> > Our current use case is that we have a service that is continuously
> > consuming messages of a large number of topics and persisting the
> > messages to S3 at somewhat regular intervals (depends on time and the
> > total size of consumed messages for each partition). Offsets are
> > committed to ZooKeeper after the messages have been persisted to S3.
> > The partitions are of varying load, so a simple threshold based on the
> > number of messages we're lagging behind would be cumbersome to maintain
> > due to the number of topics, and most likely prone to unnecessary alerts.
> >
> > Currently our broker configuration specifies log.roll.hours=1 and
> > log.segment.bytes=1GB, and my proposed solution is to have a separate
> > service that would iterate through all topics/partitions and use
> > #getOffsetsBefore with a timestamp that is one (1) or two (2) hours ago
> > and compare the first offset (which from my testing looks to be the
> > offset that is closest in time, i.e. from the log segment that is
> > closest to the timestamp given) with the one that is saved to ZooKeeper.
> > It feels like a pretty solid solution, given that we just want a rough
> > estimate of how much we're lagging behind in time, so that we know
> > (again, roughly) how much time we have to fix whatever is broken before
> > the log segments are deleted by Kafka.
> >
> > Is there anyone doing monitoring similar to this? Are there any obvious
> > downsides of this approach that I'm not thinking about? Thoughts on
> > alternatives?
> >
> > Best regards,
> > Mathias
> >
>
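
The comparison proposed in the original message can be sketched roughly as
below. This is only an illustration of the decision logic, not the service
described in the thread: the function name lagging_partitions and both of its
inputs are hypothetical. It assumes some other code has already called
SimpleConsumer#getOffsetsBefore for each partition with a timestamp one or two
hours in the past, and has read the committed offsets from ZooKeeper. It also
assumes, as reported in the message, that the first offset returned belongs to
the log segment closest to the given timestamp.

```python
from typing import Dict, List, Tuple

# (topic, partition) pair; a hypothetical alias used only in this sketch.
TopicPartition = Tuple[str, int]


def lagging_partitions(
    offsets_before: Dict[TopicPartition, List[int]],
    committed: Dict[TopicPartition, int],
) -> List[TopicPartition]:
    """Return partitions whose committed offset is older than the threshold.

    offsets_before: per partition, the offsets returned for the chosen
        timestamp (assumed: first entry is the segment closest in time).
    committed: per partition, the offset stored in ZooKeeper.
    """
    behind = []
    for tp, offsets in sorted(offsets_before.items()):
        if not offsets:
            # No segment predates the timestamp, so the consumer cannot be
            # further behind than the chosen time window on this partition.
            continue
        threshold = offsets[0]
        # Assumption: a partition without a committed offset counts as lagging.
        if committed.get(tp, -1) < threshold:
            behind.append(tp)
    return behind
```

A partition shows up in the result only when its committed offset is below the
start of the segment nearest the timestamp, so the alert is expressed in time
(about one or two hours behind) rather than in a per-topic message count.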
