kafka-users mailing list archives

From Tom Dearman <tom.dear...@gmail.com>
Subject Re: Monitoring offset lag
Date Fri, 08 Jul 2016 15:57:57 GMT
I should mention this was using Burrow's HTTP endpoint to check status.

> On 8 Jul 2016, at 16:56, Tom Dearman <tom.dearman@gmail.com> wrote:
> Todd,
> Thanks for that I am taking a look.
> Is there a bug whereby Burrow doesn't return correct info if a topic has only a
> couple of messages, both with the same key? I was finding that
> http://localhost:8100/v2/kafka/betwave/consumer was returning a message with an
> empty consumers list until I put on another message with a different key, i.e. a
> minimum of 2 partitions with something in them. I know this is not very like
> production, but locally I was only testing with one user, so only one partition
> got filled.
> Tom
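A minimal Python sketch of the kind of status check described here, using only the standard library. It assumes Burrow's v2 consumer-list endpoint returns a JSON object with an `error` flag and a `consumers` array; check this against the Burrow version you run. The helper names `parse_consumer_list` and `fetch_consumers` are hypothetical.

```python
import json
import urllib.request
from typing import List

# URL from the thread; "betwave" is the cluster name segment of Burrow's v2 path.
BURROW_URL = "http://localhost:8100/v2/kafka/betwave/consumer"


def parse_consumer_list(body: str) -> List[str]:
    """Extract consumer group names from a Burrow v2 consumer-list response.

    Assumes a JSON body shaped like {"error": false, "consumers": [...]}.
    """
    data = json.loads(body)
    if data.get("error"):
        raise RuntimeError(data.get("message", "Burrow returned an error"))
    # A missing or null "consumers" field is treated as an empty list.
    return list(data.get("consumers") or [])


def fetch_consumers(url: str = BURROW_URL, timeout: float = 5.0) -> List[str]:
    """Query Burrow over HTTP and return the consumer groups it reports."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_consumer_list(resp.read().decode("utf-8"))
```

A monitoring job could call `fetch_consumers()` and alert when the list comes back empty, so that the symptom Tom describes is reported as "unknown" rather than silently read as "no lag".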
>> On 6 Jul 2016, at 18:08, Todd Palino <tpalino@gmail.com> wrote:
>> Yeah, I've written dissertations at this point on why MaxLag is flawed. We
>> also used to use the offset checker tool, and later something similar that
>> was a little easier to slot into our monitoring systems. The problems with all
>> of these are why I wrote Burrow (https://github.com/linkedin/Burrow).
>> For more details, you can also check out my blog post on the release:
>> https://engineering.linkedin.com/apache-kafka/burrow-kafka-consumer-monitoring-reinvented
>> -Todd
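The blog post describes evaluating lag over a sliding window of committed offsets rather than against a single threshold. The Python sketch below is a simplified, approximate rule set in that spirit, written from the general idea and not from Burrow's source; the `Sample` type, the status names, and the exact rules are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    """One observation of a partition: the group's committed offset and its lag."""
    offset: int
    lag: int


def evaluate_window(window: List[Sample]) -> str:
    """Classify one partition from a window of samples, oldest first.

    Simplified, illustrative rules (not Burrow's implementation):
      1. Any sample with zero lag -> "OK": the consumer caught up at some point.
      2. Fewer than two samples -> "UNKNOWN": not enough history to judge.
      3. Committed offset never moved while lag stayed above zero -> "STALLED".
      4. Offset moved but lag never decreased between samples -> "FALLING_BEHIND".
      5. Otherwise -> "OK".
    """
    if any(s.lag == 0 for s in window):
        return "OK"
    if len(window) < 2:
        return "UNKNOWN"
    if all(s.offset == window[0].offset for s in window):
        return "STALLED"
    if all(later.lag >= earlier.lag for earlier, later in zip(window, window[1:])):
        return "FALLING_BEHIND"
    return "OK"
```

The design point is that a partition nobody is consuming (as in KAFKA-2978) keeps a fixed committed offset, and while new messages arrive its lag grows, so a window-based check can flag it as stalled even when the consumer itself believes all is well.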
>> On Wednesday, July 6, 2016, Tom Dearman <tom.dearman@gmail.com> wrote:
>>> I recently had a problem in production which I believe was a
>>> manifestation of KAFKA-2978 (Topic partition is not sometimes
>>> consumed after rebalancing of consumer group), this is fixed in and
>>> we will upgrade our client soon.  However, it made me realise that I didn’t
>>> have any monitoring set up on this.  The only thing I can find as a metric
>>> is the
>>> kafka.consumer:type=ConsumerFetcherManager,name=MaxLag,clientId=([-.\w]+),
>>> which, if I understand correctly, is the max lag of any partition that that
>>> particular consumer is consuming.
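To make the concern concrete: if MaxLag is the maximum over the partitions a consumer is actually fetching, a partition that no consumer fetches never contributes to it. A minimal Python sketch with hypothetical offsets for a hypothetical topic `orders` compares that per-consumer view with a group-wide one computed from log-end and committed offsets; the function names and data layout are illustrative, not a Kafka API.

```python
from typing import Dict, Iterable, Optional, Tuple

Partition = Tuple[str, int]  # (topic, partition number)


def partition_lag(log_end: Dict[Partition, int],
                  committed: Dict[Partition, int]) -> Dict[Partition, int]:
    """Lag per partition: log-end offset minus the group's committed offset.

    A partition with no committed offset is counted from offset 0, a
    simplification that assumes the log has not been truncated.
    """
    return {tp: end - committed.get(tp, 0) for tp, end in log_end.items()}


def max_lag(lags: Dict[Partition, int],
            fetched: Optional[Iterable[Partition]] = None) -> int:
    """Maximum lag, optionally restricted to the partitions being fetched.

    Passing `fetched` mimics a per-consumer metric: a partition that no
    consumer fetches never contributes to the result.
    """
    keys = lags.keys() if fetched is None else [tp for tp in fetched if tp in lags]
    return max((lags[tp] for tp in keys), default=0)


# Hypothetical offsets: partition 1 was left unconsumed after a rebalance.
log_end = {("orders", 0): 100, ("orders", 1): 500}
committed = {("orders", 0): 100, ("orders", 1): 20}
lags = partition_lag(log_end, committed)
print(max_lag(lags, fetched=[("orders", 0)]))  # 0: the per-consumer view looks healthy
print(max_lag(lags))  # 480: the group as a whole is behind
```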
>>> 1. If I had been monitoring this, and my consumer was suffering from
>>> the issue in KAFKA-2978, would I actually have been alerted? Since the
>>> consumer would think it is consuming correctly, would it not have updated
>>> the metric?
>>> 2. There is another way to see offset lag, using the command
>>> /usr/bin/kafka-consumer-groups --new-consumer --bootstrap-server
>>> --describe --group consumer_group_name and parsing the
>>> response. Is it safe or advisable to do this? I like the fact that it
>>> tells me the lag for each partition, although it is also not available if no
>>> consumer from the group is currently consuming.
>>> 3. Is there a better way of doing this?
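For question 2, a minimal Python sketch of parsing the `--describe` output. It keys on header names rather than column positions, because the exact columns may differ between Kafka versions, and it assumes whitespace-separated columns with headers including `TOPIC`, `PARTITION` and `LAG`, with `-` meaning the lag is unknown. The function names are hypothetical; check the header row your version actually prints.

```python
from typing import Dict, List, Optional, Tuple


def parse_describe(output: str) -> List[Dict[str, str]]:
    """Turn `kafka-consumer-groups --describe` text into one dict per row.

    Assumes a header row containing TOPIC, PARTITION and LAG followed by
    whitespace-separated data rows; lines before the header are skipped.
    """
    rows: List[Dict[str, str]] = []
    header: Optional[List[str]] = None
    for line in output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if header is None:
            if {"TOPIC", "PARTITION", "LAG"} <= set(fields):
                header = fields
            continue
        rows.append(dict(zip(header, fields)))
    return rows


def lag_by_partition(rows: List[Dict[str, str]]) -> Dict[Tuple[str, int], Optional[int]]:
    """Map (topic, partition) to lag; None when the tool printed "-" or similar."""
    result: Dict[Tuple[str, int], Optional[int]] = {}
    for row in rows:
        partition = row.get("PARTITION", "")
        if not partition.isdigit() or "TOPIC" not in row:
            continue  # not a data row (e.g. a note printed by the tool)
        lag = row.get("LAG", "-")
        result[(row["TOPIC"], int(partition))] = int(lag) if lag.isdigit() else None
    return result
```

The text itself could be captured with `subprocess.run(..., capture_output=True, text=True)`. An empty result should be treated as "unknown" rather than "healthy", since, as Tom notes, the lag is not available when no consumer from the group is currently consuming.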
>> -- 
>> *Todd Palino*
>> Staff Site Reliability Engineer
>> Data Infrastructure Streaming
>> linkedin.com/in/toddpalino
