kafka-users mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Bogdan Dimitriu (bdimitri)" <bdimi...@cisco.com>
Subject Re: Getting the KafkaStream ID
Date Wed, 11 Jun 2014 10:09:24 GMT
Which JMX MBeans are you referring to, Jun? I couldn’t find anything that
gives me the same information as the ConsumerOffsetChecker tool.
In any case, my main problem is that I don’t know when I should slow down
the iteration because I don’t know which stream the iteration is
consuming. I have the global partition lag situation (from the
ConsumerOffsetChecker), but I don’t know how to apply it to slow down the
iterators because I can’t identify them.

Thanks,
Bogdan

On 09/06/2014 15:28, "Jun Rao" <junrao@gmail.com> wrote:

>We do have a jmx that reports the lag per partition. You could probably
>get
>the lag that way. Then, you just need to slow down the iteration on the
>fast partition.
>
>Thanks,
>
>Jun
>
>
>On Mon, Jun 9, 2014 at 4:07 AM, Bogdan Dimitriu (bdimitri) <
>bdimitri@cisco.com> wrote:
>
>> Certainly.
>> I know this may not sound like a great idea but I am running out of
>> options here: I¹m basically trying to implement a consumer throttle. My
>> application consumes from a fairly high number of partitions from a
>>number
>> of consumer servers. The data is put in the partitions by a producer in
>>a
>> round robin fashion so the number of messages each partition is given is
>> even. The messages have a time component assigned to them.
>> Now, for a good majority of time the consumers will be faster than the
>> producers, so the lags I get with the ConsumerOffsetChecker are mostly 0
>> (or 1) and this works well with the time component because once they are
>> consumed there is a logical grouping of messages from all partitions
>>based
>> on the time component (coarsely).
>> The point where I¹m starting to get into trouble is when the consumers
>>are
>> all stopped for a while and messages start to accumulate in the
>>partitions
>> without being consumed (hence the lag increases). Once I resume the
>> consumers (with 1 thread per each partition) messages start to get
>> consumed very fast, but because some messages take longer to process
>>than
>> others, over time the lags start to get very uneven between partitions
>>and
>> this starts to interfere with the grouping by the time component.
>> So the only way I thought I could prevent this from happening was by
>> throttling the ³fast² consumers, which have a smaller lag. This will
>>only
>> happen rarely, so I thought I could live with the approach. To do that,
>>I
>> obviously need to get the lag information periodically. I do that by a
>> mechanism similar to what ConsumerOffsetChecker does. But now I need to
>> know from which thread to call sleep() and there seems to be no decent
>>way
>> to find that out.
>> Any better ideas would be highly appreciated.
>>
>> Thanks,
>> Bogdan
>>
>> On 08/06/2014 06:25, "Jun Rao" <junrao@gmail.com> wrote:
>>
>> >Could you elaborate on the use case of the stream ID?
>> >
>> >Thanks,
>> >
>> >Jun
>> >
>> >
>> >On Fri, Jun 6, 2014 at 2:13 AM, Bogdan Dimitriu (bdimitri) <
>> >bdimitri@cisco.com> wrote:
>> >
>> >> Hello folks,
>> >>
>> >> I¹m using Kafka 0.8.0 with the high level consumer and I have a
>> >>situation
>> >> where I need to obtain the ID for each of the KafkaStreams that I
>> >>create.
>> >> The KafkaStream class has a method called ³clientId()² that I
>>expected
>> >> would give me just that, but unfortunately it returns the name of the
>> >> consumer group.
>> >> So to make it clear, what I want to obtain is the string that looks
>>like
>> >> this: myconsumergroup_myhost-1402045464004-2dc0cbf2-0.
>> >> Is there any way I could get that value for each of the streams? I¹ve
>> >> looked around the source code but I can¹t see any way to do this.
>> >>
>> >> Many thanks,
>> >> Bogdan
>> >>
>> >>
>>
>>

Mime
View raw message