kafka-users mailing list archives

From Maximiliano Patricio Méndez <mmen...@despegar.com>
Subject Re: Connection to kafka stalls
Date Mon, 15 Feb 2016 17:04:42 GMT
Another update.

The problem appeared again. The consumer is stalling at certain offsets.

Does anyone have an idea of what might be happening?

If there is anything I could add that might help, let me know.

2016-02-12 10:29 GMT-03:00 Maximiliano Patricio Méndez <mmendez@despegar.com>:

> Hi,
>
> An update about this.
>
> I've recreated the topic with a different configuration and the problem
> doesn't seem to be happening anymore. I have 8 brokers. The topic was
> previously created (when the connections were stalling, as shown in the
> thread dump I attached) with 5 partitions, a retention policy of 5 days, and
> a replication factor of 2. The new configuration (which no longer causes this
> issue) has 8 partitions, the same retention policy, and the same replication
> factor.
>
> What could have caused the connections to hang in the previous
> configuration?
>
> 2016-02-10 15:19 GMT-03:00 Maximiliano Patricio Méndez <
> mmendez@despegar.com>:
>
>> Sorry, I forgot to mention: I'm using Kafka 0.8.2 and a ConsumerGroup
>> similar to the one in the documentation.
>>
>> 2016-02-10 14:39 GMT-03:00 Maximiliano Patricio Méndez <
>> mmendez@despegar.com>:
>>
>>> Hi,
>>>
>>> I'm having trouble with recurring stalled connections to Kafka.
>>> The symptom is that some consumers lag behind, and most of the time
>>> restarting the consumer doesn't help. (Occasionally, when another
>>> consumer takes over the problematic partition, it no longer fails, but
>>> usually the new consumer also stalls shortly after the switch.)
>>>
>>> A thread dump taken in this situation shows that the call stalls in the
>>> hasNext() method of the ConsumerIterator, although there are many messages
>>> to consume and that particular partition of the topic is lagging.
>>>
>>> "hermes-consumer-thread-1" #75 prio=5 os_prio=0 tid=0x00007fe430fde000 nid=0x7c01 waiting on condition [0x00007fe428ce1000]
>>>    java.lang.Thread.State: TIMED_WAITING (parking)
>>>         at sun.misc.Unsafe.park(Native Method)
>>>         - parking to wait for  <0x000000070932c870> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>>>         at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>>>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
>>>         at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
>>>         at kafka.consumer.ConsumerIterator.makeNext(ConsumerIterator.scala:65)
>>>         at kafka.consumer.ConsumerIterator.makeNext(ConsumerIterator.scala:33)
>>>         at kafka.utils.IteratorTemplate.maybeComputeNext(IteratorTemplate.scala:66)
>>>         at kafka.utils.IteratorTemplate.hasNext(IteratorTemplate.scala:58)
>>>
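For what it's worth, the parked frame in the dump above is a timed poll on a LinkedBlockingQueue, which is what a thread looks like while the queue it reads from is empty. My reading of the stack (not something confirmed in this thread) is that nothing is being handed to the iterator, rather than the iterator itself being stuck. Below is a minimal, stdlib-only sketch of that behaviour. The class name, the queue, and the "chunk-0" value are hypothetical stand-ins, and no Kafka code is involved.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class EmptyQueuePollSketch {

    // Hypothetical stand-in for a consumer-side queue that some other
    // thread is expected to fill. Returns the polled element, or the
    // marker string "timed out" if nothing arrived within timeoutMs.
    static String pollOnce(LinkedBlockingQueue<String> queue, long timeoutMs)
            throws InterruptedException {
        // While waiting here the thread is TIMED_WAITING (parking),
        // the same state reported in the thread dump.
        String chunk = queue.poll(timeoutMs, TimeUnit.MILLISECONDS);
        return chunk == null ? "timed out" : chunk;
    }

    public static void main(String[] args) throws InterruptedException {
        LinkedBlockingQueue<String> queue = new LinkedBlockingQueue<>();

        // Empty queue: the poll waits for the timeout and returns null.
        System.out.println(pollOnce(queue, 200));

        // Once a producer-side thread enqueues data, the poll returns it.
        queue.put("chunk-0");
        System.out.println(pollOnce(queue, 200));
    }
}
```

If this reading is right, the place to look would be whatever is supposed to fill the queue for the stalled partition, not the thread calling hasNext().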
>>> Reading through the mailing list I've come across old solutions for
>>> this problem, including checking consumer.timeout.ms (which I've
>>> added with no results) and checking the size of the messages (if a
>>> message is bigger than fetch.message.max.bytes, the consumer will stop
>>> like this), but my messages are all under 300 bytes in size.
>>>
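For reference, the two settings mentioned above are plain consumer properties. The sketch below only builds a java.util.Properties object and compares a payload size against the fetch limit. The ZooKeeper address, group id, timeout, and 1 MiB limit are placeholder values I picked for illustration, and fitsInFetch is a hypothetical helper that ignores any per-message overhead. It does not create a real consumer.

```java
import java.util.Properties;

public class ConsumerConfigSketch {

    // Builds the properties discussed in this thread.
    // All values are placeholders, not the poster's actual settings.
    static Properties consumerProps() {
        Properties props = new Properties();
        props.setProperty("zookeeper.connect", "localhost:2181"); // placeholder
        props.setProperty("group.id", "example-group");           // placeholder
        // With a non-negative timeout, hasNext() gives up after this
        // many milliseconds without a message.
        props.setProperty("consumer.timeout.ms", "5000");
        // Upper bound on the bytes fetched per partition per request.
        props.setProperty("fetch.message.max.bytes", String.valueOf(1024 * 1024));
        return props;
    }

    // Hypothetical helper: true if a payload of messageBytes is within
    // the configured fetch limit (overhead not accounted for).
    static boolean fitsInFetch(Properties props, int messageBytes) {
        int max = Integer.parseInt(props.getProperty("fetch.message.max.bytes"));
        return messageBytes <= max;
    }

    public static void main(String[] args) {
        Properties props = consumerProps();
        // 300-byte messages, as described in the original report.
        System.out.println(fitsInFetch(props, 300));
    }
}
```

With messages under 300 bytes, a fetch limit anywhere near this size would rule out the oversized-message explanation, which matches what was reported.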
>>> Has anyone had this problem? Any help would be appreciated.
>>>
>>> Thanks
>>>
>>>
>>
>
