kafka-users mailing list archives

From Hans Jespersen <h...@confluent.io>
Subject Re: Deadlock using latest 0.10.1 Kafka release
Date Thu, 03 Nov 2016 18:42:52 GMT
The 0.10.1 broker will use more file descriptors than previous releases
because of the new timestamp indexes. You should expect and plan for ~33%
more file descriptors to be open.
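
As a rough planning aid, the sketch below reads the open and maximum file
descriptor counts of the JVM it runs in and projects the count after ~33%
growth. The class name FdHeadroom and the helper projected() are made up for
illustration. It measures its own process, so for a broker the same
OpenFileDescriptorCount and MaxFileDescriptorCount attributes would have to be
read from the broker's JVM (for example over JMX), and the
com.sun.management bean is only available on Unix-like HotSpot/OpenJDK JVMs.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public class FdHeadroom {
    // Hypothetical helper: descriptor count after growing by percentGrowth,
    // rounded up (integer math avoids floating-point rounding surprises).
    static long projected(long current, int percentGrowth) {
        return (current * (100 + percentGrowth) + 99) / 100;
    }

    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        // The descriptor counters only exist on the Unix-specific bean.
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                (com.sun.management.UnixOperatingSystemMXBean) os;
            long open = unix.getOpenFileDescriptorCount();
            long max = unix.getMaxFileDescriptorCount();
            long planned = projected(open, 33); // ~33% more, per the estimate above
            System.out.println("open=" + open + " max=" + max
                + " projected=" + planned
                + (planned > max ? " (exceeds limit)" : " (within limit)"));
        } else {
            System.out.println("File descriptor counts not available on this JVM");
        }
    }
}
```

If the projected count gets close to the limit, raising the process's open-file
limit before upgrading is the safer order of operations.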

-hans

/**
 * Hans Jespersen, Principal Systems Engineer, Confluent Inc.
 * hans@confluent.io (650)924-2670
 */

On Thu, Nov 3, 2016 at 10:02 AM, Marcos Juarez <mjuarez@gmail.com> wrote:

> We're running into a recurrent deadlock issue in both our production and
> staging clusters, both using the latest 0.10.1 release.  The symptom we
> noticed was that, in servers in which kafka producer connections are short
> lived, every other day or so, we'd see file descriptors being exhausted,
> until the broker is restarted, or the broker runs out of file descriptors,
> and it goes down.  None of the clients are on 0.10.1 kafka jars, they're
> all using previous versions.
>
> When diagnosing the issue, we found that when the system is in that state,
> using up file descriptors at a really fast rate, the JVM is actually in a
> deadlock.  Did a thread dump from both jstack and visualvm, and attached
> those to this email.
>
> This is the interesting bit from the jstack thread dump:
>
>
> Found one Java-level deadlock:
> =============================
> "executor-Heartbeat":
>   waiting to lock monitor 0x00000000016c8138 (object 0x000000062732a398, a kafka.coordinator.GroupMetadata),
>   which is held by "group-metadata-manager-0"
>
> "group-metadata-manager-0":
>   waiting to lock monitor 0x00000000011ddaa8 (object 0x000000063f1b0cc0, a java.util.LinkedList),
>   which is held by "kafka-request-handler-3"
>
> "kafka-request-handler-3":
>   waiting to lock monitor 0x00000000016c8138 (object 0x000000062732a398, a kafka.coordinator.GroupMetadata),
>   which is held by "group-metadata-manager-0"
>
>
> I also noticed the background heartbeat thread (I'm guessing the one
> called "executor-Heartbeat" above) is new for this release, under
> KAFKA-3888 ticket - https://issues.apache.org/jira/browse/KAFKA-3888
>
> We haven't noticed this problem with earlier Kafka broker versions, so I'm
> guessing maybe this new background heartbeat thread is what introduced the
> deadlock problem.
>
> That same broker is still in the deadlock scenario, we haven't restarted
> it, so let me know if you'd like more info/log/stats from the system before
> we restart it.
>
> Thanks,
>
> Marcos Juarez
>
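
The quoted jstack output shows a monitor cycle: "group-metadata-manager-0"
holds the GroupMetadata monitor and waits for the LinkedList monitor, while
"kafka-request-handler-3" holds the LinkedList monitor and waits for
GroupMetadata. The same kind of report can be produced programmatically with
the standard ThreadMXBean API. The sketch below is a minimal example, not
anything the broker ships: the class name DeadlockCheck and the method
describeDeadlocks() are made up, and it inspects only the JVM it runs in, so
for a broker it would have to run in-process or be adapted to a remote JMX
connection.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class DeadlockCheck {
    // Returns one line per deadlocked thread, or an empty string if none.
    static String describeDeadlocks() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // Covers both object monitors and ownable synchronizers;
        // returns null when no deadlock is found.
        long[] ids = threads.findDeadlockedThreads();
        if (ids == null) {
            return "";
        }
        StringBuilder report = new StringBuilder();
        for (ThreadInfo info : threads.getThreadInfo(ids)) {
            if (info == null) {
                continue; // thread exited between the two calls
            }
            report.append('"').append(info.getThreadName())
                  .append("\" waiting to lock ").append(info.getLockName())
                  .append(", held by \"").append(info.getLockOwnerName())
                  .append("\"\n");
        }
        return report.toString();
    }

    public static void main(String[] args) {
        String report = describeDeadlocks();
        System.out.print(report.isEmpty() ? "No deadlock detected\n" : report);
    }
}
```

Polling a check like this from a monitoring thread can flag the condition
before descriptor exhaustion takes the process down.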
