kafka-users mailing list archives

From Lawrence Weikum <lwei...@pandora.com>
Subject Re: Deadlock using latest 0.10.1 Kafka release
Date Thu, 03 Nov 2016 19:53:57 GMT
We saw this increase when upgrading from 0.9.0.1 to 0.10.0.1.  
We’re now running on 0.10.1.0, and the FD increase is due to a deadlock, not functionality
or new features.

Lawrence Weikum | Software Engineer | Pandora
1426 Pearl Street, Suite 100, Boulder CO 80302
m 720.203.1578 | lweikum@pandora.com

On 11/3/16, 12:42 PM, "Hans Jespersen" <hans@confluent.io> wrote:

    The 0.10.1 broker will use more file descriptors than previous releases
    because of the new timestamp indexes. You should expect and plan for ~33%
    more file descriptors to be open.
    
    -hans
    
    /**
     * Hans Jespersen, Principal Systems Engineer, Confluent Inc.
     * hans@confluent.io (650)924-2670
     */
    
    On Thu, Nov 3, 2016 at 10:02 AM, Marcos Juarez <mjuarez@gmail.com> wrote:
    
    > We're running into a recurrent deadlock issue in both our production and
    > staging clusters, both using the latest 0.10.1 release.  The symptom we
    > noticed was that, on servers where Kafka producer connections are
    > short-lived, every other day or so we'd see file descriptors being used up
    > until either the broker is restarted or it runs out of file descriptors
    > and goes down.  None of the clients are on 0.10.1 Kafka jars; they're
    > all using previous versions.
    >
    > When diagnosing the issue, we found that when the system is in that state,
    > using up file descriptors at a really fast rate, the JVM is actually in a
    > deadlock.  Did a thread dump from both jstack and visualvm, and attached
    > those to this email.
    >
    > This is the interesting bit from the jstack thread dump:
    >
    >
    > Found one Java-level deadlock:
    > =============================
    > "executor-Heartbeat":
    >   waiting to lock monitor 0x00000000016c8138 (object 0x000000062732a398, a
    > kafka.coordinator.GroupMetadata),
    >   which is held by "group-metadata-manager-0"
    >
    > "group-metadata-manager-0":
    >   waiting to lock monitor 0x00000000011ddaa8 (object 0x000000063f1b0cc0, a
    > java.util.LinkedList),
    >   which is held by "kafka-request-handler-3"
    >
    > "kafka-request-handler-3":
    >   waiting to lock monitor 0x00000000016c8138 (object 0x000000062732a398, a
    > kafka.coordinator.GroupMetadata),
    >   which is held by "group-metadata-manager-0"
    >
    >
    > I also noticed the background heartbeat thread (I'm guessing the one
    > called "executor-Heartbeat" above) is new for this release, under the
    > KAFKA-3888 ticket - https://issues.apache.org/jira/browse/KAFKA-3888
    >
    > We haven't noticed this problem with earlier Kafka broker versions, so I'm
    > guessing maybe this new background heartbeat thread is what introduced the
    > deadlock problem.
    >
    > That same broker is still in the deadlock scenario; we haven't restarted
    > it, so let me know if you'd like more info/logs/stats from the system before
    > we restart it.
    >
    > Thanks,
    >
    > Marcos Juarez
    >
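
Note on the jstack excerpt quoted above: the three entries can be read as a wait-for graph, where each blocked thread points at the thread holding the lock it wants. The following is a minimal Python sketch of that reading, not anything from Kafka or the JDK. The dump text is copied from the message (with the line wraps rejoined), and the helper names `wait_for_graph` and `find_cycle` are made up for illustration. It assumes each entry has the three-line shape shown in the excerpt.

```python
import re

# Deadlock section copied from the jstack dump quoted above (line wraps rejoined).
DUMP = '''
"executor-Heartbeat":
  waiting to lock monitor 0x00000000016c8138 (object 0x000000062732a398, a kafka.coordinator.GroupMetadata),
  which is held by "group-metadata-manager-0"

"group-metadata-manager-0":
  waiting to lock monitor 0x00000000011ddaa8 (object 0x000000063f1b0cc0, a java.util.LinkedList),
  which is held by "kafka-request-handler-3"

"kafka-request-handler-3":
  waiting to lock monitor 0x00000000016c8138 (object 0x000000062732a398, a kafka.coordinator.GroupMetadata),
  which is held by "group-metadata-manager-0"
'''

# One entry: blocked thread name, class of the wanted lock, thread holding it.
ENTRY = re.compile(
    r'^"(?P<waiter>[^"]+)":\s*\n'
    r'\s*waiting to lock monitor \S+ \(object \S+, a (?P<cls>[\w.$]+)\),\s*\n'
    r'\s*which is held by "(?P<holder>[^"]+)"',
    re.MULTILINE,
)


def wait_for_graph(dump):
    """Map each blocked thread to (thread holding the lock, class of the lock)."""
    return {m["waiter"]: (m["holder"], m["cls"]) for m in ENTRY.finditer(dump)}


def find_cycle(graph, start):
    """Follow 'held by' edges from start; return the threads that form the cycle."""
    path = []
    node = start
    while node in graph and node not in path:
        path.append(node)
        node = graph[node][0]
    return path[path.index(node):] if node in path else []


graph = wait_for_graph(DUMP)
cycle = find_cycle(graph, "executor-Heartbeat")
for thread in cycle:
    holder, cls = graph[thread]
    print(f"{thread} waits on {cls} held by {holder}")
```

Read this way, the cycle itself is between "group-metadata-manager-0" and "kafka-request-handler-3", which hold the GroupMetadata and LinkedList monitors in opposite order. "executor-Heartbeat" is not part of the cycle; it is blocked behind it, waiting on the same GroupMetadata monitor.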
    
