From: Boyang Chen
Date: Fri, 6 Dec 2019 07:41:54 -0800
Subject: Re: Kafka consumer group keeps moving to PreparingRebalance and stops consuming
To: users@kafka.apache.org

Hey Avshalom,

the consumer instance is instantiated per stream thread. You will not be
creating new consumers, so the root cause is definitely member timeout.
Have you changed max.poll.interval.ms by any chance? That config controls
how long an interval between poll calls you tolerate, to make sure
progress is being made. If it's very tight, the consumer could stop
sending heartbeats once progress is slow.

Best,
Boyang

On Fri, Dec 6, 2019 at 7:12 AM Avshalom Manevich wrote:

> We have a Kafka Streams consumer group that keeps moving to the
> PreparingRebalance state and stops consuming. The pattern is as follows:
>
> 1. Consumer group is running and stable for around 20 minutes
> 2. New consumers (members) start to appear in the group state without
>    any clear reason. These new members only originate from a small
>    number of VMs (not the same VMs each time), and they keep joining
> 3. Group state changes to PreparingRebalance
> 4. All consumers stop consuming, showing these logs: "Group coordinator
>    ... is unavailable or invalid, will attempt rediscovery"
> 5. The consumers on the VMs that generated extra members show these
>    logs:
>
>    Offset commit failed on partition X at offset Y: The coordinator is
>    not aware of this member.
>
>    Failed to commit stream task X since it got migrated to another
>    thread already. Closing it as zombie before triggering a new
>    rebalance.
>
>    Detected task Z that got migrated to another thread. This implies
>    that this thread missed a rebalance and dropped out of the consumer
>    group. Will try to rejoin the consumer group.
>
> 6. We kill all consumer processes on all VMs, the group moves to Empty
>    with 0 members, we start the processes and we're back to step 1
>
> Kafka version is 1.1.0, streams version is 2.0.0
>
> We took thread dumps from the misbehaving consumers, and didn't see
> more consumer threads than configured.
>
> We tried restarting the Kafka brokers and cleaning the ZooKeeper cache.
>
> We suspect that the issue has to do with missing heartbeats, but the
> default heartbeat is 3 seconds and message handling times are nowhere
> near that.
>
> Has anyone encountered similar behaviour?
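
For anyone finding this thread later, here is a rough sketch of the check suggested in the reply: compare the worst-case time spent between two poll calls against max.poll.interval.ms. The config names are real consumer settings, but every value below is illustrative only and not taken from the reporter's setup. The class, the helper names, and the assumption that each record in a full batch takes the same time to process are all hypothetical.

```java
import java.util.Properties;

public class PollIntervalSketch {

    // Worst-case time between two poll() calls, assuming a full batch of
    // maxPollRecords records that each take perRecordMillis to process.
    public static long worstCaseBatchMillis(int maxPollRecords, long perRecordMillis) {
        return (long) maxPollRecords * perRecordMillis;
    }

    // True if processing a full batch could take longer than max.poll.interval.ms,
    // in which case the member may drop out of the group and trigger a rebalance.
    public static boolean mayExceedPollInterval(Properties props, long perRecordMillis) {
        int maxPollRecords = Integer.parseInt(props.getProperty("max.poll.records"));
        long maxPollIntervalMs = Long.parseLong(props.getProperty("max.poll.interval.ms"));
        return worstCaseBatchMillis(maxPollRecords, perRecordMillis) > maxPollIntervalMs;
    }

    // Illustrative values only (assumptions, not the reporter's configuration).
    public static Properties exampleConfig() {
        Properties props = new Properties();
        props.setProperty("max.poll.interval.ms", "300000"); // tolerated gap between polls
        props.setProperty("max.poll.records", "500");        // records returned per poll
        props.setProperty("session.timeout.ms", "10000");    // heartbeat-based liveness timeout
        props.setProperty("heartbeat.interval.ms", "3000");  // the 3 seconds mentioned in the thread
        return props;
    }

    public static void main(String[] args) {
        Properties props = exampleConfig();
        // 500 records * 100 ms = 50,000 ms, within the 300,000 ms interval
        System.out.println(mayExceedPollInterval(props, 100));  // prints false
        // 500 records * 1000 ms = 500,000 ms, beyond the 300,000 ms interval
        System.out.println(mayExceedPollInterval(props, 1000)); // prints true
    }
}
```

The point of the sketch is that the heartbeat interval and the poll interval are separate limits: per-message handling time can be far below 3 seconds and a full batch can still exceed max.poll.interval.ms if that value is set tightly.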