From: David Ross
To: users@kafka.apache.org, Naveen Gattu
Date: Thu, 13 Dec 2012 11:36:09 -0800
Subject: Re: How fair should I expect rebalancing to be?

Looking into the "fail to rebalance" messages. We do have zk 3.3.4. Could a
higher number of partitions be the cause?

On Thu, Dec 13, 2012 at 8:03 AM, Jun Rao wrote:

> Do you see "fail to rebalance after 4 tries" in the worker not getting any
> data? If so, what's the ZK version? You should use 3.3.4 or above, which
> fixed some bugs that could cause rebalance to fail.
>
> Thanks,
>
> Jun
>
> On Wed, Dec 12, 2012 at 11:17 PM, David Ross wrote:
>
> > Hello,
> >
> > I am trying to distribute work across several nodes using Kafka. I have 3
> > brokers each with 16 partitions. I have 8 worker servers listening with one
> > message stream on the same topic. I expect each server to own about 1/8 of
> > the partitions, yet I am not seeing this. It seems initially, the work is
> > fairly evenly distributed. However, after running for several hours, I see
> > that only three consumers own any partitions, and only 32 of the 48 have an
> > owner at all.
> >
> > What gives?
> >
> > For reference, we have 0.7.0 on the server and 0.7.2 on the consumer. Also,
> > I set the max rebalance retries to be 10 because I saw a lot of rebalance
> > failures in the logs.
> >
> >
> > Thanks,
> >
> > David
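
For reference, the expectation in the original question (48 partitions spread over 8 consumers, about 6 each) can be sketched as below. This is only an illustration in Python, assuming a range-style split of the sorted partition list over the sorted consumer list. It is not Kafka's consumer code, and the function name `expected_assignment`, the `(broker, partition)` tuples, and the `worker-N` ids are all hypothetical.

```python
def expected_assignment(partitions, consumers):
    """Split sorted partitions into contiguous ranges, one range per sorted consumer.

    Assumption: a simple range-style split; any remainder goes to the
    first few consumers, one extra partition each.
    """
    partitions = sorted(partitions)
    consumers = sorted(consumers)
    base, extra = divmod(len(partitions), len(consumers))
    assignment = {}
    start = 0
    for i, consumer in enumerate(consumers):
        count = base + (1 if i < extra else 0)
        assignment[consumer] = partitions[start:start + count]
        start += count
    return assignment


# Hypothetical ids: 3 brokers with 16 partitions each, 8 worker servers.
partitions = [(broker, p) for broker in range(3) for p in range(16)]
consumers = ["worker-%d" % i for i in range(1, 9)]

assignment = expected_assignment(partitions, consumers)
for consumer, owned in sorted(assignment.items()):
    print(consumer, len(owned))
```

Under these assumptions each of the 8 workers would own 6 partitions. The state reported in the thread (32 of the 48 partitions owned, by only three consumers) leaves 16 partitions with no owner at all, so it does not correspond to any complete assignment of this kind.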