kafka-users mailing list archives

From "Helleren, Erik" <Erik.Helle...@cmegroup.com>
Subject Re: Kafka rebalancing failed
Date Tue, 08 Sep 2015 17:52:23 GMT
Hi King,
I think the issue may come down to which consumer you are using.  Are you
using the simple consumer or the high-level consumer API?  And which
version of Kafka are you running?

If you are using the simple consumer API, you can listen to a specific
partition, but you have to write the failover code yourself.
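
To give a rough idea of what "doing the failover yourself" involves, here
is a minimal Python sketch.  It does not call any Kafka API; the service
names and the functions static_assignment and fail_over are made up for
illustration, and it assumes the partition count divides evenly among the
services.

```python
def static_assignment(num_partitions, services):
    """Pin partitions to services in fixed blocks.

    Assumes num_partitions is a multiple of len(services),
    e.g. 18 partitions over 9 services gives 2 each.
    """
    per_service = num_partitions // len(services)
    return {
        service: list(range(i * per_service, (i + 1) * per_service))
        for i, service in enumerate(services)
    }


def fail_over(assignment, dead_service):
    """Return a new map where the dead service's partitions are handed
    to the surviving services, least-loaded first (ties broken by name)."""
    survivors = {s: list(p) for s, p in assignment.items() if s != dead_service}
    for partition in assignment[dead_service]:
        target = min(survivors, key=lambda s: (len(survivors[s]), s))
        survivors[target].append(partition)
    return survivors


# Hypothetical service names; 18 partitions and 9 services as in the report.
services = ["svc-%d" % n for n in range(1, 10)]
assignment = static_assignment(18, services)
after_failure = fail_over(assignment, "svc-3")
```

With the high-level consumer, the client library does the equivalent of
fail_over for you.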

If you are using the high-level consumer API, you get automated failover
and partition distribution, but you don’t get to configure which
partitions you start listening to.  You just tell the Kafka client library
that this instance is a member of a consumer group, and it makes sure that
the members of that group, between them, consume every message at least
once.
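
For illustration, here is a minimal Python sketch of range-style partition
distribution.  It is written on the assumption that it mirrors what
kafka.consumer.RangeAssignor (the class named in the log quoted below)
does in 0.8.2: sort the consumer threads, then hand out contiguous runs of
partitions.  The instance IDs are copied from the quoted log; range_assign
and the other names are my own.  The log lists 52 consumer threads for 18
partitions, so under this scheme only the first 18 threads in sorted order
receive a partition.

```python
def range_assign(partitions, consumer_threads):
    """Give each thread (in sorted order) a contiguous run of partitions."""
    parts = sorted(partitions)
    threads = sorted(consumer_threads)
    per_thread, extra = divmod(len(parts), len(threads))
    assignment = {}
    for pos, thread in enumerate(threads):
        # The first `extra` threads each take one additional partition.
        start = per_thread * pos + min(pos, extra)
        count = per_thread + (1 if pos < extra else 0)
        assignment[thread] = parts[start:start + count]
    return assignment


# Consumer instances per host, copied from the quoted log.
INSTANCES = {
    "ops178090103": ["1439282239893-c5966e01", "1439282260982-7ef4f1c5", "1439282272017-8d0eae20"],
    "ops178093118": ["1439282241888-810389d0", "1439282272949-812592a0"],
    "ops178096091": ["1439282247256-dd4f01c3", "1439282261099-1ca552ad", "1439282272076-3865c416"],
    "ops178096218": ["1439282244077-e44933cb", "1439282250962-6d91ea06", "1439282255978-44fae577"],
    "ops178103086": ["1439282238431-38473eed", "1439282245101-dd2d2e8a", "1439282250200-9ec3e4f9"],
    "ops180019230": ["1439282246060-6c17dbe0", "1439282251861-46b2e7d0", "1439282256080-8c4f4d28"],
    "ops180021036": ["1439282250060-57c09362", "1439282255415-b301daa2", "1439282265455-8455a15b"],
    "ops180021223": ["1439282248773-578c62d0", "1439282254389-a5d71a5a", "1439282258421-16b051fb"],
    "ops180022028": ["1439282252296-d3b32c71", "1439282258091-38be130a", "1439282262207-e6a740b4"],
}

# Each instance runs two consumer threads, suffixed -0 and -1.
THREADS = [
    "%s.sh-%s-%d" % (host, instance, n)
    for host, instances in INSTANCES.items()
    for instance in instances
    for n in (0, 1)
]

assignment = range_assign(range(18), THREADS)

# Hosts that own at least one partition after the assignment.
hosts_with_work = sorted({t.split(".")[0] for t, p in assignment.items() if p})
print(hosts_with_work)
```

If the assumption holds, the threads that get work all belong to four
ops178* hosts, which is consistent with what ConsumerOffsetChecker showed.
The log also shows two or three consumer instances per host, so it may be
worth checking whether each service creates more than one connector.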
-Erik

On 9/8/15, 7:55 AM, "King Lee" <aluenkinglee@gmail.com> wrote:

>Hi all ,
>
>
>I'm using Kafka (version kafka_2.10-0.8.2.0), and recently I ran into a
>problem that has been bothering me for a while.
>
>
>
>I created a topic, named A for example. This topic has 18 partitions, and
>I run 9 web services on 9 servers to consume it. Each service consumes 2
>partitions, as configured in a file.
>
>
>
>It ran well at first, but one day I found that the services' consumption
>speed had slowed down!
>
>
>
>Using the command kafka-run-class.sh kafka.tools.ConsumerOffsetChecker,
>I found that topic A was being consumed by only 4 services! The
>rebalancing did not work! The log says:
>
>
>
>2015-08-18 16:08:46.156 WARN
> [ops180021036.sh-1439282265455-8455a15b_watcher_executor]
> kafka.consumer.RangeAssignor:83 - *No broker partitions consumed by
>consumer thread ops180021036.sh-1439282265455-8455a15b-1 for topic A*
>2015-08-18 16:08:46.156 WARN
> [ops180021036.sh-1439282265455-8455a15b_watcher_executor]
> kafka.consumer.RangeAssignor:83 -* No broker partitions consumed by
>consumer thread ops180021036.sh-1439282265455-8455a15b-0 for topic A*
>2015-08-18 16:08:46.156 INFO
> [ops180021036.sh-1439282265455-8455a15b_watcher_executor]
> kafka.consumer.ZookeeperConsumerConnector:68 -
>[ops180021036.sh-1439282265455-8455a15b], *Consumer
>ops180021036.sh-1439282265455-8455a15b selected partitions : *
>2015-08-18 16:08:46.156 INFO
> [ops180021036.sh-1439282265455-8455a15b_watcher_executor]
> kafka.consumer.ZookeeperConsumerConnector:68 -
>[ops180021036.sh-1439282265455-8455a15b],* end rebalancing consumer
>ops180021036.sh-1439282265455-8455a15b try #0*
>2015-08-18 16:08:46.157 INFO
> [ops180021036.sh-1439282265455-8455a15b-leader-finder-thread]
> kafka.consumer.ConsumerFetcherManager$LeaderFinderThread:68 -
>[ops180021036.sh-1439282265455-8455a15b-leader-finder-thread], Starting
>2015-08-18 16:08:47.038 INFO
> [ops180021036.sh-1439282265455-8455a15b_watcher_executor]
> kafka.consumer.ZookeeperConsumerConnector:68 -
>[ops180021036.sh-1439282265455-8455a15b], begin rebalancing consumer
>ops180021036.sh-1439282265455-8455a15b try #0
>2015-08-18 16:08:47.047 INFO
> [ops180021036.sh-1439282265455-8455a15b_watcher_executor]
> kafka.consumer.ConsumerFetcherManager:68 -
>[ConsumerFetcherManager-1439282265512] Stopping leader finder thread
>2015-08-18 16:08:47.047 INFO
> [ops180021036.sh-1439282265455-8455a15b_watcher_executor]
> kafka.consumer.ConsumerFetcherManager$LeaderFinderThread:68 -
>[ops180021036.sh-1439282265455-8455a15b-leader-finder-thread], Shutting
>down
>2015-08-18 16:08:47.047 INFO
> [ops180021036.sh-1439282265455-8455a15b-leader-finder-thread]
> kafka.consumer.ConsumerFetcherManager$LeaderFinderThread:68 -
>[ops180021036.sh-1439282265455-8455a15b-leader-finder-thread], Stopped
>2015-08-18 16:08:47.048 INFO
> [ops180021036.sh-1439282265455-8455a15b_watcher_executor]
> kafka.consumer.ConsumerFetcherManager$LeaderFinderThread:68 -
>[ops180021036.sh-1439282265455-8455a15b-leader-finder-thread], Shutdown
>completed
>2015-08-18 16:08:47.048 INFO
> [ops180021036.sh-1439282265455-8455a15b_watcher_executor]
> kafka.consumer.ConsumerFetcherManager:68 -
>[ConsumerFetcherManager-1439282265512] Stopping all fetchers
>2015-08-18 16:08:47.048 INFO
> [ops180021036.sh-1439282265455-8455a15b_watcher_executor]
> kafka.consumer.ConsumerFetcherManager:68 -
>[ConsumerFetcherManager-1439282265512] All connections stopped
>2015-08-18 16:08:47.048 INFO
> [ops180021036.sh-1439282265455-8455a15b_watcher_executor]
> kafka.consumer.ZookeeperConsumerConnector:68 -
>[ops180021036.sh-1439282265455-8455a15b], Cleared all relevant queues for
>this fetcher
>2015-08-18 16:08:47.048 INFO
> [ops180021036.sh-1439282265455-8455a15b_watcher_executor]
> kafka.consumer.ZookeeperConsumerConnector:68 -
>[ops180021036.sh-1439282265455-8455a15b], Cleared the data chunks in all
>the consumer message iterators
>2015-08-18 16:08:47.048 INFO
> [ops180021036.sh-1439282265455-8455a15b_watcher_executor]
> kafka.consumer.ZookeeperConsumerConnector:68 -
>[ops180021036.sh-1439282265455-8455a15b], Committing all offsets after
>clearing the fetcher queues
>2015-08-18 16:08:47.048 INFO
> [ops180021036.sh-1439282265455-8455a15b_watcher_executor]
> kafka.consumer.ZookeeperConsumerConnector:68 -
>[ops180021036.sh-1439282265455-8455a15b], Releasing partition ownership
>2015-08-18 16:08:47.178 INFO
> [ops180021036.sh-1439282265455-8455a15b_watcher_executor]
> kafka.consumer.RangeAssignor:68 - Consumer
>ops180021036.sh-1439282265455-8455a15b *rebalancing the following
>partitions*: ArrayBuffer(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
>15, 16, 17) for topic A *with consumers:
>List*(ops178090103.sh-1439282239893-c5966e01-0,
>ops178090103.sh-1439282239893-c5966e01-1,
>ops178090103.sh-1439282260982-7ef4f1c5-0,
>ops178090103.sh-1439282260982-7ef4f1c5-1,
>ops178090103.sh-1439282272017-8d0eae20-0,
>ops178090103.sh-1439282272017-8d0eae20-1,
>ops178093118.sh-1439282241888-810389d0-0,
>ops178093118.sh-1439282241888-810389d0-1,
>ops178093118.sh-1439282272949-812592a0-0,
>ops178093118.sh-1439282272949-812592a0-1,
>ops178096091.sh-1439282247256-dd4f01c3-0,
>ops178096091.sh-1439282247256-dd4f01c3-1,
>ops178096091.sh-1439282261099-1ca552ad-0,
>ops178096091.sh-1439282261099-1ca552ad-1,
>ops178096091.sh-1439282272076-3865c416-0,
>ops178096091.sh-1439282272076-3865c416-1,
>ops178096218.sh-1439282244077-e44933cb-0,
>ops178096218.sh-1439282244077-e44933cb-1,
>ops178096218.sh-1439282250962-6d91ea06-0,
>ops178096218.sh-1439282250962-6d91ea06-1,
>ops178096218.sh-1439282255978-44fae577-0,
>ops178096218.sh-1439282255978-44fae577-1,
>ops178103086.sh-1439282238431-38473eed-0,
>ops178103086.sh-1439282238431-38473eed-1,
>ops178103086.sh-1439282245101-dd2d2e8a-0,
>ops178103086.sh-1439282245101-dd2d2e8a-1,
>ops178103086.sh-1439282250200-9ec3e4f9-0,
>ops178103086.sh-1439282250200-9ec3e4f9-1,
>ops180019230.sh-1439282246060-6c17dbe0-0,
>ops180019230.sh-1439282246060-6c17dbe0-1,
>ops180019230.sh-1439282251861-46b2e7d0-0,
>ops180019230.sh-1439282251861-46b2e7d0-1,
>ops180019230.sh-1439282256080-8c4f4d28-0,
>ops180019230.sh-1439282256080-8c4f4d28-1,
>ops180021036.sh-1439282250060-57c09362-0,
>ops180021036.sh-1439282250060-57c09362-1,
>ops180021036.sh-1439282255415-b301daa2-0,
>ops180021036.sh-1439282255415-b301daa2-1,
>ops180021036.sh-1439282265455-8455a15b-0,
>ops180021036.sh-1439282265455-8455a15b-1,
>ops180021223.sh-1439282248773-578c62d0-0,
>ops180021223.sh-1439282248773-578c62d0-1,
>ops180021223.sh-1439282254389-a5d71a5a-0,
>ops180021223.sh-1439282254389-a5d71a5a-1,
>ops180021223.sh-1439282258421-16b051fb-0,
>ops180021223.sh-1439282258421-16b051fb-1,
>ops180022028.sh-1439282252296-d3b32c71-0,
>ops180022028.sh-1439282252296-d3b32c71-1,
>ops180022028.sh-1439282258091-38be130a-0,
>ops180022028.sh-1439282258091-38be130a-1,
>ops180022028.sh-1439282262207-e6a740b4-0,
>ops180022028.sh-1439282262207-e6a740b4-1)
>
>kafka.consumer.ZookeeperConsumerConnector:76 -
>[ops180021036.sh-1437380766435-2bfff03d],* exception during rebalance *
>
>        at
>kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.kafka$consu
>mer$ZookeeperConsumerConnector$ZKRebalancerListener$$rebalance(ZookeeperCo
>nsumerConnector.scala:659)
>
>        at
>kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$sy
>ncedRebalance$1$$anonfun$apply$mcV$sp$1.apply$mcVI$sp(ZookeeperConsumerCon
>nector.scala:608)
>
>        at
>kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$sy
>ncedRebalance$1.apply$mcV$sp(ZookeeperConsumerConnector.scala:602)
>
>        at
>kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$sy
>ncedRebalance$1.apply(ZookeeperConsumerConnector.scala:599)
>
>        at
>kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$sy
>ncedRebalance$1.apply(ZookeeperConsumerConnector.scala:599)
>
>        at
>kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.syncedRebal
>ance(ZookeeperConsumerConnector.scala:598)
>
>        at
>kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anon$1.run
>(ZookeeperConsumerConnector.scala:551)
>
>
>
>I formatted the consumer list as follows (ops178090103, for example,
>stands for 10.178.90.103):
>
>
>
>ops178090103.sh-1439282239893-c5966e01-0,
>ops178090103.sh-1439282239893-c5966e01-1,
>ops178090103.sh-1439282260982-7ef4f1c5-0,
>ops178090103.sh-1439282260982-7ef4f1c5-1,
>ops178090103.sh-1439282272017-8d0eae20-0,
>ops178090103.sh-1439282272017-8d0eae20-1,
>
>ops178093118.sh-1439282241888-810389d0-0,
>ops178093118.sh-1439282241888-810389d0-1,
>ops178093118.sh-1439282272949-812592a0-0,
>ops178093118.sh-1439282272949-812592a0-1,
>
>ops178096091.sh-1439282247256-dd4f01c3-0,
>ops178096091.sh-1439282247256-dd4f01c3-1,
>ops178096091.sh-1439282261099-1ca552ad-0,
>ops178096091.sh-1439282261099-1ca552ad-1,
>ops178096091.sh-1439282272076-3865c416-0,
>ops178096091.sh-1439282272076-3865c416-1,
>ops178096218.sh-1439282244077-e44933cb-0,
>ops178096218.sh-1439282244077-e44933cb-1,
>ops178096218.sh-1439282250962-6d91ea06-0,
>ops178096218.sh-1439282250962-6d91ea06-1,
>ops178096218.sh-1439282255978-44fae577-0,
>ops178096218.sh-1439282255978-44fae577-1,
>
>ops178103086.sh-1439282238431-38473eed-0,
>ops178103086.sh-1439282238431-38473eed-1,
>ops178103086.sh-1439282245101-dd2d2e8a-0,
>ops178103086.sh-1439282245101-dd2d2e8a-1,
>ops178103086.sh-1439282250200-9ec3e4f9-0,
>ops178103086.sh-1439282250200-9ec3e4f9-1,
>ops180019230.sh-1439282246060-6c17dbe0-0,
>ops180019230.sh-1439282246060-6c17dbe0-1,
>ops180019230.sh-1439282251861-46b2e7d0-0,
>ops180019230.sh-1439282251861-46b2e7d0-1,
>ops180019230.sh-1439282256080-8c4f4d28-0,
>ops180019230.sh-1439282256080-8c4f4d28-1,
>
>ops180021036.sh-1439282250060-57c09362-0,
>ops180021036.sh-1439282250060-57c09362-1,
>ops180021036.sh-1439282255415-b301daa2-0,
>ops180021036.sh-1439282255415-b301daa2-1,
>ops180021036.sh-1439282265455-8455a15b-0,
>ops180021036.sh-1439282265455-8455a15b-1,
>
>
>
>ops180021223.sh-1439282248773-578c62d0-0,
>ops180021223.sh-1439282248773-578c62d0-1,
>ops180021223.sh-1439282254389-a5d71a5a-0,
>ops180021223.sh-1439282254389-a5d71a5a-1,
>ops180021223.sh-1439282258421-16b051fb-0,
>ops180021223.sh-1439282258421-16b051fb-1,
>
>
>ops180022028.sh-1439282252296-d3b32c71-0,
>ops180022028.sh-1439282252296-d3b32c71-1,
>ops180022028.sh-1439282258091-38be130a-0,
>ops180022028.sh-1439282258091-38be130a-1,
>ops180022028.sh-1439282262207-e6a740b4-0,
>ops180022028.sh-1439282262207-e6a740b4-1
>
>
>
>In the end, only the ops178* hosts can consume this topic, even though all
>hosts can ping each other successfully.
>
>
>
>So, I know that a consumer dying or a new consumer joining will trigger a
>rebalance. But what factors affect rebalancing, and what factors can
>cause it to fail?
>
>
>
>Thanks!
>
>
>
>Best regards,
>
>aluen

