kafka-users mailing list archives

From King Lee <aluenking...@gmail.com>
Subject Kafka rebalancing failed
Date Tue, 08 Sep 2015 12:55:47 GMT
Hi all,


I'm using Kafka (version kafka_2.10-0.8.2.0), and recently I have run into a
problem that has been bothering me for a while.



I created a topic, named A for example. This topic has 18 partitions, and I
run 9 web services on 9 servers to consume it; each service consumes 2
partitions, as configured in a file.



It ran well at first, but one day I found that the consumption speed had
slowed down!



Using the command *kafka-run-class.sh kafka.tools.ConsumerOffsetChecker*,
I found that topic A was being consumed by only 4 services! The rebalancing
did not work. The log says:



2015-08-18 16:08:46.156 WARN
 [ops180021036.sh-1439282265455-8455a15b_watcher_executor]
 kafka.consumer.RangeAssignor:83 - *No broker partitions consumed by
consumer thread ops180021036.sh-1439282265455-8455a15b-1 for topic A*
2015-08-18 16:08:46.156 WARN
 [ops180021036.sh-1439282265455-8455a15b_watcher_executor]
 kafka.consumer.RangeAssignor:83 -* No broker partitions consumed by
consumer thread ops180021036.sh-1439282265455-8455a15b-0 for topic A*
2015-08-18 16:08:46.156 INFO
 [ops180021036.sh-1439282265455-8455a15b_watcher_executor]
 kafka.consumer.ZookeeperConsumerConnector:68 -
[ops180021036.sh-1439282265455-8455a15b], *Consumer
ops180021036.sh-1439282265455-8455a15b selected partitions : *
2015-08-18 16:08:46.156 INFO
 [ops180021036.sh-1439282265455-8455a15b_watcher_executor]
 kafka.consumer.ZookeeperConsumerConnector:68 -
[ops180021036.sh-1439282265455-8455a15b],* end rebalancing consumer
ops180021036.sh-1439282265455-8455a15b try #0*
2015-08-18 16:08:46.157 INFO
 [ops180021036.sh-1439282265455-8455a15b-leader-finder-thread]
 kafka.consumer.ConsumerFetcherManager$LeaderFinderThread:68 -
[ops180021036.sh-1439282265455-8455a15b-leader-finder-thread], Starting
2015-08-18 16:08:47.038 INFO
 [ops180021036.sh-1439282265455-8455a15b_watcher_executor]
 kafka.consumer.ZookeeperConsumerConnector:68 -
[ops180021036.sh-1439282265455-8455a15b], begin rebalancing consumer
ops180021036.sh-1439282265455-8455a15b try #0
2015-08-18 16:08:47.047 INFO
 [ops180021036.sh-1439282265455-8455a15b_watcher_executor]
 kafka.consumer.ConsumerFetcherManager:68 -
[ConsumerFetcherManager-1439282265512] Stopping leader finder thread
2015-08-18 16:08:47.047 INFO
 [ops180021036.sh-1439282265455-8455a15b_watcher_executor]
 kafka.consumer.ConsumerFetcherManager$LeaderFinderThread:68 -
[ops180021036.sh-1439282265455-8455a15b-leader-finder-thread], Shutting down
2015-08-18 16:08:47.047 INFO
 [ops180021036.sh-1439282265455-8455a15b-leader-finder-thread]
 kafka.consumer.ConsumerFetcherManager$LeaderFinderThread:68 -
[ops180021036.sh-1439282265455-8455a15b-leader-finder-thread], Stopped
2015-08-18 16:08:47.048 INFO
 [ops180021036.sh-1439282265455-8455a15b_watcher_executor]
 kafka.consumer.ConsumerFetcherManager$LeaderFinderThread:68 -
[ops180021036.sh-1439282265455-8455a15b-leader-finder-thread], Shutdown
completed
2015-08-18 16:08:47.048 INFO
 [ops180021036.sh-1439282265455-8455a15b_watcher_executor]
 kafka.consumer.ConsumerFetcherManager:68 -
[ConsumerFetcherManager-1439282265512] Stopping all fetchers
2015-08-18 16:08:47.048 INFO
 [ops180021036.sh-1439282265455-8455a15b_watcher_executor]
 kafka.consumer.ConsumerFetcherManager:68 -
[ConsumerFetcherManager-1439282265512] All connections stopped
2015-08-18 16:08:47.048 INFO
 [ops180021036.sh-1439282265455-8455a15b_watcher_executor]
 kafka.consumer.ZookeeperConsumerConnector:68 -
[ops180021036.sh-1439282265455-8455a15b], Cleared all relevant queues for
this fetcher
2015-08-18 16:08:47.048 INFO
 [ops180021036.sh-1439282265455-8455a15b_watcher_executor]
 kafka.consumer.ZookeeperConsumerConnector:68 -
[ops180021036.sh-1439282265455-8455a15b], Cleared the data chunks in all
the consumer message iterators
2015-08-18 16:08:47.048 INFO
 [ops180021036.sh-1439282265455-8455a15b_watcher_executor]
 kafka.consumer.ZookeeperConsumerConnector:68 -
[ops180021036.sh-1439282265455-8455a15b], Committing all offsets after
clearing the fetcher queues
2015-08-18 16:08:47.048 INFO
 [ops180021036.sh-1439282265455-8455a15b_watcher_executor]
 kafka.consumer.ZookeeperConsumerConnector:68 -
[ops180021036.sh-1439282265455-8455a15b], Releasing partition ownership
2015-08-18 16:08:47.178 INFO
 [ops180021036.sh-1439282265455-8455a15b_watcher_executor]
 kafka.consumer.RangeAssignor:68 - Consumer
ops180021036.sh-1439282265455-8455a15b *rebalancing the following
partitions*: ArrayBuffer(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
15, 16, 17) for topic A *with consumers:
List*(ops178090103.sh-1439282239893-c5966e01-0,
ops178090103.sh-1439282239893-c5966e01-1,
ops178090103.sh-1439282260982-7ef4f1c5-0,
ops178090103.sh-1439282260982-7ef4f1c5-1,
ops178090103.sh-1439282272017-8d0eae20-0,
ops178090103.sh-1439282272017-8d0eae20-1,
ops178093118.sh-1439282241888-810389d0-0,
ops178093118.sh-1439282241888-810389d0-1,
ops178093118.sh-1439282272949-812592a0-0,
ops178093118.sh-1439282272949-812592a0-1,
ops178096091.sh-1439282247256-dd4f01c3-0,
ops178096091.sh-1439282247256-dd4f01c3-1,
ops178096091.sh-1439282261099-1ca552ad-0,
ops178096091.sh-1439282261099-1ca552ad-1,
ops178096091.sh-1439282272076-3865c416-0,
ops178096091.sh-1439282272076-3865c416-1,
ops178096218.sh-1439282244077-e44933cb-0,
ops178096218.sh-1439282244077-e44933cb-1,
ops178096218.sh-1439282250962-6d91ea06-0,
ops178096218.sh-1439282250962-6d91ea06-1,
ops178096218.sh-1439282255978-44fae577-0,
ops178096218.sh-1439282255978-44fae577-1,
ops178103086.sh-1439282238431-38473eed-0,
ops178103086.sh-1439282238431-38473eed-1,
ops178103086.sh-1439282245101-dd2d2e8a-0,
ops178103086.sh-1439282245101-dd2d2e8a-1,
ops178103086.sh-1439282250200-9ec3e4f9-0,
ops178103086.sh-1439282250200-9ec3e4f9-1,
ops180019230.sh-1439282246060-6c17dbe0-0,
ops180019230.sh-1439282246060-6c17dbe0-1,
ops180019230.sh-1439282251861-46b2e7d0-0,
ops180019230.sh-1439282251861-46b2e7d0-1,
ops180019230.sh-1439282256080-8c4f4d28-0,
ops180019230.sh-1439282256080-8c4f4d28-1,
ops180021036.sh-1439282250060-57c09362-0,
ops180021036.sh-1439282250060-57c09362-1,
ops180021036.sh-1439282255415-b301daa2-0,
ops180021036.sh-1439282255415-b301daa2-1,
ops180021036.sh-1439282265455-8455a15b-0,
ops180021036.sh-1439282265455-8455a15b-1,
ops180021223.sh-1439282248773-578c62d0-0,
ops180021223.sh-1439282248773-578c62d0-1,
ops180021223.sh-1439282254389-a5d71a5a-0,
ops180021223.sh-1439282254389-a5d71a5a-1,
ops180021223.sh-1439282258421-16b051fb-0,
ops180021223.sh-1439282258421-16b051fb-1,
ops180022028.sh-1439282252296-d3b32c71-0,
ops180022028.sh-1439282252296-d3b32c71-1,
ops180022028.sh-1439282258091-38be130a-0,
ops180022028.sh-1439282258091-38be130a-1,
ops180022028.sh-1439282262207-e6a740b4-0,
ops180022028.sh-1439282262207-e6a740b4-1)

kafka.consumer.ZookeeperConsumerConnector:76 -
[ops180021036.sh-1437380766435-2bfff03d],* exception during rebalance *

        at
kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.kafka$consumer$ZookeeperConsumerConnector$ZKRebalancerListener$$rebalance(ZookeeperConsumerConnector.scala:659)

        at
kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1$$anonfun$apply$mcV$sp$1.apply$mcVI$sp(ZookeeperConsumerConnector.scala:608)

        at
kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1.apply$mcV$sp(ZookeeperConsumerConnector.scala:602)

        at
kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1.apply(ZookeeperConsumerConnector.scala:599)

        at
kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1.apply(ZookeeperConsumerConnector.scala:599)

        at
kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.syncedRebalance(ZookeeperConsumerConnector.scala:598)

        at
kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anon$1.run(ZookeeperConsumerConnector.scala:551)



I grouped the consumer list by host as follows (*ops178090103, for example,
stands for 10.178.90.103*):



ops178090103.sh-1439282239893-c5966e01-0,
ops178090103.sh-1439282239893-c5966e01-1,
ops178090103.sh-1439282260982-7ef4f1c5-0,
ops178090103.sh-1439282260982-7ef4f1c5-1,
ops178090103.sh-1439282272017-8d0eae20-0,
ops178090103.sh-1439282272017-8d0eae20-1,

ops178093118.sh-1439282241888-810389d0-0,
ops178093118.sh-1439282241888-810389d0-1,
ops178093118.sh-1439282272949-812592a0-0,
ops178093118.sh-1439282272949-812592a0-1,

ops178096091.sh-1439282247256-dd4f01c3-0,
ops178096091.sh-1439282247256-dd4f01c3-1,
ops178096091.sh-1439282261099-1ca552ad-0,
ops178096091.sh-1439282261099-1ca552ad-1,
ops178096091.sh-1439282272076-3865c416-0,
ops178096091.sh-1439282272076-3865c416-1,
ops178096218.sh-1439282244077-e44933cb-0,
ops178096218.sh-1439282244077-e44933cb-1,
ops178096218.sh-1439282250962-6d91ea06-0,
ops178096218.sh-1439282250962-6d91ea06-1,
ops178096218.sh-1439282255978-44fae577-0,
ops178096218.sh-1439282255978-44fae577-1,

ops178103086.sh-1439282238431-38473eed-0,
ops178103086.sh-1439282238431-38473eed-1,
ops178103086.sh-1439282245101-dd2d2e8a-0,
ops178103086.sh-1439282245101-dd2d2e8a-1,
ops178103086.sh-1439282250200-9ec3e4f9-0,
ops178103086.sh-1439282250200-9ec3e4f9-1,
ops180019230.sh-1439282246060-6c17dbe0-0,
ops180019230.sh-1439282246060-6c17dbe0-1,
ops180019230.sh-1439282251861-46b2e7d0-0,
ops180019230.sh-1439282251861-46b2e7d0-1,
ops180019230.sh-1439282256080-8c4f4d28-0,
ops180019230.sh-1439282256080-8c4f4d28-1,

ops180021036.sh-1439282250060-57c09362-0,
ops180021036.sh-1439282250060-57c09362-1,
ops180021036.sh-1439282255415-b301daa2-0,
ops180021036.sh-1439282255415-b301daa2-1,
ops180021036.sh-1439282265455-8455a15b-0,
ops180021036.sh-1439282265455-8455a15b-1,



ops180021223.sh-1439282248773-578c62d0-0,
ops180021223.sh-1439282248773-578c62d0-1,
ops180021223.sh-1439282254389-a5d71a5a-0,
ops180021223.sh-1439282254389-a5d71a5a-1,
ops180021223.sh-1439282258421-16b051fb-0,
ops180021223.sh-1439282258421-16b051fb-1,


ops180022028.sh-1439282252296-d3b32c71-0,
ops180022028.sh-1439282252296-d3b32c71-1,
ops180022028.sh-1439282258091-38be130a-0,
ops180022028.sh-1439282258091-38be130a-1,
ops180022028.sh-1439282262207-e6a740b4-0,
ops180022028.sh-1439282262207-e6a740b4-1



In the end, only the ops178* hosts can consume this topic. All hosts can
ping each other successfully.
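
To make the numbers in the log concrete, here is a minimal sketch of range assignment for a single topic. It assumes the assignor sorts the partitions and the consumer thread IDs, then hands out contiguous ranges, with the first (partitions mod threads) threads taking one extra partition. The function name range_assign and the shortened thread IDs are hypothetical; only the per-host instance counts are taken from the consumer list above. It is an illustration under those assumptions, not the Kafka implementation.

```python
def range_assign(partitions, consumer_threads):
    """Assign sorted partitions to sorted consumer threads in contiguous ranges."""
    parts = sorted(partitions)
    threads = sorted(consumer_threads)
    per_thread, extra = divmod(len(parts), len(threads))
    assignment = {}
    for pos, thread in enumerate(threads):
        # The first `extra` threads each take one additional partition.
        start = per_thread * pos + min(pos, extra)
        count = per_thread + (1 if pos < extra else 0)
        assignment[thread] = parts[start:start + count]
    return assignment


# Consumer instances per host, counted from the consumer list in the log.
INSTANCES_PER_HOST = {
    "ops178090103": 3, "ops178093118": 2, "ops178096091": 3,
    "ops178096218": 3, "ops178103086": 3, "ops180019230": 3,
    "ops180021036": 3, "ops180021223": 3, "ops180022028": 3,
}

# Hypothetical shortened IDs "<host>-<instance>-<thread>", two threads each.
threads = [
    f"{host}-{inst}-{t}"
    for host, n in INSTANCES_PER_HOST.items()
    for inst in range(n)
    for t in range(2)
]

assignment = range_assign(range(18), threads)
owning_hosts = sorted(
    {tid.split("-")[0] for tid, parts in assignment.items() if parts}
)
idle_threads = [tid for tid, parts in assignment.items() if not parts]

print(len(threads), "threads,", len(idle_threads), "without partitions")
print("hosts owning partitions:", owning_hosts)
```

Under these assumptions, 52 registered threads compete for 18 partitions, so only the first 18 threads in sorted order receive one partition each and the remaining 34 receive none. Those 18 threads all sit on four hosts (ops178090103, ops178093118, ops178096091, ops178096218), which would be consistent with seeing only 4 services consuming and with the "No broker partitions consumed by consumer thread" warnings on the other hosts.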



So, I know that a consumer dying or a new consumer being added triggers a
rebalance. But what factors affect rebalancing, and what can cause it to
fail?



Thanks!



Best regards,

aluen
