kafka-users mailing list archives

From Tony Liu <jiangtao....@zuora.com>
Subject Re: Timeout publishing message to Kafka cluster.
Date Mon, 19 Dec 2016 01:35:14 GMT
When that error happens, I have to manually restart Kafka node `1002`.
Once the restart finishes, all of the partitions are healthy again.

For example, partition 3 (leader 1002,
<http://fjord-staging2-kafka-manager.infradev.zuora.com:9000/clusters/fjord-staging2/brokers/1002>):

                 Partition  Leader  Replicas          In Sync Replicas  Preferred Leader?  Under Replicated?
Before restart:  3          1002    (1002,1004,1005)  (1002)            true               true
After restart:   3          1002    (1002,1004,1005)  (1002,1004,1005)  true               true


On Sun, Dec 18, 2016 at 5:29 PM, Tony Liu <jiangtao.liu@zuora.com> wrote:

> Hi,
>
> Recently we have been running into the `Batch Expired` error every few
> days, perhaps every 3 to 5 days; there is no fixed frequency.
>
> *A,* the error is:
> Exception Class : org.apache.kafka.common.errors.TimeoutException
> Error Message : Batch Expired
>
> *B*, server.log from Kafka:
>
> [2016-12-18 20:45:32,371] INFO  Partition [thl_raw,43] on broker 1002:
> Shrinking ISR for partition [thl_raw,43] from 1006,1001,1002 to 1002
> (kafka.cluster.Partition)
> [2016-12-18 20:45:32,376] INFO  Partition [HeartBit,6] on broker 1002:
> Shrinking ISR for partition [HeartBit,6] from 1005,1006,1002 to 1002
> (kafka.cluster.Partition)
> [2016-12-18 20:45:32,378] INFO  Partition [thl_raw,31] on broker 1002:
> Shrinking ISR for partition [thl_raw,31] from 1005,1004,1002 to 1002
> (kafka.cluster.Partition)
> [2016-12-18 20:45:32,382] INFO  Partition [HeartBit,0] on broker 1002:
> Shrinking ISR for partition [HeartBit,0] from 1004,1005,1002 to 1002
> (kafka.cluster.Partition)
> [2016-12-18 20:45:32,384] INFO  Partition [ConnectorSync,7] on broker
> 1002: Shrinking ISR for partition [ConnectorSync,7] from 1001,1002,1003 to
> 1002 (kafka.cluster.Partition)
> [2016-12-18 20:45:32,386] INFO  Partition [__consumer_offsets,8] on broker
> 1002: Shrinking ISR for partition [__consumer_offsets,8] from
> 1005,1004,1002 to 1002 (kafka.cluster.Partition)
> [2016-12-18 20:45:32,389] INFO  Partition [thl_raw,37] on broker 1002:
> Shrinking ISR for partition [thl_raw,37] from 1005,1006,1002 to 1002
> (kafka.cluster.Partition)
> [2016-12-18 20:45:32,391] INFO  Partition [HeartBeat,3] on broker 1002:
> Shrinking ISR for partition [HeartBeat,3] from 1005,1004,1002 to 1002
> (kafka.cluster.Partition)
> [2016-12-18 21:17:59,888] INFO  Rolled new log segment for
> '__consumer_offsets-46' in 1 ms. (kafka.log.Log)
> [2016-12-18 21:19:07,923] INFO  Deleting segment 0 from log
> __consumer_offsets-46. (kafka.log.Log)
> [2016-12-18 21:19:07,923] INFO  Deleting segment 101935860 from log
> __consumer_offsets-46. (kafka.log.Log)
> [2016-12-18 21:19:07,924] INFO  Deleting index /kafka/data/__consumer_
> offsets-46/00000000000000000000.index.deleted (kafka.log.OffsetIndex)
> [2016-12-18 21:19:07,924] INFO  Deleting index /kafka/data/__consumer_
> offsets-46/00000000000101935860.index.deleted (kafka.log.OffsetIndex)
> [2016-12-18 21:19:07,924] INFO  Deleting index /kafka/data/__consumer_
> offsets-46/00000000000000000000.timeindex.deleted (kafka.log.TimeIndex)
> [2016-12-18 21:19:07,924] INFO  Deleting index /kafka/data/__consumer_
> offsets-46/00000000000101935860.timeindex.deleted (kafka.log.TimeIndex)
> [2016-12-18 21:19:08,393] INFO  Deleting segment 102963875 from log
> __consumer_offsets-46. (kafka.log.Log)
> [2016-12-18 21:19:08,410] INFO  Deleting index /kafka/data/__consumer_
> offsets-46/00000000000102963875.index.deleted (kafka.log.OffsetIndex)
> [2016-12-18 21:19:08,410] INFO  Deleting index /kafka/data/__consumer_
> offsets-46/00000000000102963875.timeindex.deleted (kafka.log.TimeIndex)
> [2016-12-18 21:48:53,007] INFO  Rolled new log segment for 'thl_raw-24' in
> 1 ms. (kafka.log.Log)
> [2016-12-18 22:15:09,894] INFO  Rolled new log segment for 'thl_raw-1' in
> 0 ms. (kafka.log.Log)
> [2016-12-18 23:34:28,526] INFO  Rolled new log segment for 'thl_raw-9' in
> 1 ms. (kafka.log.Log)
> [2016-12-18 23:34:28,754] INFO  Rolled new log segment for 'thl_raw-39' in
> 0 ms. (kafka.log.Log)
> [2016-12-18 23:34:28,786] INFO  Rolled new log segment for 'thl_raw-7' in
> 0 ms. (kafka.log.Log)
> [2016-12-19 00:04:32,816] INFO  Rolled new log segment for 'thl_raw-15' in
> 0 ms. (kafka.log.Log)
> [2016-12-19 00:04:33,049] INFO  Rolled new log segment for 'thl_raw-44' in
> 0 ms. (kafka.log.Log)
> [2016-12-19 00:04:33,137] INFO  Rolled new log segment for 'thl_raw-20' in
> 0 ms. (kafka.log.Log)
> [2016-12-19 00:04:33,305] INFO  Rolled new log segment for 'thl_raw-40' in
> 0 ms. (kafka.log.Log)
> [2016-12-19 00:04:33,380] INFO  Rolled new log segment for 'thl_raw-59' in
> 0 ms. (kafka.log.Log)
> [2016-12-19 00:04:33,470] INFO  Rolled new log segment for 'thl_raw-50' in
> 0 ms. (kafka.log.Log)
> [2016-12-19 00:04:33,630] INFO  Rolled new log segment for 'thl_raw-35' in
> 0 ms. (kafka.log.Log)
> [2016-12-19 00:04:33,995] INFO  Rolled new log segment for 'thl_raw-45' in
> 0 ms. (kafka.log.Log)
> [2016-12-19 00:04:34,007] INFO  Rolled new log segment for 'thl_raw-34' in
> 0 ms. (kafka.log.Log)
> [2016-12-19 00:04:34,265] INFO  Rolled new log segment for 'thl_raw-48' in
> 0 ms. (kafka.log.Log)
> [2016-12-19 00:04:34,359] INFO  Rolled new log segment for 'thl_raw-54' in
> 1 ms. (kafka.log.Log)
> [2016-12-19 00:04:34,367] INFO  Rolled new log segment for 'thl_raw-10' in
> 0 ms. (kafka.log.Log)
> [2016-12-19 00:04:34,540] INFO  Rolled new log segment for 'thl_raw-2' in
> 0 ms. (kafka.log.Log)
> [2016-12-19 00:04:35,123] INFO  Rolled new log segment for 'thl_raw-14' in
> 0 ms. (kafka.log.Log)
> [2016-12-19 00:04:36,822] INFO  Rolled new log segment for 'thl_raw-29' in
> 0 ms. (kafka.log.Log)
> [2016-12-19 00:04:36,970] INFO  Rolled new log segment for 'thl_raw-18' in
> 0 ms. (kafka.log.Log)
>
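The shrink events in the log above can be tallied per broker to see which replicas keep falling out of the ISR. This is only a minimal sketch: it assumes the raw server.log is fed in with one event per line (the archive wraps the lines), and `dropped_replicas` is a hypothetical helper name, not part of any Kafka tooling.

```python
import re
from collections import Counter

# Matches the broker message seen in the log above, e.g.
# "Shrinking ISR for partition [thl_raw,43] from 1006,1001,1002 to 1002"
SHRINK_RE = re.compile(
    r"Shrinking ISR for partition \[(?P<topic>[^,\]]+),(?P<partition>\d+)\] "
    r"from (?P<old>[\d,]+) to (?P<new>[\d,]+)"
)

def dropped_replicas(log_text):
    """Count how often each broker id was removed from an ISR."""
    counts = Counter()
    for match in SHRINK_RE.finditer(log_text):
        old_isr = set(match.group("old").split(","))
        new_isr = set(match.group("new").split(","))
        counts.update(old_isr - new_isr)  # brokers present before, gone after
    return counts

# Two events copied from the log above, unwrapped onto single lines.
sample = (
    "[2016-12-18 20:45:32,371] INFO  Partition [thl_raw,43] on broker 1002: "
    "Shrinking ISR for partition [thl_raw,43] from 1006,1001,1002 to 1002 "
    "(kafka.cluster.Partition)\n"
    "[2016-12-18 20:45:32,376] INFO  Partition [HeartBit,6] on broker 1002: "
    "Shrinking ISR for partition [HeartBit,6] from 1005,1006,1002 to 1002 "
    "(kafka.cluster.Partition)\n"
)
print(sorted(dropped_replicas(sample).items()))
# [('1001', 1), ('1005', 1), ('1006', 2)]
```

If every follower shows up in the tally while the leader is always 1002, that points at the leader rather than at any single follower.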
> *C*, whenever this error happens, we always see replication in a bad
> state, like this:
>
> Topics
>
> Topic                      # Partitions  # Brokers  Brokers Spread %  Brokers Skew %  # Replicas  Under Replicated %  Producer Message/Sec
> __consumer_offsets         50            6          100               0               3           16                  0.00
> ConnectorSync              8             6          100               16              3           25                  0.00
> EventInstance              8             6          100               16              3           12                  0.00
> fjord_healthy_checker      8             6          100               16              3           12                  0.00
> HeartBeat                  8             6          100               16              3           12                  0.00
> HeartBit                   8             6          100               0               3           25                  0.00
> Notification               8             6          100               33              3           12                  0.00
> NotificationEventInstance  8             6          100               16              3           12                  0.00
> thl_raw                    64            6          100               0               3           17                  0.00
>
> (Each topic name links to
> http://fjord-staging2-kafka-manager.infradev.zuora.com:9000/clusters/fjord-staging2/topics/<topic name>)
>
> *D*, all of the replication problems appear to be related to node `1002`.
> Clicking into each topic, every affected partition looks like the
> highlighted row (marked with asterisks) below:
>
> Partition Information
>
> Partition  Latest Offset  Leader  Replicas            In Sync Replicas  Preferred Leader?  Under Replicated?
> 0                         1005    (1005,1001,1002)    (1005,1002,1001)  true               false
> 1                         1006    (1006,1002,1003)    (1006,1003,1002)  true               false
> 2                         1001    (1001,1003,1004)    (1004,1003,1001)  true               false
> 3                         *1002*  *(1002,1004,1005)*  *(1002)*          *true*             *true*
> 4                         1003    (1003,1005,1006)    (1003,1006,1005)  true               false
> 5                         1004    (1004,1006,1001)    (1004,1001,1006)  true               false
> 6                         1005    (1005,1002,1003)    (1003,1005,1002)  true               false
> 7                         1006    (1006,1003,1004)    (1003,1006,1004)  true               false
>
> (Each leader id links to
> http://fjord-staging2-kafka-manager.infradev.zuora.com:9000/clusters/fjord-staging2/brokers/<broker id>)
>
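The "Under Replicated?" column in the partition table above follows from comparing each partition's replica list with its in-sync replicas. A small sketch of that check, with three rows hand-copied from the table; `under_replicated` is a hypothetical helper name used only for illustration.

```python
def under_replicated(partitions):
    """Return {partition: missing broker ids} for partitions whose ISR
    does not contain every assigned replica."""
    result = {}
    for partition, (replicas, isr) in partitions.items():
        missing = set(replicas) - set(isr)
        if missing:
            result[partition] = sorted(missing)
    return result

# partition -> (replicas, in-sync replicas), copied from the table above
partitions = {
    2: ((1001, 1003, 1004), (1004, 1003, 1001)),
    3: ((1002, 1004, 1005), (1002,)),
    6: ((1005, 1002, 1003), (1003, 1005, 1002)),
}
print(under_replicated(partitions))  # {3: [1004, 1005]}
```

The order of brokers inside the ISR does not matter for this check, which is why partitions 2 and 6 are reported as healthy even though their ISR is listed in a different order from the replica list.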
