kafka-users mailing list archives

From Alessandro De Maria <alessandro.dema...@gmail.com>
Subject Re: broker disconnected from cluster
Date Thu, 29 Dec 2016 22:45:48 GMT
Thanks!! Does the upgrade help?


On 29 December 2016 at 21:38, Tony Liu <jiangtao.liu@zuora.com> wrote:

> hi,
>
> you are hitting this issue ,
> https://issues.apache.org/jira/browse/KAFKA-4477
>
> On Wed, Dec 28, 2016 at 3:43 PM, Alessandro De Maria <alessandro.demaria@gmail.com> wrote:
>
> > Hello,
> >
> > I would like to get some help/advice on some issues I am having with my Kafka cluster.
> >
> > I am running Kafka (kafka_2.11-0.10.1.0) on a 5-broker cluster (Ubuntu 16.04).
> >
> > configuration is here: http://pastebin.com/cPch8Kd7
> >
> > Today one of the 5 brokers (id: 1) appeared to disconnect from the others:
> >
> > The log shows this around that time:
> > [2016-12-28 16:18:30,575] INFO Partition [aki_reload5yl_5,11] on broker 1: Shrinking ISR for partition [aki_reload5yl_5,11] from 2,3,1 to 1 (kafka.cluster.Partition)
> > [2016-12-28 16:18:30,579] INFO Partition [ale_reload5yl_1,0] on broker 1: Shrinking ISR for partition [ale_reload5yl_1,0] from 5,1,2 to 1 (kafka.cluster.Partition)
> > [2016-12-28 16:18:30,580] INFO Partition [hl7_staging,17] on broker 1: Shrinking ISR for partition [hl7_staging,17] from 4,1,5 to 1 (kafka.cluster.Partition)
> > [2016-12-28 16:18:30,581] INFO Partition [hes_reload_5,37] on broker 1: Shrinking ISR for partition [hes_reload_5,37] from 1,2,5 to 1 (kafka.cluster.Partition)
> > [2016-12-28 16:18:30,582] INFO Partition [aki_live,38] on broker 1: Shrinking ISR for partition [aki_live,38] from 5,2,1 to 1 (kafka.cluster.Partition)
> > [2016-12-28 16:18:30,582] INFO Partition [hl7_live,51] on broker 1: Shrinking ISR for partition [hl7_live,51] from 1,3,4 to 1 (kafka.cluster.Partition)
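[Editor's aside: for readers diagnosing similar ISR shrinkage, the stock 0.10.x CLI can list every partition whose ISR has fallen below its full replica set. A minimal sketch, assuming the standard tools on a broker host; the ZooKeeper connect string is a placeholder:]

```shell
# List all under-replicated partitions cluster-wide.
# zk1:2181 is illustrative -- substitute your ZooKeeper connect string.
bin/kafka-topics.sh --describe \
  --zookeeper zk1:2181 \
  --under-replicated-partitions
```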
> >
> > (other brokers logged this:)
> > java.io.IOException: Connection to 1 was disconnected before the response was read
> >         at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:115)
> >         at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:112)
> >         at scala.Option.foreach(Option.scala:257)
> >         at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:112)
> >         at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:108)
> >         at kafka.utils.NetworkClientBlockingOps$.recursivePoll$1(NetworkClientBlockingOps.scala:137)
> >         at kafka.utils.NetworkClientBlockingOps$.kafka$utils$NetworkClientBlockingOps$$pollContinuously$extension(NetworkClientBlockingOps.scala:143)
> >         at kafka.utils.NetworkClientBlockingOps$.blockingSendAndReceive$extension(NetworkClientBlockingOps.scala:108)
> >         at kafka.server.ReplicaFetcherThread.sendRequest(ReplicaFetcherThread.scala:253)
> >         at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:238)
> >         at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:42)
> >         at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:118)
> >         at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:103)
> >         at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
> >
> >
> > While this was happening, ConsumerOffsetChecker was reporting only a few of the 128 partitions configured for some of the topics, and consumers started crashing.
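[Editor's aside: for anyone reproducing this check, the tool in question was invoked roughly as below in the 0.10.x era; it was later deprecated in favour of kafka-consumer-groups.sh. The group name and ZooKeeper address are placeholders:]

```shell
# Report per-partition offsets and lag for one consumer group.
# "my-consumer-group" and zk1:2181 are illustrative placeholders.
bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker \
  --zookeeper zk1:2181 \
  --group my-consumer-group
```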
> >
> > I then used KafkaManager to reassign partitions from broker 1 to other
> > brokers.
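[Editor's aside: the same move can be driven with the bundled reassignment tool, which is roughly what KafkaManager does under the hood. A hedged sketch; the topic name, broker list, and file paths are illustrative:]

```shell
# Generate a reassignment plan that moves replicas off broker 1,
# then apply and verify it. All names/paths below are placeholders.
cat > topics-to-move.json <<'EOF'
{"version": 1, "topics": [{"topic": "aki_live"}]}
EOF
bin/kafka-reassign-partitions.sh --zookeeper zk1:2181 \
  --topics-to-move-json-file topics-to-move.json \
  --broker-list "2,3,4,5" --generate
# Save the proposed assignment JSON it prints to reassign.json,
# review it, then execute:
bin/kafka-reassign-partitions.sh --zookeeper zk1:2181 \
  --reassignment-json-file reassign.json --execute
# Check progress:
bin/kafka-reassign-partitions.sh --zookeeper zk1:2181 \
  --reassignment-json-file reassign.json --verify
```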
> >
> > I could then see the following errors in the kafka1 log:
> > [2016-12-28 17:23:51,816] ERROR [ReplicaFetcherThread-0-4], Error for partition [aki_live,86] to broker 4:org.apache.kafka.common.errors.UnknownServerException: The server experienced an unexpected error when processing the request (kafka.server.ReplicaFetcherThread)
> > [2016-12-28 17:23:51,817] ERROR [ReplicaFetcherThread-0-4], Error for partition [aki_live,21] to broker 4:org.apache.kafka.common.errors.UnknownServerException: The server experienced an unexpected error when processing the request (kafka.server.ReplicaFetcherThread)
> > [2016-12-28 17:23:51,817] ERROR [ReplicaFetcherThread-0-4], Error for partition [aki_live,126] to broker 4:org.apache.kafka.common.errors.UnknownServerException: The server experienced an unexpected error when processing the request (kafka.server.ReplicaFetcherThread)
> > [2016-12-28 17:23:51,818] ERROR [ReplicaFetcherThread-0-4], Error for partition [aki_live,6] to broker 4:org.apache.kafka.common.errors.UnknownServerException: The server experienced an unexpected error when processing the request (kafka.server.ReplicaFetcherThread)
> >
> >
> > I thought I would restart broker 1, but as soon as I did, most of my topics ended up with some empty partitions, and their consumer offsets were wiped out completely.
> >
> > I understand that because of unclean.leader.election.enable = true an unclean leader would be elected, but why were the partitions wiped out if there were at least 3 replicas of each?
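[Editor's aside: if durability matters more than availability here, that setting can be overridden per topic without a broker restart. A sketch for the 0.10.x config tool; the topic name and ZooKeeper address are placeholders:]

```shell
# Prefer durability over availability: refuse to elect an
# out-of-sync replica as leader for this topic.
# "aki_live" and zk1:2181 are illustrative placeholders.
bin/kafka-configs.sh --zookeeper zk1:2181 --alter \
  --entity-type topics --entity-name aki_live \
  --add-config unclean.leader.election.enable=false
```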
> >
> > What do you think caused the disconnection in the first place, and how can I recover from situations like this in the future?
> >
> > Regards
> > Alessandro
> >
> >
> >
> >
> >
> > --
> > Alessandro De Maria
> > alessandro.demaria@gmail.com
> >
>



-- 
Alessandro De Maria
alessandro.demaria@gmail.com
