kafka-users mailing list archives

From Apurva Mehta <apu...@confluent.io>
Subject Re: Connectivity problem with controller breaks cluster
Date Tue, 27 Dec 2016 18:34:46 GMT
Looks like you are hitting: https://issues.apache.org/jira/browse/KAFKA-4477

You can try upgrading to 0.10.1.1 and see whether the issue recurs (a number
of deadlock bugs were fixed in that release which might explain this). Or you
can provide the data described in
https://issues.apache.org/jira/browse/KAFKA-4477?focusedCommentId=15749722&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15749722
so that we can diagnose the problem.

As it stands, this seems to be a bug introduced in 0.10.1.0. We don't have
enough information to identify the root cause. If you can provide the trace
logging requested on that ticket, it would help.
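For reference, trace logging on a broker is usually enabled through its log4j
configuration. The exact loggers the ticket asks for are listed in the JIRA
comment linked above; the names below are only an illustrative sketch of the
mechanism, not the specific set requested there:

```
# config/log4j.properties (illustrative -- check the JIRA comment for the
# exact loggers requested on the ticket)
log4j.logger.kafka.server.ReplicaFetcherThread=TRACE, kafkaAppender
log4j.logger.kafka.controller=TRACE, controllerAppender
```

A broker restart is typically needed for changes to log4j.properties to take
effect.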

Thanks,
Apurva

On Tue, Dec 27, 2016 at 9:17 AM, Felipe Santos <felipegs@gmail.com> wrote:

> Hi,
>
> We are using kafka 0.10.1.0.
>
> We have three brokers and three ZooKeeper nodes.
>
> Today brokers 1 and 2 lost connectivity with broker 3, and I saw that
> broker 3 was the controller.
> I saw a lot of messages like:
> "[rw_campaign_broadcast_nextel_734fae3d46d4da63ee36d2b6fd25a77f3f7c3ef5,9]
> on broker 3: Shrinking ISR for partition
> [rw_campaign_broadcast_nextel_734fae3d46d4da63ee36d2b6fd25a77f3f7c3ef5,9]
> from 1,2,3 to 3"
>
> On the broker 2 and 1:
>
> [2016-12-27 08:10:05,501] WARN [ReplicaFetcherThread-0-3], Error in fetch kafka.server.ReplicaFetcherThread$FetchRequest@108fd1b0 (kafka.server.ReplicaFetcherThread)
> java.io.IOException: Connection to 3 was disconnected before the response was read
>         at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:115)
>         at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:112)
>         at scala.Option.foreach(Option.scala:257)
>         at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:112)
>         at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:108)
>         at kafka.utils.NetworkClientBlockingOps$.recursivePoll$1(NetworkClientBlockingOps.scala:137)
>         at kafka.utils.NetworkClientBlockingOps$.kafka$utils$NetworkClientBlockingOps$$pollContinuously$extension(NetworkClientBlockingOps.scala:143)
>         at kafka.utils.NetworkClientBlockingOps$.blockingSendAndReceive$extension(NetworkClientBlockingOps.scala:108)
>         at kafka.server.ReplicaFetcherThread.sendRequest(ReplicaFetcherThread.scala:253)
>         at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:238)
>         at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:42)
>         at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:118)
>         at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:103)
>         at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
>
> All my consumers and producers went down.
> I tried to consume and produce with kafka-console-producer.sh and
> kafka-console-consumer.sh, and both failed.
>
> The only solution was to restart broker 3; after that, the problem was
> resolved.
>
> Any tips?
> --
> Felipe Santos
>
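The ISR shrinkage quoted above can also be inspected with the stock tooling.
A sketch, assuming a ZooKeeper ensemble reachable at localhost:2181 (adjust
the connect string for your cluster):

```
# List partitions whose ISR has shrunk below the full replica set
bin/kafka-topics.sh --zookeeper localhost:2181 --describe \
    --under-replicated-partitions
```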
