kafka-users mailing list archives

From Tousif <tousif.pa...@gmail.com>
Subject Re: kafka brokers going down within 24 hrs
Date Fri, 16 Jan 2015 05:51:57 GMT
I'm using Kafka 2.9.2-0.8.1.1 and ZooKeeper 3.4.6.
I noticed that only one broker is going down.
My message size is less than 3 KB, and I have KAFKA_HEAP_OPTS="-Xmx512M"
and KAFKA_JVM_PERFORMANCE_OPTS="-server -XX:+UseCompressedOops
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled
-XX:+CMSScavengeBeforeRemark -XX:+DisableExplicitGC
-Djava.awt.headless=true".
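For reference, a minimal sketch of how those same settings can be exported before starting the broker; the start-script path and the pinned -Xms value are assumptions, not something from my setup:

```shell
# Sketch: export the JVM settings quoted above before launching the broker.
# Setting -Xms equal to -Xmx is an assumption/suggestion to avoid heap
# resizing pauses; the 512M value matches the KAFKA_HEAP_OPTS above.
export KAFKA_HEAP_OPTS="-Xmx512M -Xms512M"
export KAFKA_JVM_PERFORMANCE_OPTS="-server -XX:+UseCompressedOops \
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled \
-XX:+CMSScavengeBeforeRemark -XX:+DisableExplicitGC -Djava.awt.headless=true"
# Then start the broker (path assumed for a stock 0.8.1.1 install):
# bin/kafka-server-start.sh config/server.properties
```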

Do you mean the Kafka broker never goes down, and does the broker start
automatically after failing?
I see only these errors on both brokers.

10.0.0.11 is the broker that is going down.

ERROR Closing socket for /10.0.0.11 because of error
(kafka.network.Processor)
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
at sun.nio.ch.IOUtil.read(IOUtil.java:171)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:245)
at kafka.utils.Utils$.read(Utils.scala:375)
at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
at kafka.network.Processor.read(SocketServer.scala:347)
at kafka.network.Processor.run(SocketServer.scala:245)
at java.lang.Thread.run(Thread.java:662)
[2015-01-16 11:01:48,173] INFO Closing socket connection to /10.0.0.11.
(kafka.network.Processor)
[2015-01-16 11:03:08,164] ERROR Closing socket for /10.0.0.11 because of
error (kafka.network.Processor)
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
at sun.nio.ch.IOUtil.read(IOUtil.java:171)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:245)
at kafka.utils.Utils$.read(Utils.scala:375)
at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
at kafka.network.Processor.read(SocketServer.scala:347)
at kafka.network.Processor.run(SocketServer.scala:245)
at java.lang.Thread.run(Thread.java:662)
[2015-01-16 11:03:08,280] INFO Closing socket connection to /10.0.0.11.
(kafka.network.Processor)
[2015-01-16 11:03:48,369] ERROR Closing socket for /10.0.0.11 because of
error (kafka.network.Processor)
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
at sun.nio.ch.IOUtil.read(IOUtil.java:171)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:245)
at kafka.utils.Utils$.read(Utils.scala:375)
at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
at kafka.network.Processor.read(SocketServer.scala:347)
at kafka.network.Processor.run(SocketServer.scala:245)
at java.lang.Thread.run(Thread.java:662)
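On Harsha's recommendation below about running a ZooKeeper quorum: a minimal three-node zoo.cfg sketch (hostnames, ports, and dataDir are assumptions for illustration) would look like this:

```
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888
```

Each node would also need a myid file in dataDir containing its server number; with three nodes the ensemble tolerates the loss of one.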



On Thu, Jan 15, 2015 at 7:49 PM, Harsha <kafka@harsha.io> wrote:

> Tousif,
>        Which version of Kafka and ZooKeeper are you using, and what are
>        your message size and the JVM heap size you allocated for the Kafka
>        brokers?
> There is only one ZooKeeper node; if this is a production cluster, I
> recommend you run a quorum of ZooKeeper nodes. Both Kafka and Storm are
> heavy users of ZooKeeper. Also, supervisord is recommended for Storm; I am
> not sure you need it for Kafka. For Storm, it is the fail-fast nature of
> the workers that requires supervisord to restart them.
> When Kafka goes down the first time, i.e. before supervisord restarts it,
> do you see the same OOM error? Check the logs to see why it is going down
> the first time.
> -Harsha
>
>
>
> On Wed, Jan 14, 2015, at 10:50 PM, Tousif wrote:
> > Hello Chia-Chun Shih,
> >
> > There are multiple issues.
> > First, I don't see the out-of-memory error every time; the OOM happens
> > after supervisord keeps retrying to start Kafka.
> > The broker goes down when it tries to add a partition fetcher.
> >
> > It starts with:
> >
> > *conflict in /controller data:
> > {"version":1,"brokerid":0,"timestamp":"1421296052741"} stored data:
> > {"version":1,"brokerid":1,"timestamp":"1421291998088"}
> > (kafka.utils.ZkUtils$)*
> >
> >
> > ERROR Conditional update of path
> > /brokers/topics/realtimestreaming/partitions/1/state with data
> > {"controller_epoch":34,"leader":0,"version":1,"leader_epoch":54,"isr":[0]}
> > and expected version 90 failed due to
> > org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode =
> > BadVersion for /brokers/topics/realtimestreaming/partitions/1/state
> > (kafka.utils.ZkUtils$)
> >
> > and then
> >
> > [ReplicaFetcherManager on broker 0] Removed fetcher for partitions
> > [realtimestreaming,0],[realtimestreaming,1]
> > (kafka.server.ReplicaFetcherManager)
> > [2015-01-15 09:57:34,350] INFO Truncating log realtimestreaming-0 to
> > offset
> > 846. (kafka.log.Log)
> > [2015-01-15 09:57:34,351] INFO Truncating log realtimestreaming-1 to
> > offset
> > 957. (kafka.log.Log)
> > [2015-01-15 09:57:34,650] INFO [ReplicaFetcherManager on broker 0] *Added
> > fetcher for partitions ArrayBuffer([[realtimestreaming,0], initOffset 846
> > to broker id:1,host:realtimeslave1.novalocal,port:9092] ,
> > [[realtimestreaming,1], initOffset 957 to broker
> > id:1,host:realtimeslave1.novalocal,port:9092] )
> > (kafka.server.ReplicaFetcherManager)*
> > [2015-01-15 09:57:34,654] INFO [ReplicaFetcherThread-0-1], Starting
> >  (kafka.server.ReplicaFetcherThread)
> > [2015-01-15 09:57:34,747] INFO [ReplicaFetcherThread-1-1], Starting
> >  (kafka.server.ReplicaFetcherThread)
> > [2015-01-15 09:58:14,156] INFO Closing socket connection to /10.0.0.11.
> > (kafka.network.Processor)
> >
> >
> >
> > On Thu, Jan 15, 2015 at 12:01 PM, Chia-Chun Shih
> > <chiachun.shih@gmail.com>
> > wrote:
> >
> > > You can use tools (e.g., VisualVM) to diagnose the OOM problem.
> > >
> > > 2015-01-15 14:15 GMT+08:00 Tousif Khazi <tousif@senseforth.com>:
> > >
> > > > i see this error
> > > >
> > > >  ERROR [ReplicaFetcherThread-0-1], Error for partition
> > > > [realtimestreaming,1] to broker 1:class
> > > > kafka.common.NotLeaderForPartitionException
> > > > (kafka.server.ReplicaFetcherThread)
> > > > [2015-01-15 10:00:04,348] INFO [ReplicaFetcherManager on broker 0]
> > > > Removed fetcher for partitions [realtimestreaming,1]
> > > > (kafka.server.ReplicaFetcherManager)
> > > > [2015-01-15 10:00:04,355] INFO Closing socket connection to
> > > > /10.0.0.11. (kafka.network.Processor)
> > > > [2015-01-15 10:00:04,444] WARN [KafkaApi-0] Fetch request with
> > > > correlation id 0 from client ReplicaFetcherThread-0-0 on partition
> > > > [realtimestreaming,1] failed due to Leader not local for partition
> > > > [realtimestreaming,1] on broker 0 (kafka.server.KafkaApis)
> > > > [2015-01-15 10:00:04,545] INFO [ReplicaFetcherThread-0-1], Shutting
> > > > down (kafka.server.ReplicaFetcherThread)
> > > > [2015-01-15 10:00:04,848] INFO [ReplicaFetcherThread-0-1], Stopped
> > > > (kafka.server.ReplicaFetcherThread)
> > > > [2015-01-15 10:00:04,849] INFO [ReplicaFetcherThread-0-1], Shutdown
> > > > completed (kafka.server.ReplicaFetcherThread)
> > > > [2015-01-15 10:00:39,256] ERROR Closing socket for /10.0.0.11 because
> > > > of error (kafka.network.Processor)
> > > > java.io.IOException: Connection reset by peer
> > > > at sun.nio.ch.FileDispatcher.read0(Native Method)
> > > > at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
> > > > at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
> > > > at sun.nio.ch.IOUtil.read(IOUtil.java:171)
> > > > at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:245)
> > > >
> > > > On Wed, Jan 14, 2015 at 10:12 PM, Tousif <tousif.pasha@gmail.com>
> > > > wrote:
> > > > > Thanks Harsha for the quick response.
> > > > > I don't see any other errors. I used to see a replica fetcher error,
> > > > > but it seems to have disappeared after I set the number of replica
> > > > > fetcher threads to 2, as I have 2 partitions. Sometimes I see a
> > > > > ZooKeeper session expiration.
> > > > > On Jan 14, 2015 8:02 PM, "Harsha" <kafka@harsha.io> wrote:
> > > > >
> > > > >> Tousif,
> > > > >>        Do you see any other errors in server.log?
> > > > >> -Harsha
> > > > >>
> > > > >> On Wed, Jan 14, 2015, at 01:51 AM, Tousif wrote:
> > > > >> > Hello,
> > > > >> >
> > > > >> > I have configured the Kafka nodes to run via supervisord, and I
> > > > >> > see the following exceptions, and eventually the brokers go out
> > > > >> > of memory. I have given them enough memory, and they process 1
> > > > >> > event/second. Kafka goes down every day.
> > > > >> >
> > > > >> > I'm wondering what configuration is missing or needs to be added.
> > > > >> >
> > > > >> > Here are my cluster details:
> > > > >> >  2 brokers
> > > > >> >  1 ZooKeeper node
> > > > >> >  and 2 Apache Storm nodes
> > > > >> >
> > > > >> >
> > > > >> > INFO zookeeper state changed (SyncConnected)
> > > > >> > (org.I0Itec.zkclient.ZkClient)
> > > > >> > ERROR Closing socket for /10.0.0.11 because of error
> > > > >> > (kafka.network.Processor)
> > > > >> > java.io.IOException: Connection reset by peer
> > > > >> > at sun.nio.ch.FileDispatcher.read0(Native Method)
> > > > >> > at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
> > > > >> > at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
> > > > >> > at sun.nio.ch.IOUtil.read(IOUtil.java:171)
> > > > >> > at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:245)
> > > > >> > at kafka.utils.Utils$.read(Utils.scala:375)
> > > > >> > at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
> > > > >> > at kafka.network.Processor.read(SocketServer.scala:347)
> > > > >> > at kafka.network.Processor.run(SocketServer.scala:245)
> > > > >> > at java.lang.Thread.run(Thread.java:662)
> > > > >> > [2015-01-13 23:43:37,962] INFO Closing socket connection to
> > > > >> > /10.0.0.11. (kafka.network.Processor)
> > > > >> > Error occurred during initialization of VM
> > > > >> > Could not reserve enough space for object heap
> > > > >> > Error occurred during initialization of VM
> > > > >> > Could not reserve enough space for object heap
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> > --
> > > > >> > Regards,
> > > > >> > Tousif
> > > > >> > +918050227279
> > > > >> >
> > > > >> >
> > > > >> > --
> > > > >> >
> > > > >> >
> > > > >> > Regards
> > > > >> > Tousif Khazi
> > > > >>
> > > >
> > > >
> > > >
> > > > --
> > > > Regards,
> > > > Tousif
> > > > +918050227279
> > > >
> > >
> >
> >
> >
> > --
> >
> >
> > Regards
> > Tousif Khazi
>



-- 


Regards
Tousif Khazi
