kafka-users mailing list archives

From Jun Rao <jun...@gmail.com>
Subject Re: Kafka process dies sporadically
Date Mon, 11 Jun 2012 17:33:05 GMT
It's likely that the larger buffer and fetch sizes triggered long GC pauses,
which caused the ZK session to time out. You may want to do a bit of GC tuning.

Thanks,

Jun
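Jun's explanation rests on one mechanism. If the JVM stops for longer than the ZooKeeper session timeout, no heartbeats go out during that window and the server may expire the session.

The sketch below is one way to look for such stalls. It is not part of Kafka or ZooKeeper. It does two things:

- It measures how late a sleeping thread wakes up, because a stop-the-world GC pause delays every application thread.
- It reads cumulative GC time from the standard `java.lang.management` beans.

The class and method names are hypothetical. The 6000 ms timeout is only a placeholder for whatever session timeout you actually configured. To say anything about a broker, the sampling loop would have to run inside the broker's JVM. Run standalone, as here, it only measures its own process.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical pause check; not part of Kafka or ZooKeeper.
public class PauseCheck {

    // Sleeps `iterations` times for `intervalMs` each and returns the largest
    // delay beyond the requested sleep, in milliseconds. A stop-the-world GC
    // pause (or any other process-wide stall) shows up as a late wake-up.
    static long maxStallMs(int iterations, long intervalMs) {
        long worst = 0;
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt flag and stop sampling
                break;
            }
            long elapsedMs = (System.nanoTime() - start) / 1_000_000L;
            worst = Math.max(worst, elapsedMs - intervalMs);
        }
        return worst;
    }

    // A stall at least as long as the session timeout leaves the client no
    // chance to heartbeat in time, so the server may expire the session.
    static boolean riskOfExpiry(long stallMs, long sessionTimeoutMs) {
        return stallMs >= sessionTimeoutMs;
    }

    // Sums the accumulated collection time of every collector in this JVM;
    // getCollectionTime() returns -1 when a collector does not report it.
    static long totalGcTimeMs() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long sessionTimeoutMs = 6000; // placeholder: use your configured session timeout
        long stall = maxStallMs(50, 100);
        System.out.println("worst stall: " + stall + " ms");
        System.out.println("cumulative GC time: " + totalGcTimeMs() + " ms");
        System.out.println("session at risk: " + riskOfExpiry(stall, sessionTimeoutMs));
    }
}
```

Late wake-ups can also come from OS scheduling or swapping. A large stall therefore points at a pause without proving it was GC. Comparing it with the cumulative GC time, or with the JVM's own GC log, helps tell the two apart.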

On Mon, Jun 11, 2012 at 10:12 AM, Aaron Rankin <aaron@sproutsocial.com> wrote:

> This might help to explain the root cause. I found that two consumer
> parameters may be correlated with the broker issues. Our setup is
> inter-data-center, so we followed some of the advice on the mirroring
> wiki page,
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+mirroring. In
> particular, we increased "socket.buffersize" to 655360 and "fetch.size" to
> 3072000 on our consumers. Since removing those two overrides and letting
> the defaults take effect, our brokers haven't died once in well over a
> day. Before that, they were dying every hour.
>
>
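For reference, the two overrides can be written down as plain `java.util.Properties`. The sketch also gives a rough estimate of how much data one fetch round could bring in.

The estimate rests on a simplifying assumption: that a consumer requests up to `fetch.size` bytes for each partition it owns in a single round. The class and method names and the 32-partition figure are hypothetical. Removing the two keys, as described in the message, is what lets the defaults apply again.

```java
import java.util.Properties;

// Hypothetical helpers around the two consumer overrides from this thread.
// Only java.util.Properties is used; no Kafka classes are referenced.
public class ConsumerOverrides {

    static final int SOCKET_BUFFER_SIZE = 655360; // 640 KiB, value from the message
    static final int FETCH_SIZE = 3072000;        // 3000 KiB (~2.9 MiB), value from the message

    // Returns a copy of `base` with the mirroring-style overrides applied.
    static Properties withOverrides(Properties base) {
        Properties p = new Properties();
        p.putAll(base);
        p.setProperty("socket.buffersize", Integer.toString(SOCKET_BUFFER_SIZE));
        p.setProperty("fetch.size", Integer.toString(FETCH_SIZE));
        return p;
    }

    // Returns a copy of `props` with both keys removed, so the client's
    // built-in defaults apply again (what the message describes doing).
    static Properties withoutOverrides(Properties props) {
        Properties p = new Properties();
        p.putAll(props);
        p.remove("socket.buffersize");
        p.remove("fetch.size");
        return p;
    }

    // Rough upper bound on bytes pulled in per fetch round, assuming up to
    // fetchSize bytes are requested for each owned partition (a simplification).
    static long bytesPerFetchRound(long fetchSize, int partitions) {
        return fetchSize * partitions;
    }

    public static void main(String[] args) {
        Properties tuned = withOverrides(new Properties());
        System.out.println("tuned: " + tuned);
        System.out.println("reverted: " + withoutOverrides(tuned));
        int partitions = 32; // hypothetical partition count
        System.out.println("per-round bound: " + bytesPerFetchRound(FETCH_SIZE, partitions) + " bytes");
    }
}
```

With 32 partitions, that bound is 98,304,000 bytes (about 94 MiB) per round. This illustrates how a per-partition setting multiplies with the partition count.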
> On Jun 11, 2012, at 10:28 AM, Aaron Rankin wrote:
>
> > Jun,
> >
> > I was using the GitHub mirror, which appears to be active. The last
> > commit there is the same as in the Apache Git mirror
> > (2a59ad76c657e4aad8ee6ca67078f49d2f6017c9).
> >
> >
> > Aaron
> >
> >
> > On Jun 11, 2012, at 12:05 AM, Jun Rao wrote:
> >
> >> Aaron,
> >>
> >> Which Git repository did you try, GitHub or the Apache Git mirror? Kafka
> >> has moved to Apache, so please try the 0.7 release from Apache.
> >>
> >> The errors you saw are from ZK. Do you see lots of ZK session
> >> expirations in your log?
> >>
> >> Thanks,
> >>
> >> Jun
> >>
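One quick way to answer Jun's question is to count matching lines in the logs.

The marker strings below are assumptions:

- `Expiring session` is the wording the ZooKeeper server is expected to use.
- `state changed (Expired)` is the wording expected from a client-side ZooKeeper wrapper.

Check both against your own logs before trusting the counts. The class name is hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical log scan: counts lines that look like session expirations.
public class SessionExpiryCount {

    // Assumed marker strings; confirm them against your own logs.
    static final List<String> MARKERS = Arrays.asList(
            "Expiring session",           // assumed ZooKeeper server wording
            "state changed (Expired)");   // assumed client-side wrapper wording

    // Counts lines containing at least one marker.
    static long countExpiries(List<String> lines) {
        return lines.stream()
                .filter(line -> MARKERS.stream().anyMatch(line::contains))
                .count();
    }

    public static void main(String[] args) throws IOException {
        // Usage: java SessionExpiryCount /path/to/log
        // ISO-8859-1 maps every byte to a char, so stray bytes cannot abort the read.
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.ISO_8859_1);
        System.out.println(countExpiries(lines) + " session-expiry lines");
    }
}
```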
> >> On Sat, Jun 9, 2012 at 8:34 AM, Aaron Rankin <aaron@sproutsocial.com> wrote:
> >>
> >>> Hi,
> >>>
> >>> We're testing Kafka and have found that the process dies often, with
> >>> little to no indication of why. We're running the latest code from Git,
> >>> which we built using the instructions there. We're also running
> >>> ZooKeeper 3.3.5. Our setup has three brokers, with producers running on
> >>> the same network and consumers in another data center, a 30ms Internet
> >>> ping away.
> >>>
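The 30ms round trip is also why the mirroring advice raises socket buffer sizes. A single TCP connection can keep roughly one receive window of data in flight per round trip. Filling the link therefore needs a buffer of about bandwidth × RTT, known as the bandwidth-delay product.

The sketch computes that product for the 30ms RTT given here. The link speeds are hypothetical, since the thread does not state them. The class name is also hypothetical.

```java
// Hypothetical calculator for the bandwidth-delay product (BDP): the number of
// bytes that must be in flight to keep a link of a given speed fully used.
public class BandwidthDelay {

    // BDP in bytes = (bits per second / 8) * (RTT in seconds), rounded up.
    static long bdpBytes(long bitsPerSecond, double rttMillis) {
        return (long) Math.ceil(bitsPerSecond / 8.0 * rttMillis / 1000.0);
    }

    // Inverse: the link speed (bits/s) a buffer of `bytes` can keep full at this RTT.
    static double maxBitsPerSecond(long bytes, double rttMillis) {
        return bytes * 8.0 / (rttMillis / 1000.0);
    }

    public static void main(String[] args) {
        double rttMs = 30.0; // RTT from the message
        long[] links = {10_000_000L, 100_000_000L, 1_000_000_000L}; // hypothetical link speeds
        for (long bps : links) {
            System.out.println(bps / 1_000_000 + " Mbit/s -> " + bdpBytes(bps, rttMs) + " bytes");
        }
        System.out.printf("655360-byte buffer -> %.0f Mbit/s%n", maxBitsPerSecond(655360L, rttMs) / 1e6);
    }
}
```

By the same formula, the 655360-byte `socket.buffersize` mentioned later in the thread corresponds to about 175 Mbit/s at 30ms.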
> >>> Does anyone have some intuition about why this is happening?
> >>>
> >>> The only stack trace we're seeing is coming from ZooKeeper:
> >>>
> >>> 1193285089 [CommitProcessor:2] ERROR org.apache.zookeeper.server.NIOServerCnxn  - Unexpected Exception:
> >>> java.nio.channels.CancelledKeyException
> >>>      at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55)
> >>>      at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:59)
> >>>      at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:418)
> >>>      at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1509)
> >>>      at org.apache.zookeeper.server.NIOServerCnxn.process(NIOServerCnxn.java:1545)
> >>>      at org.apache.zookeeper.server.WatchManager.triggerWatch(WatchManager.java:115)
> >>>      at org.apache.zookeeper.server.WatchManager.triggerWatch(WatchManager.java:87)
> >>>      at org.apache.zookeeper.server.DataTree.deleteNode(DataTree.java:577)
> >>>      at org.apache.zookeeper.server.DataTree.killSession(DataTree.java:829)
> >>>      at org.apache.zookeeper.server.DataTree.processTxn(DataTree.java:804)
> >>>      at org.apache.zookeeper.server.ZKDatabase.processTxn(ZKDatabase.java:328)
> >>>      at org.apache.zookeeper.server.ZooKeeperServer.processTxn(ZooKeeperServer.java:715)
> >>>      at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:107)
> >>>      at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:73)
> >>>
> >>>
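The top of this trace can be reproduced in isolation. Once a `SelectionKey` has been cancelled, calling `interestOps` on it throws `CancelledKeyException` from `ensureValid`. Closing the key's channel is enough to cancel it. Those are exactly the two top frames above.

The frames below them show ZooKeeper killing a session (`DataTree.killSession`), deleting its ephemeral nodes, and firing watches toward a client whose connection had already been closed. This suggests the exception is a side effect of sessions being torn down rather than the root cause.

The sketch uses a `Pipe` instead of a real socket so it runs without a network. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.CancelledKeyException;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

// Hypothetical demo: reproduces the CancelledKeyException at the top of the
// ZooKeeper trace without any network connection.
public class CancelledKeyDemo {

    // Returns true if interestOps() on a key whose channel was closed throws
    // CancelledKeyException.
    static boolean interestOpsFailsAfterClose() {
        try {
            Pipe pipe = Pipe.open();
            Selector selector = Selector.open();
            try {
                Pipe.SourceChannel source = pipe.source();
                source.configureBlocking(false); // selectors only accept non-blocking channels
                SelectionKey key = source.register(selector, SelectionKey.OP_READ);

                // Closing the channel cancels its key, much like a client hanging
                // up while the server still holds that client's key.
                source.close();

                try {
                    // In the trace, sendBuffer calls interestOps at this point;
                    // OP_READ is used because a pipe source only supports reading.
                    key.interestOps(SelectionKey.OP_READ);
                    return false;
                } catch (CancelledKeyException expected) {
                    return true;
                }
            } finally {
                selector.close();
                pipe.source().close(); // close() is idempotent
                pipe.sink().close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("CancelledKeyException thrown: " + interestOpsFailsAfterClose());
    }
}
```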
> >>> Also, we are constantly seeing these in the logs:
> >>>
> >>> 1193365748 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] INFO org.apache.zookeeper.server.NIOServerCnxn  - Closed socket connection for client /127.0.0.1:53426 (no session established for client)
> >>> 1193425755 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] INFO org.apache.zookeeper.server.NIOServerCnxn  - Accepted socket connection from /127.0.0.1:53428
> >>> 1193425755 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] WARN org.apache.zookeeper.server.NIOServerCnxn  - EndOfStreamException: Unable to read additional data from client sessionid 0x0, likely client has closed socket
> >>>
> >
>
>
