kafka-users mailing list archives

From Andrew Headrick <andrew.headr...@gmail.com>
Subject Re: zookeeper session time out
Date Thu, 29 Aug 2013 18:52:22 GMT
I have not run into this issue with Kafka, but I have definitely run into
issues with ZK expiring sessions and needing to diagnose why. Looking at GC
is obviously very important for this. When you turn on GC logging, make sure
that your start script includes a timestamp in the gc.log filename. By
default the JVM overwrites the gc.log file on startup, and I have been
burned by a restart destroying my GC data. Also, I highly recommend Censum
for analyzing GC log files.
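A minimal sketch, in Python, of what such a start script needs to compute. The flags are the standard HotSpot GC logging options of the Java 7/8 era (JDK 9 and later use unified logging, -Xlog:gc, instead); the function name, the gc-YYYYmmdd-HHMMSS.log naming scheme and the log directory are hypothetical choices for illustration.

```python
import posixpath
from datetime import datetime
from typing import List, Optional

def gc_logging_opts(log_dir: str, started_at: Optional[datetime] = None) -> List[str]:
    """Build HotSpot (Java 7/8 era) GC logging flags whose log filename
    embeds the start time, so a restart writes a new file instead of
    overwriting the previous gc.log."""
    started_at = started_at or datetime.now()
    # e.g. <log_dir>/gc-20130829-185222.log (hypothetical naming scheme)
    gc_log = posixpath.join(log_dir, f"gc-{started_at:%Y%m%d-%H%M%S}.log")
    return [
        f"-Xloggc:{gc_log}",                   # write GC events to this file
        "-XX:+PrintGCDetails",                 # per-collection detail, incl. [Times: ...]
        "-XX:+PrintGCDateStamps",              # wall-clock date on each entry
        "-XX:+PrintGCApplicationStoppedTime",  # how long application threads were stopped
    ]

print(" ".join(gc_logging_opts("/var/log/kafka")))
```

The joined string can then be passed to the JVM however your start script assembles its command line.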


On Thu, Aug 29, 2013 at 7:23 AM, Yu, Libo <libo.yu@citi.com> wrote:

> Thanks for your answer, Neha. Currently we don't save the GC log.
> I will add that option and keep monitoring the issue.
> Regards,
> Libo
> -----Original Message-----
> From: Neha Narkhede [mailto:neha.narkhede@gmail.com]
> Sent: Wednesday, August 28, 2013 4:25 PM
> To: users@kafka.apache.org
> Subject: Re: zookeeper session time out
> Ah, you may be hitting GC pauses caused by I/O. You can confirm whether
> this is really the case by looking at the gc.log on the broker and
> checking for a GC entry with small user and sys times but a high real
> time. We saw a similar problem with I/O-induced GC pauses when compressing
> our request log4j files, which happens every hour or so. Since these files
> are large and the gzip process hogs the I/O bandwidth, the Linux box hits
> the dirty_ratio threshold and the kernel stops all threads doing I/O until
> all the dirty pages are flushed to disk. We have seen GC pauses of up to
> 15-20 seconds when this happens. A workaround is to increase your
> ZooKeeper session timeout to prevent the session expiration and the
> leader re-elections that follow.
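To see how close a broker is to the dirty_ratio threshold (for example during an hourly gzip), one could sample the standard procfs files /proc/meminfo and /proc/sys/vm/dirty_ratio. A rough Python sketch, assuming hypothetical function names; note the approximation: the kernel applies the ratio to available (free plus reclaimable) memory, which is usually smaller than MemTotal, so this understates the real usage.

```python
from typing import Dict

def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value}; most values are in kB."""
    values: Dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            values[key.strip()] = int(parts[0])
    return values

def dirty_limit_usage(meminfo: Dict[str, int], dirty_ratio_pct: int) -> float:
    """Rough fraction of the vm.dirty_ratio limit currently taken by dirty pages.

    Approximation: uses MemTotal as the base. The kernel applies the ratio to
    available (free + reclaimable) memory, which is smaller, so the real
    fraction is higher than the value returned here.
    """
    limit_kb = meminfo["MemTotal"] * dirty_ratio_pct / 100.0
    return meminfo["Dirty"] / limit_kb

# On a Linux broker:
# with open("/proc/meminfo") as f, open("/proc/sys/vm/dirty_ratio") as g:
#     print(dirty_limit_usage(parse_meminfo(f.read()), int(g.read())))
```

A value approaching 1.0 while the gzip runs would be consistent with the write stalls described above.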
> As for your file deletion issue, we have seen that if you configure a
> Kafka broker with time-based expiration, it ends up deleting possibly
> hundreds of large segment files at the same time. This puts pressure on
> file system journaling (we are using ext4 in data=ordered mode) and slows
> down writes on the Kafka side. Kafka should throttle time-based rolling as
> well as time-based expiration to prevent this situation. That said, we
> have never actually seen this cause a GC pause like the one you described.
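The throttling suggested above can be illustrated with a hypothetical sketch. This is not Kafka's code or API; it only shows the idea of deleting expired segment files in small batches with a pause in between, so the filesystem journal is not hit by hundreds of deletions at once. The batch size and pause are illustrative values, and `remove`/`sleep` are injectable so the behavior can be checked without touching disk.

```python
import os
import time
from typing import Callable, Iterable, List

def delete_in_batches(paths: Iterable[str],
                      batch_size: int = 10,
                      pause_secs: float = 1.0,
                      remove: Callable[[str], None] = os.remove,
                      sleep: Callable[[float], None] = time.sleep) -> List[List[str]]:
    """Delete files a few at a time, sleeping between batches, so removing
    many expired segments does not flood the filesystem journal at once.
    Returns the batches in the order they were deleted."""
    paths = list(paths)
    batches: List[List[str]] = []
    for start in range(0, len(paths), batch_size):
        batch = paths[start:start + batch_size]
        for p in batch:
            remove(p)
        batches.append(batch)
        if start + batch_size < len(paths):
            sleep(pause_secs)  # let the disk absorb the journal writes
    return batches
```

The trade-off is that disk space is reclaimed more slowly, which matters when, as in the original question, deletion is driven by a disk that is filling up.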
> So it would be good to investigate the root cause of your GC pause anyway.
> Could you check your gc.log and send back the relevant part of the log
> that shows the pause?
> Thanks,
> Neha
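Neha's check (small user and sys times, high real time) can be automated. A minimal sketch, assuming the HotSpot Java 7/8 `-XX:+PrintGCDetails` format, where each collection ends with `[Times: user=U sys=S, real=R secs]`. The function name and thresholds (at least 1 s of real time, CPU time under 10% of real) are illustrative, not recommendations.

```python
import re
from typing import Iterable, List, Tuple

# Matches the "[Times: user=U sys=S, real=R secs]" suffix HotSpot (Java 7/8)
# appends to each collection when -XX:+PrintGCDetails is enabled.
TIMES_RE = re.compile(r"\[Times: user=([\d.]+) sys=([\d.]+), real=([\d.]+) secs\]")

def suspicious_pauses(lines: Iterable[str],
                      min_real_secs: float = 1.0,
                      max_cpu_fraction: float = 0.1) -> List[Tuple[float, float, float, str]]:
    """Return (user, sys, real, line) for entries whose wall-clock time is
    large while CPU time is small, i.e. the JVM was mostly waiting
    (for example on I/O) rather than actually collecting."""
    hits = []
    for line in lines:
        m = TIMES_RE.search(line)
        if not m:
            continue
        user, sys_time, real = (float(g) for g in m.groups())
        if real >= min_real_secs and (user + sys_time) <= real * max_cpu_fraction:
            hits.append((user, sys_time, real, line.rstrip("\n")))
    return hits
```

For example, `with open("gc.log") as f: hits = suspicious_pauses(f)` returns the candidate entries worth sending back to the list.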
> On Wed, Aug 28, 2013 at 1:09 PM, Yu, Libo <libo.yu@citi.com> wrote:
> > Hi team,
> >
> > We notice that when the incoming throughput is very high, the broker has
> > to delete old log files to free up disk space. That causes some kind of
> > blocking (latency), and the broker's ZooKeeper session frequently times
> > out. Currently our ZooKeeper session timeout is 4s. We can increase it,
> > but if the timeout is too large, what are the consequences? Thanks.
> >
> >
> > Libo
> >
> >

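On the question of raising the timeout: ZooKeeper only removes a broker's ephemeral registration once its session expires, so a larger timeout also means a broker that has genuinely died is noticed, and its partition leadership moved, only after that longer delay. Also, the server may not grant the full value: it clamps the client's requested timeout (for a Kafka 0.8 broker, `zookeeper.session.timeout.ms`) to [minSessionTimeout, maxSessionTimeout], which default to 2x and 20x the server's tickTime. A small sketch of that clamping; the function name is hypothetical.

```python
from typing import Optional

def negotiated_session_timeout_ms(requested_ms: int,
                                  tick_time_ms: int,
                                  min_session_timeout_ms: Optional[int] = None,
                                  max_session_timeout_ms: Optional[int] = None) -> int:
    """Approximate the session timeout a ZooKeeper server grants: the client's
    request clamped to [minSessionTimeout, maxSessionTimeout], which default
    to 2 * tickTime and 20 * tickTime when not set in zoo.cfg."""
    lo = min_session_timeout_ms if min_session_timeout_ms is not None else 2 * tick_time_ms
    hi = max_session_timeout_ms if max_session_timeout_ms is not None else 20 * tick_time_ms
    return max(lo, min(requested_ms, hi))
```

With tickTime=2000 (the value in ZooKeeper's sample zoo.cfg), the current 4s request is granted as-is, while asking for 60s would silently be capped at 40s unless maxSessionTimeout is raised on the servers.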