kafka-users mailing list archives

From Ismael Juma <ism...@juma.me.uk>
Subject Re: Idle cluster high CPU usage
Date Mon, 25 Sep 2017 10:05:02 GMT
Thanks for following up Elliot. Good to know. :)

Ismael

On Mon, Sep 25, 2017 at 10:20 AM, Elliot Crosby-McCullough <
elliot.crosby-mccullough@freeagent.com> wrote:

> We did a bunch of sampling to no particular avail; broadly speaking, the
> answer was "it's doing a bunch of talking".
>
> For those who might want to know what this was in the end: during part of
> our previous debugging we enabled `javax.net.debug=all` and didn't twig
> that it had no visible effect on the log4j logs, so we didn't notice the
> vast number of IOPS to `kafkaServer.out`. Writing that log was eating all
> the CPU.
>
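As a quick check for the cause found here, below is a minimal Java sketch that looks for a `-Djavax.net.debug` flag among a JVM's startup arguments, as reported by `RuntimeMXBean.getInputArguments()`. The class and method names are hypothetical and are not part of Kafka. Run as-is, it inspects its own JVM. Checking a broker would instead require a `RuntimeMXBean` obtained through a JMX proxy, which assumes remote JMX is enabled on that broker.

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.Optional;

// Hypothetical helper: reports whether SSL debug logging was switched on at JVM startup.
public class SslDebugCheck {
    private static final String FLAG = "-Djavax.net.debug";

    // Returns the flag's value if present ("" for a bare flag), otherwise empty.
    static Optional<String> sslDebugValue(List<String> jvmArgs) {
        for (String arg : jvmArgs) {
            if (arg.equals(FLAG)) {
                return Optional.of("");
            }
            if (arg.startsWith(FLAG + "=")) {
                return Optional.of(arg.substring(FLAG.length() + 1));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // The arguments this JVM was launched with (e.g. anything added via KAFKA_OPTS).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        Optional<String> value = sslDebugValue(jvmArgs);
        if (value.isPresent()) {
            System.out.println("WARNING: javax.net.debug=" + value.get()
                    + " is set; its output does not go through log4j");
        } else {
            System.out.println("javax.net.debug is not set");
        }
    }
}
```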
> On 23 September 2017 at 00:44, jrpilat@gmail.com <jrpilat@gmail.com>
> wrote:
>
> > One thing worth trying is hooking up to one or more of the brokers via
> > JMX and examining the running threads; if that doesn't elucidate the
> > cause, you could move on to sampling or profiling via JMX to see what's
> > taking up all that CPU.
> >
> > - Jordan Pilat
> >
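Jordan's first suggestion, examining the running threads, can be sketched in Java with the standard `ThreadMXBean`. The sketch takes two per-thread CPU-time snapshots a short interval apart and ranks threads by the difference. It is a minimal sketch, not a profiler, and the class `HotThreads` and its helpers are hypothetical. As written it samples its own JVM. For a remote broker you would pass in a proxy from `ManagementFactory.newPlatformMXBeanProxy(connection, ManagementFactory.THREAD_MXBEAN_NAME, ThreadMXBean.class)`, assuming remote JMX is enabled on the broker.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: ranks live threads by CPU time consumed over a sampling window.
public class HotThreads {
    static final class Sample {
        final String name;
        final long cpuNanos; // CPU time used during the window
        Sample(String name, long cpuNanos) { this.name = name; this.cpuNanos = cpuNanos; }
    }

    // Works against any ThreadMXBean, local or a JMX proxy for a remote JVM.
    static List<Sample> hottest(ThreadMXBean bean, long intervalMillis, int topN)
            throws InterruptedException {
        Map<Long, Long> before = new HashMap<>();
        for (long id : bean.getAllThreadIds()) {
            long t = bean.getThreadCpuTime(id); // -1 if the thread died or timing is off
            if (t >= 0) before.put(id, t);
        }
        Thread.sleep(intervalMillis);
        List<Sample> samples = new ArrayList<>();
        for (Map.Entry<Long, Long> e : before.entrySet()) {
            long after = bean.getThreadCpuTime(e.getKey());
            ThreadInfo info = bean.getThreadInfo(e.getKey());
            if (after < 0 || info == null) continue; // thread exited during the window
            samples.add(new Sample(info.getThreadName(), after - e.getValue()));
        }
        samples.sort((a, b) -> Long.compare(b.cpuNanos, a.cpuNanos)); // busiest first
        return new ArrayList<>(samples.subList(0, Math.min(topN, samples.size())));
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        if (!bean.isThreadCpuTimeSupported()) {
            System.out.println("Thread CPU time is not supported on this JVM");
            return;
        }
        bean.setThreadCpuTimeEnabled(true);
        for (Sample s : hottest(bean, 1000, 5)) {
            System.out.printf("%-40s %8.1f ms%n", s.name, s.cpuNanos / 1e6);
        }
    }
}
```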
> > On 2017-09-21 07:58, Elliot Crosby-McCullough <elliot.crosby-mccullough@freeagent.com> wrote:
> > > Hello,
> > >
> > > We've been trying to debug an issue with our Kafka cluster for several
> > > days now and we're nearly out of options.
> > >
> > > We have 3 Kafka brokers associated with 3 ZooKeeper nodes and 3 registry
> > > nodes, plus a few Streams clients and a Ruby producer.
> > >
> > > Two of the three brokers are pinning a core and have been for days; no
> > > amount of restarting, debugging, or clearing out of data seems to help.
> > >
> > > We've got the logs at DEBUG level, which show a constant flow much like
> > > this: https://gist.github.com/elliotcm/e66a1ca838558664bab0c91549acb251
> > >
> > > As far as we can tell, the brokers are up to date on replication and the
> > > leaders are well balanced. The cluster is receiving no traffic; no
> > > messages are being sent in and the consumers/streams are shut down.
> > >
> > > From our profiling of the JVM, it looks like the CPU is mostly working in
> > > replication threads and SSL traffic (it's a secured cluster), but that
> > > shouldn't be treated as gospel.
> > >
> > > Any advice would be greatly appreciated.
> > >
> > > All the best,
> > > Elliot
> > >
> >
>
