kafka-users mailing list archives

From John Yost <hokiege...@gmail.com>
Subject Re: Idle cluster high CPU usage
Date Thu, 21 Sep 2017 15:44:43 GMT
The only thing I can think of is message format...do the client and broker
versions match? If the clients are a lower version than the brokers (e.g.,
0.9.0.1 client, 0.10.0.1 broker), then I think there could be message
format conversions both for incoming messages and for replication.
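
A minimal sketch of that version check, assuming the usual mapping of Kafka
releases to on-disk message formats (format 0 before 0.10.0, format 1 for
0.10.x, format 2 from 0.11.0). The helper names are hypothetical and only
illustrate the comparison; the format a broker actually writes also depends
on its log.message.format.version setting, which this sketch ignores.

```python
def parse_version(version):
    """Turn a dotted version string such as "0.10.0.1" into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def message_format(version):
    """Return the message format number assumed for a Kafka release.

    Assumption: releases before 0.10.0 use format 0, 0.10.x uses format 1,
    and 0.11.0 and later use format 2.
    """
    parsed = parse_version(version)
    if parsed < (0, 10, 0):
        return 0
    if parsed < (0, 11, 0):
        return 1
    return 2


def may_need_conversion(client_version, broker_version):
    """True when the client's assumed format is older than the broker's."""
    return message_format(client_version) < message_format(broker_version)


print(may_need_conversion("0.9.0.1", "0.10.0.1"))   # True
print(may_need_conversion("0.10.0.1", "0.10.0.1"))  # False
```

If the check comes back True for any client, that would be consistent with
the conversion theory, but it is not proof on its own.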

--John

On Thu, Sep 21, 2017 at 10:42 AM, Elliot Crosby-McCullough <
elliot.crosby-mccullough@freeagent.com> wrote:

> Nothing, that value (that group of values) was at default when we started
> the debugging.
>
> On 21 September 2017 at 15:08, Ismael Juma <ismael@juma.me.uk> wrote:
>
> > Thanks. What happens if you reduce num.replica.fetchers?
> >
> > On Thu, Sep 21, 2017 at 3:02 PM, Elliot Crosby-McCullough <
> > elliot.crosby-mccullough@freeagent.com> wrote:
> >
> > > 551 partitions, broker configs are:
> > > https://gist.github.com/elliotcm/3a35f66377c2ef4020d76508f49f106b
> > >
> > > We tweaked it a bit from standard recently but that was as part of the
> > > debugging process.
> > >
> > > After some more experimentation I'm seeing the same behaviour at about
> > > half the CPU after creating one 50 partition topic in an otherwise
> > > empty cluster.
> > >
> > > On 21 September 2017 at 14:20, Ismael Juma <ismael@juma.me.uk> wrote:
> > >
> > > > A couple of questions: how many partitions in the cluster and what
> > > > are your broker configs?
> > > >
> > > > On Thu, Sep 21, 2017 at 1:58 PM, Elliot Crosby-McCullough <
> > > > elliot.crosby-mccullough@freeagent.com> wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > We've been trying to debug an issue with our Kafka cluster for
> > > > > several days now and we're close to out of options.
> > > > >
> > > > > We have 3 Kafka brokers associated with 3 ZooKeeper nodes and 3
> > > > > registry nodes, plus a few Streams clients and a Ruby producer.
> > > > >
> > > > > Two of the three brokers are pinning a core and have been for days;
> > > > > no amount of restarting, debugging, or clearing out of data seems to
> > > > > help.
> > > > >
> > > > > We've got the logs at DEBUG level, which show a constant flow much
> > > > > like this:
> > > > > https://gist.github.com/elliotcm/e66a1ca838558664bab0c91549acb251
> > > > >
> > > > > As best as we can tell the brokers are up to date on replication
> > > > > and the leaders are well-balanced.  The cluster is receiving no
> > > > > traffic; no messages are being sent in and the consumers/streams
> > > > > are shut down.
> > > > >
> > > > > From our profiling of the JVM it looks like the CPU is mostly
> > > > > working in replication threads and SSL traffic (it's a secured
> > > > > cluster) but that shouldn't be treated as gospel.
> > > > >
> > > > > Any advice would be greatly appreciated.
> > > > >
> > > > > All the best,
> > > > > Elliot
> > > > >
> > > >
> > >
> >
>
