kafka-users mailing list archives

From Jun Rao <jun...@gmail.com>
Subject Re: Failed partition reassignment
Date Thu, 04 Dec 2014 00:53:48 GMT
You can do the following: (1) check whether there are any errors in the
controller log and the state-change log; (2) use the per-partition offset
lag JMX metric on the follower to see whether the follower is making good
progress.
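Both checks can be scripted once the data has been collected. The Python below is a minimal sketch, not a Kafka tool. It assumes that the log lines use the `[%d] %p %m (%c)%n` log4j layout, which I believe is the one in Kafka's sample log4j.properties (verify against your own config). It also assumes that you have already read the follower's per-partition lag value at regular intervals over JMX, for example with jconsole. The names `problem_lines` and `follower_making_progress` are hypothetical.

```python
import re
from typing import Iterable, List, Sequence

# Assumption: lines look like "[2014-12-01 10:00:01,000] ERROR message (logger)",
# i.e. the log4j pattern "[%d] %p %m (%c)%n". Adjust if your pattern differs.
LEVEL_RE = re.compile(r"^\[[^\]]+\] (ERROR|WARN) ")


def problem_lines(lines: Iterable[str]) -> List[str]:
    """Return the ERROR and WARN lines from a controller or state-change log."""
    return [line.rstrip("\n") for line in lines if LEVEL_RE.match(line)]


def follower_making_progress(lag_samples: Sequence[int], tolerance: int = 0) -> bool:
    """Judge from periodic per-partition lag samples whether a follower is catching up.

    lag_samples: lag values (in messages) sampled from the follower's JMX lag
    metric at regular intervals, oldest first (hypothetical input format).
    The follower counts as progressing if it is fully caught up (last lag 0)
    or if the lag shrank by more than `tolerance` over the sampled window.
    """
    if not lag_samples:
        raise ValueError("need at least one lag sample")
    if lag_samples[-1] == 0:
        return True
    return lag_samples[0] - lag_samples[-1] > tolerance
```

For example, `follower_making_progress([5000, 3200, 900])` returns `True`, while a flat series such as `[5000, 5000, 5000]` returns `False`; a flat, non-zero lag suggests the follower is stuck rather than slowly catching up.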

Thanks,

Jun

On Tue, Dec 2, 2014 at 3:13 PM, Karol Nowak <grywacz@gmail.com> wrote:

> I haven't reproduced it in a sandbox environment, but it has already
> happened twice on that cluster, so it's safe to say it's reproducible in
> that setup. Are there specific metrics or events that I should capture to
> make debugging this easier?
>
>
> Thanks,
> Karol
>
> On Tue, Dec 2, 2014 at 11:20 PM, Jun Rao <junrao@gmail.com> wrote:
>
> > Is there an easy way to reproduce the issues that you saw?
> >
> > Thanks,
> >
> > Jun
> >
> > On Mon, Dec 1, 2014 at 6:31 AM, Karol Nowak <grywacz@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > I observed some error messages and exceptions while running a partition
> > > reassignment on a Kafka 0.8.1.1 cluster. Being fairly new to this
> > > system, I'm not sure whether these indicate serious failures or
> > > transient problems, or whether manual intervention is needed.
> > >
> > > I used kafka-reassign-partitions.sh to reassign partitions from brokers
> > > {143,155,155,93} to {143,155,115,68} on a healthy (?) cluster. Right now
> > > one partition has just two replicas in the ISR, and a number of
> > > partitions are left with 4 replicas in the ISR even though the
> > > replication factor is 3. The logs show a few ZooKeeper timeouts, but
> > > there were no GC pauses anywhere near the session timeout. ZooKeeper
> > > itself seems healthy and not overloaded, with the exception of regular
> > > CPU spikes, probably related to snapshots.
> > >
> > > I cleaned the log lines a little bit for brevity.
> > >
> > > First example: https://gist.github.com/knowak/a682afc1545fdeb836a1
> > > Second, with two similar stack traces:
> > > https://gist.github.com/knowak/6398be433d869d8141e5
> > > Third (there are many, many of these):
> > > https://gist.github.com/knowak/e78301259b74841702ae
> > > Fourth: https://gist.github.com/knowak/1fbde5ca90d8f1924141
> > > Fifth: https://gist.github.com/knowak/57fdcb75b3dc7c626893
> > >
> > > Hints?
> > >
> > >
> > > Thanks,
> > > Karol
> > >
> >
>
>
>
> --
> Regards,
> Karol Nowak
> http://knowak.wordpress.com
>
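To find the partitions Karol describes (an ISR larger or smaller than the replication factor), the output of `kafka-topics.sh --describe` can be checked mechanically. This is a hypothetical Python sketch. It assumes the tab-separated `Topic: ... Partition: ... Leader: ... Replicas: ... Isr: ...` per-partition line layout, which you should verify against your Kafka version, and the helper names are made up for illustration.

```python
import re
from typing import Dict, List, Tuple

# Assumed per-partition line layout (verify against your Kafka version):
#   Topic: t  Partition: 0  Leader: 1  Replicas: 1,2,3  Isr: 1,2,3
# The per-topic summary line ("Topic:t PartitionCount:...") does not match.
PARTITION_RE = re.compile(
    r"Topic:\s*(\S+)\s+Partition:\s*(\d+)\s+Leader:\s*(\S+)\s+"
    r"Replicas:\s*([\d,]*)\s+Isr:\s*([\d,]*)"
)

Key = Tuple[str, int]


def _ids(field: str) -> List[int]:
    """Turn a comma-separated broker list such as '1,2,3' into [1, 2, 3]."""
    return [int(x) for x in field.split(",") if x]


def parse_describe(text: str) -> Dict[Key, Tuple[List[int], List[int]]]:
    """Map (topic, partition) -> (replicas, isr) from --describe style output."""
    result = {}
    for line in text.splitlines():
        m = PARTITION_RE.search(line)
        if m:
            topic, part, _leader, replicas, isr = m.groups()
            result[(topic, int(part))] = (_ids(replicas), _ids(isr))
    return result


def suspicious_partitions(text: str, replication_factor: int) -> Dict[Key, str]:
    """Flag partitions whose replica list or ISR size disagrees with the replication factor."""
    problems = {}
    for key, (replicas, isr) in parse_describe(text).items():
        if len(replicas) != replication_factor:
            problems[key] = f"{len(replicas)} replicas assigned, expected {replication_factor}"
        elif len(isr) < replication_factor:
            problems[key] = f"only {len(isr)} of {replication_factor} replicas in ISR"
    return problems
```

As far as I understand Kafka's reassignment process, the replica list temporarily holds the union of the old and new replicas until the new replicas have caught up, so a partition stuck with 4 replicas at replication factor 3 most likely points to a reassignment that has not completed, while a short ISR points to a replica that has fallen out of sync.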
