kafka-users mailing list archives

From Ryan Berdeen <rberd...@hubspot.com>
Subject Re: Controlled shutdown and leader election issues
Date Mon, 07 Apr 2014 20:50:08 GMT
I think I've figured it out, and it still happens in the 0.8.1 branch. The
code that is responsible for deleting the key from ZooKeeper is broken and
will never be called when using the command line tool, so the tool will fail
after the first use. I've created
https://issues.apache.org/jira/browse/KAFKA-1365.
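
For anyone following the suggestion quoted further down in this thread (wait
until the under-replicated partition count drops to 0 after each restart), here
is a minimal sketch of counting under-replicated partitions from the output of
bin/kafka-topics.sh --describe. It assumes the line format shown in the output
quoted below; the function name under_replicated is hypothetical and not part
of Kafka.

```python
import re

# Matches lines such as:
#   Topic: analytics-activity Partition: 2  Leader: 12  Replicas: 12,10 Isr: 12
# The format is an assumption taken from the output quoted in this thread.
LINE_RE = re.compile(
    r"Topic:\s*(?P<topic>\S+)\s+Partition:\s*(?P<partition>\d+)\s+"
    r"Leader:\s*(?P<leader>-?\d+)\s+Replicas:\s*(?P<replicas>[\d,]+)\s+"
    r"Isr:\s*(?P<isr>[\d,]*)"
)


def under_replicated(describe_output):
    """Return (topic, partition, missing_broker_ids) for every partition
    whose in-sync replica set is smaller than its assigned replica list."""
    result = []
    for line in describe_output.splitlines():
        m = LINE_RE.search(line)
        if m is None:
            continue  # skip blank lines, "..." and anything unrecognized
        replicas = [int(b) for b in m.group("replicas").split(",") if b]
        isr = {int(b) for b in m.group("isr").split(",") if b}
        missing = [b for b in replicas if b not in isr]
        if missing:
            result.append((m.group("topic"), int(m.group("partition")), missing))
    return result
```

During a rolling restart, one could run the describe command after each broker
comes back, pass its output to this function, and move on to the next broker
only once the returned list is empty.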


On Fri, Apr 4, 2014 at 2:13 AM, Clark Breyman <clark@breyman.com> wrote:

> Done. https://issues.apache.org/jira/browse/KAFKA-1360
>
>
> On Thu, Apr 3, 2014 at 9:13 PM, Neha Narkhede <neha.narkhede@gmail.com> wrote:
>
> > >> Is there a maven repo for pulling snapshot CI builds from?
> >
> > We still need to get the CI build setup going. Could you please file a
> > JIRA for this?
> > Meanwhile, you will have to just build the code yourself for now,
> > unfortunately.
> >
> > Thanks,
> > Neha
> >
> >
> > On Thu, Apr 3, 2014 at 12:01 PM, Clark Breyman <clark@breyman.com> wrote:
> >
> > > Thanks, Neha. Is there a maven repo for pulling snapshot CI builds from?
> > > Sorry if this is answered elsewhere.
> > >
> > >
> > > On Wed, Apr 2, 2014 at 7:16 PM, Neha Narkhede <neha.narkhede@gmail.com> wrote:
> > >
> > > > I'm not sure I know the issue you are running into, but we fixed a few
> > > > bugs with similar symptoms, and the fixes are on the 0.8.1 branch. It
> > > > would be great if you could give it a try to see if your issue is resolved.
> > > >
> > > > Thanks,
> > > > Neha
> > > >
> > > >
> > > > On Wed, Apr 2, 2014 at 12:59 PM, Clark Breyman <clark@breyman.com> wrote:
> > > >
> > > > > Was there an answer for 0.8.1 getting stuck in preferred leader election?
> > > > > I'm seeing this as well. Is there a JIRA ticket on this issue?
> > > > >
> > > > >
> > > > > On Fri, Mar 21, 2014 at 1:15 PM, Ryan Berdeen <rberdeen@hubspot.com> wrote:
> > > > >
> > > > > > So, for 0.8 without "controlled.shutdown.enable", why would ShutdownBroker
> > > > > > and restarting cause under-replication and producer exceptions? How can I
> > > > > > upgrade gracefully?
> > > > > >
> > > > > > What's up with 0.8.1 getting stuck in preferred leader election?
> > > > > >
> > > > > >
> > > > > > On Fri, Mar 21, 2014 at 12:18 AM, Neha Narkhede <neha.narkhede@gmail.com> wrote:
> > > > > >
> > > > > > > Which brings up the question: do we need ShutdownBroker anymore? It seems
> > > > > > > like the config should handle controlled shutdown correctly anyway.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Neha
> > > > > > >
> > > > > > >
> > > > > > > On Thu, Mar 20, 2014 at 9:16 PM, Jun Rao <junrao@gmail.com> wrote:
> > > > > > >
> > > > > > > > We haven't been testing the ShutdownBroker command in 0.8.1 rigorously,
> > > > > > > > since in 0.8.1 one can do the controlled shutdown through the new config
> > > > > > > > "controlled.shutdown.enable". Instead of running the ShutdownBroker command
> > > > > > > > during the upgrade, you can also wait until the under-replicated partition
> > > > > > > > count drops to 0 after each restart before moving to the next one.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > Jun
> > > > > > > >
> > > > > > > >
> > > > > > > > On Thu, Mar 20, 2014 at 3:14 PM, Ryan Berdeen <rberdeen@hubspot.com> wrote:
> > > > > > > >
> > > > > > > > > While upgrading from 0.8.0 to 0.8.1 in place, I observed some surprising
> > > > > > > > > behavior using kafka.admin.ShutdownBroker. At the start, there were no
> > > > > > > > > under-replicated partitions. After running
> > > > > > > > >
> > > > > > > > >   bin/kafka-run-class.sh kafka.admin.ShutdownBroker --broker 10 ...
> > > > > > > > >
> > > > > > > > > Partitions that had replicas on broker 10 were under-replicated:
> > > > > > > > >
> > > > > > > > >   bin/kafka-topics.sh --describe --under-replicated-partitions ...
> > > > > > > > >   Topic: analytics-activity Partition: 2  Leader: 12  Replicas: 12,10 Isr: 12
> > > > > > > > >   Topic: analytics-activity Partition: 6  Leader: 11  Replicas: 11,10 Isr: 11
> > > > > > > > >   Topic: analytics-activity Partition: 14 Leader: 14  Replicas: 14,10 Isr: 14
> > > > > > > > >   ...
> > > > > > > > >
> > > > > > > > > While restarting the broker process, many produce requests failed with
> > > > > > > > > kafka.common.UnknownTopicOrPartitionException.
> > > > > > > > >
> > > > > > > > > After each broker restart, I used the preferred leader election tool for
> > > > > > > > > all topics. Now, after finishing all of the broker restarts, the cluster
> > > > > > > > > seems to be stuck in leader election. Running the tool fails with
> > > > > > > > > "kafka.admin.AdminOperationException: Preferred replica leader election
> > > > > > > > > currently in progress..."
> > > > > > > > >
> > > > > > > > > Are any of these known issues? Is there a safer way to shut down and
> > > > > > > > > restart brokers that does not cause producer failures and
> > > > > > > > > under-replicated partitions?
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
