kafka-users mailing list archives

From: Marcin Michalski <mmichal...@tagged.com>
Subject: Re: Upgrading from 0.8.0 to 0.8.1 one broker at a time issues
Date: Fri, 11 Apr 2014 02:34:49 GMT
I see that the state-change logs have warning messages of this kind (Broker
7 is on the 0.8.1 version; this is a log snippet from that broker):

[...] since its associated leader epoch 11 is old. Current leader epoch is 11 (state.change.logger)
[2014-04-09 10:32:21,974] WARN Broker 7 ignoring LeaderAndIsr request from controller 1001 with correlation id 0 epoch 7 for partition [pets_nec_buygold,0] since its associated leader epoch 12 is old. Current leader epoch is 12 (state.change.logger)
[2014-04-09 10:32:21,974] WARN Broker 7 ignoring LeaderAndIsr request from controller 1001 with correlation id 0 epoch 7 for partition [cafe_notification,0] since its associated leader epoch 11 is old. Current leader epoch is 11 (state.change.logger)
[2014-04-09 10:32:21,975] INFO Broker 7 skipped the become-follower state change after marking its partition as follower with correlation id 0 from controller 1001 epoch 6 for partition [set_primary_photo,0] since the new leader 1008 is the same as the old leader (state.change.logger)
[2014-04-09 10:32:21,975] INFO Broker 7 skipped the become-follower state change after marking its partition as follower with correlation id 0 from controller 1001 epoch 6 for partition [external_url,0] since the new leader 1001 is the same as the old leader (state.change.logger)

And here are snippets from the broker log of a 0.8.0 node that I shut
down before trying to upgrade it (this is when most topics became
unusable):

[2014-04-09 10:32:21,993] WARN Broker 8 ignoring LeaderAndIsr request from
controller 1001 with correlation id 0 epoch 7 for partition
[variant_assign,0] since its associated leader epoch 11 is old. Current
leader epoch is 11 (state.change.logger)
[2014-04-09 10:32:21,993] WARN Broker 8 ignoring LeaderAndIsr request from
controller 1001 with correlation id 0 epoch 7 for partition
[meetme_new_contact_count,0] since its associated leader epoch 8 is old.
Current leader epoch is 8 (state.change.logger)
[2014-04-09 10:32:21,994] INFO Broker 8 skipped the become-follower state
change after marking its partition as follower with correlation id 0 from
controller 1001 epoch 6 for partition [m3_auth,0] since the new leader 7 is
the same as the old leader (state.change.logger)
[2014-04-09 10:32:21,994] INFO Broker 8 skipped the become-follower state
change after marking its partition as follower with correlation id 0 from
controller 1001 epoch 6 for partition [newsfeed_likes,0] since the new
leader 1001 is the same as the old leader (state.change.logger)
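
For what it's worth, this is roughly how I am separating real ERRORs from
the WARN noise above (the file names come from the default
log4j.properties; the relative logs/ directory is my assumption, adjust
to your install):

    # count real errors in the controller and state-change logs
    grep -c ERROR logs/controller.log
    grep -c ERROR logs/state-change.log

    # list the partitions involved in the most recent leadership transitions
    grep -E "become-(leader|follower)" logs/state-change.log | tail -n 20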

In terms of upgrading from 0.8.0 to 0.8.1, is there a recommended approach
that one should follow? Is it possible to migrate from one version to the
next on a live cluster, one server at a time?
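
For reference, the per-broker procedure I have been attempting looks
roughly like this (a sketch of what I am doing, not a recommendation;
zkhost:2181 is a placeholder, and controlled.shutdown.enable is the
0.8.1 property name):

    # 0. controlled shutdown is enabled in config/server.properties:
    #        controlled.shutdown.enable=true

    # 1. stop the broker; with the setting above this should move
    #    leadership off this broker before it exits
    bin/kafka-server-stop.sh

    # 2. swap in the 0.8.1 binaries, keeping server.properties and log.dirs

    # 3. restart the broker on the new version
    bin/kafka-server-start.sh config/server.properties

    # 4. before touching the next broker, confirm this one is back in the
    #    ISR of the partitions it replicates
    bin/kafka-topics.sh --describe --zookeeper zkhost:2181

    # 5. optionally move leadership back to the preferred replicas
    bin/kafka-preferred-replica-election.sh --zookeeper zkhost:2181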

Thanks,
Martin


On Wed, Apr 9, 2014 at 8:38 PM, Jun Rao <junrao@gmail.com> wrote:

> Was there any error in the controller and the state-change logs?
>
> Thanks,
>
> Jun
>
>
> On Wed, Apr 9, 2014 at 11:18 AM, Marcin Michalski <mmichalski@tagged.com> wrote:
>
> > Hi, has anyone upgraded their kafka from 0.8.0 to 0.8.1 successfully one
> > broker at a time on a live cluster?
> >
> > I am seeing strange behaviors where many of my kafka topics become
> > unusable (by both consumers and producers). When that happens, I see
> > lots of errors in the server logs that look like this:
> >
> > [2014-04-09 10:38:14,669] WARN [KafkaApi-1007] Fetch request with correlation id 2455 from client ReplicaFetcherThread-15-1007 on partition [risk,0] failed due to Topic risk either doesn't exist or is in the process of being deleted (kafka.server.KafkaApis)
> > [2014-04-09 10:38:14,669] WARN [KafkaApi-1007] Fetch request with correlation id 2455 from client ReplicaFetcherThread-7-1007 on partition [message,0] failed due to Topic message either doesn't exist or is in the process of being deleted (kafka.server.KafkaApis)
> >
> > When I try to consume a message from a topic that triggered the above
> > "doesn't exist" warning, I get the exception below:
> >
> > ....topic message --from-beginning
> > SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> > SLF4J: Defaulting to no-operation (NOP) logger implementation
> > SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
> > [2014-04-09 10:40:30,571] WARN [console-consumer-90716_dkafkadatahub07.tag-dev.com-1397065229615-7211ba72-leader-finder-thread], Failed to add leader for partitions [message,0]; will retry (kafka.consumer.ConsumerFetcherManager$LeaderFinderThread)
> > kafka.common.UnknownTopicOrPartitionException
> >         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> >         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
> >         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
> >         at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
> >         at java.lang.Class.newInstance0(Class.java:355)
> >         at java.lang.Class.newInstance(Class.java:308)
> >         at kafka.common.ErrorMapping$.exceptionFor(ErrorMapping.scala:79)
> >         at kafka.consumer.SimpleConsumer.earliestOrLatestOffset(SimpleConsumer.scala:167)
> >         at kafka.consumer.ConsumerFetcherThread.handleOffsetOutOfRange(ConsumerFetcherThread.scala:60)
> >         at kafka.server.AbstractFetcherThread$$anonfun$addPartitions$2.apply(AbstractFetcherThread.scala:179)
> >         at kafka.server.AbstractFetcherThread$$anonfun$addPartitions$2.apply(AbstractFetcherThread.scala:174)
> >         at scala.collection.immutable.Map$Map1.foreach(Map.scala:119)
> >         at kafka.server.AbstractFetcherThread.addPartitions(AbstractFetcherThread.scala:174)
> >         at kafka.server.AbstractFetcherManager$$anonfun$addFetcherForPartitions$2.apply(AbstractFetcherManager.scala:86)
> >         at kafka.server.AbstractFetcherManager$$anonfun$addFetcherForPartitions$2.apply(AbstractFetcherManager.scala:76)
> >         at scala.collection.immutable.Map$Map1.foreach(Map.scala:119)
> >         at kafka.server.AbstractFetcherManager.addFetcherForPartitions(AbstractFetcherManager.scala:76)
> >         at kafka.consumer.ConsumerFetcherManager$LeaderFinderThread.doWork(ConsumerFetcherManager.scala:95)
> >         at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:51)
> > ----------
> >
> > *More details about my issues:*
> > My current configuration in the environment where I am testing the
> > upgrade is 4 physical servers running 2 brokers each, with the
> > controlled shutdown feature enabled. When I shut down the 2 brokers on
> > one of the existing Kafka 0.8.0 machines, upgrade that machine to
> > 0.8.1, and restart it, all is fine for a bit. Once the new brokers
> > came up, I ran kafka-preferred-replica-election.sh to make sure the
> > restarted brokers became leaders of the existing topics. The
> > replication factor on the topics is set to 4. I tested both producing
> > and consuming messages against brokers that were leaders on Kafka
> > 0.8.0 and on 0.8.1, and no issues were encountered.
> >
> > Later, I tried to perform a controlled shutdown of the 2 additional
> > brokers on the server that still has 0.8.0 installed. After those
> > brokers shut down and new leaders were assigned, all of my server logs
> > got filled with the above exceptions and most of my topics became
> > unusable. I pulled and built the 0.8.1 Kafka code from git last
> > Thursday, so I should be pretty much up to date. So I am not sure if I
> > am doing something wrong, or if migrating from 0.8.0 to 0.8.1 on a
> > live cluster one server at a time is not supported. Is there a
> > recommended migration approach one should take when migrating a live
> > 0.8.0 cluster to 0.8.1?
> >
> > As for one of the topics that became unusable, its leader is the
> > broker that was successfully upgraded to 0.8.1:
> >
> > Topic:message   PartitionCount:1        ReplicationFactor:4     Configs:
> >         Topic: message  Partition: 0    *Leader: 1007*  Replicas: 1007,8,9,1001 Isr: 1001,1007,8
> >
> > Brokers 9 and 1009 were shut down on one physical server that had
> > Kafka 0.8.0 installed when these problems started occurring (I was
> > planning to upgrade them to 0.8.1). The only way I can recover from
> > this state is to shut down all brokers, delete all of the Kafka topic
> > logs plus the kafka directory in ZooKeeper, and start with a new
> > cluster.
> >
> >
> > Your help in this matter is greatly appreciated.
> >
> > Thanks,
> > Martin
> >
>
