kafka-users mailing list archives

From Jun Rao <jun...@gmail.com>
Subject Re: Upgrading from 0.8.0 to 0.8.1 one broker at a time issues
Date Fri, 11 Apr 2014 03:19:24 GMT
One should be able to upgrade from 0.8 to 0.8.1 one broker at a time
online. There are some corner cases that we are trying to patch in 0.8.1.1,
which will be released soon.

As for your issue, not sure what happened. Do you see any ZK session
expirations in the broker log?
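
A quick way to check is to scan the broker log for session-expiration messages. A minimal sketch in Python, assuming a log4j-style server.log; the exact message text varies between versions, so the pattern here is an assumption to adjust:

```python
import re

# Rough sketch: flag broker log lines that look like ZooKeeper session
# expirations. The message wording is an assumption; tune the pattern
# to match your log4j layout.
EXPIRY_PATTERN = re.compile(r"(zookeeper|zk).*session.*expir", re.IGNORECASE)

def find_session_expirations(lines):
    """Return the log lines that look like ZK session expirations."""
    return [line.rstrip("\n") for line in lines if EXPIRY_PATTERN.search(line)]

# Example against a couple of made-up log lines:
sample = [
    "[2014-04-09 10:32:20,101] INFO Awaiting socket connections (kafka.network.Acceptor)",
    "[2014-04-09 10:32:21,900] INFO zk session expired; re-registering broker (kafka.server.KafkaZooKeeper)",
]
print(find_session_expirations(sample))  # only the second line should be flagged
```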

Thanks,

Jun


On Thu, Apr 10, 2014 at 7:34 PM, Marcin Michalski <mmichalski@tagged.com> wrote:

> I see that the state-change logs have warning messages of this kind (Broker
> 7 is running 0.8.1 and this is a log snippet from that broker):
> ...s associated leader epoch 11 is old. Current leader epoch is 11 (state.change.logger)
> [2014-04-09 10:32:21,974] WARN Broker 7 ignoring LeaderAndIsr request from controller 1001 with correlation id 0 epoch 7 for partition [pets_nec_buygold,0] since its associated leader epoch 12 is old. Current leader epoch is 12 (state.change.logger)
> [2014-04-09 10:32:21,974] WARN Broker 7 ignoring LeaderAndIsr request from controller 1001 with correlation id 0 epoch 7 for partition [cafe_notification,0] since its associated leader epoch 11 is old. Current leader epoch is 11 (state.change.logger)
> [2014-04-09 10:32:21,975] INFO Broker 7 skipped the become-follower state change after marking its partition as follower with correlation id 0 from controller 1001 epoch 6 for partition [set_primary_photo,0] since the new leader 1008 is the same as the old leader (state.change.logger)
> [2014-04-09 10:32:21,975] INFO Broker 7 skipped the become-follower state change after marking its partition as follower with correlation id 0 from controller 1001 epoch 6 for partition [external_url,0] since the new leader 1001 is the same as the old leader (state.change.logger)
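
The repeated "leader epoch N is old. Current leader epoch is N" warnings above suggest the broker drops LeaderAndIsr requests whose leader epoch is not strictly newer than the one it already holds. A minimal sketch of that comparison (an illustration of the apparent check, not Kafka's actual code):

```python
# Sketch of the epoch check implied by the warnings above (an assumption,
# not the real broker code): a broker ignores a LeaderAndIsr request unless
# it carries a strictly newer leader epoch than the one it already knows.
def should_apply_leader_and_isr(request_epoch: int, current_epoch: int) -> bool:
    """Apply the request only if its leader epoch is strictly newer."""
    return request_epoch > current_epoch

# With request epoch 11 and current epoch 11 the request is ignored, which
# matches the "leader epoch 11 is old. Current leader epoch is 11" lines.
print(should_apply_leader_and_isr(11, 11))  # False: request is ignored
print(should_apply_leader_and_isr(12, 11))  # True: request is applied
```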
>
> And these are the snippets of the broker log of a 0.8.0 node that I shut
> down before I tried to upgrade it (this is when most topics became
> unusable):
>
> [2014-04-09 10:32:21,993] WARN Broker 8 ignoring LeaderAndIsr request from controller 1001 with correlation id 0 epoch 7 for partition [variant_assign,0] since its associated leader epoch 11 is old. Current leader epoch is 11 (state.change.logger)
> [2014-04-09 10:32:21,993] WARN Broker 8 ignoring LeaderAndIsr request from controller 1001 with correlation id 0 epoch 7 for partition [meetme_new_contact_count,0] since its associated leader epoch 8 is old. Current leader epoch is 8 (state.change.logger)
> [2014-04-09 10:32:21,994] INFO Broker 8 skipped the become-follower state change after marking its partition as follower with correlation id 0 from controller 1001 epoch 6 for partition [m3_auth,0] since the new leader 7 is the same as the old leader (state.change.logger)
> [2014-04-09 10:32:21,994] INFO Broker 8 skipped the become-follower state change after marking its partition as follower with correlation id 0 from controller 1001 epoch 6 for partition [newsfeed_likes,0] since the new leader 1001 is the same as the old leader (state.change.logger)
>
> In terms of upgrading from 0.8.0 to 0.8.1, is there a recommended approach
> that one should follow? Is it possible to migrate from one version to the
> next on a live cluster one server at a time?
>
> Thanks,
> Martin
>
>
> On Wed, Apr 9, 2014 at 8:38 PM, Jun Rao <junrao@gmail.com> wrote:
>
> > Was there any error in the controller and the state-change logs?
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Wed, Apr 9, 2014 at 11:18 AM, Marcin Michalski <mmichalski@tagged.com> wrote:
> >
> > > Hi, has anyone upgraded their Kafka from 0.8.0 to 0.8.1 successfully
> > > one broker at a time on a live cluster?
> > >
> > > I am seeing strange behavior where many of my Kafka topics become
> > > unusable (by both consumers and producers). When that happens, I see
> > > lots of errors in the server logs that look like this:
> > >
> > > [2014-04-09 10:38:14,669] WARN [KafkaApi-1007] Fetch request with correlation id 2455 from client ReplicaFetcherThread-15-1007 on partition [risk,0] failed due to Topic risk either doesn't exist or is in the process of being deleted (kafka.server.KafkaApis)
> > > [2014-04-09 10:38:14,669] WARN [KafkaApi-1007] Fetch request with correlation id 2455 from client ReplicaFetcherThread-7-1007 on partition [message,0] failed due to Topic message either doesn't exist or is in the process of being deleted (kafka.server.KafkaApis)
> > >
> > > When I try to consume a message from a topic flagged by the above
> > > warning as not existing, I get the exception below:
> > >
> > > ....topic message --from-beginning
> > > SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> > > SLF4J: Defaulting to no-operation (NOP) logger implementation
> > > SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
> > > [2014-04-09 10:40:30,571] WARN [console-consumer-90716_dkafkadatahub07.tag-dev.com-1397065229615-7211ba72-leader-finder-thread], Failed to add leader for partitions [message,0]; will retry (kafka.consumer.ConsumerFetcherManager$LeaderFinderThread)
> > > kafka.common.UnknownTopicOrPartitionException
> > >         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> > >         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
> > >         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
> > >         at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
> > >         at java.lang.Class.newInstance0(Class.java:355)
> > >         at java.lang.Class.newInstance(Class.java:308)
> > >         at kafka.common.ErrorMapping$.exceptionFor(ErrorMapping.scala:79)
> > >         at kafka.consumer.SimpleConsumer.earliestOrLatestOffset(SimpleConsumer.scala:167)
> > >         at kafka.consumer.ConsumerFetcherThread.handleOffsetOutOfRange(ConsumerFetcherThread.scala:60)
> > >         at kafka.server.AbstractFetcherThread$$anonfun$addPartitions$2.apply(AbstractFetcherThread.scala:179)
> > >         at kafka.server.AbstractFetcherThread$$anonfun$addPartitions$2.apply(AbstractFetcherThread.scala:174)
> > >         at scala.collection.immutable.Map$Map1.foreach(Map.scala:119)
> > >         at kafka.server.AbstractFetcherThread.addPartitions(AbstractFetcherThread.scala:174)
> > >         at kafka.server.AbstractFetcherManager$$anonfun$addFetcherForPartitions$2.apply(AbstractFetcherManager.scala:86)
> > >         at kafka.server.AbstractFetcherManager$$anonfun$addFetcherForPartitions$2.apply(AbstractFetcherManager.scala:76)
> > >         at scala.collection.immutable.Map$Map1.foreach(Map.scala:119)
> > >         at kafka.server.AbstractFetcherManager.addFetcherForPartitions(AbstractFetcherManager.scala:76)
> > >         at kafka.consumer.ConsumerFetcherManager$LeaderFinderThread.doWork(ConsumerFetcherManager.scala:95)
> > >         at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:51)
> > > ----------
> > >
> > > *More details about my issues:*
> > > My current configuration in the environment where I am testing the
> > > upgrade is 4 physical servers running 2 brokers each, with the
> > > controlled shutdown feature enabled. When I shut down the 2 brokers on
> > > one of the existing Kafka 0.8.0 machines, upgrade that machine to
> > > 0.8.1, and restart it, all is fine for a bit. Once the new brokers came
> > > up, I ran kafka-preferred-replica-election.sh to make sure that the
> > > restarted brokers became leaders of existing topics. The replication
> > > factor on the topics is set to 4. I tested both producing and consuming
> > > messages against brokers that were leaders, on both Kafka 0.8.0 and
> > > 0.8.1, and no issues were encountered.
> > >
> > > Later, I tried to perform a controlled shutdown of the 2 additional
> > > brokers on the Kafka server that still has 0.8.0 installed, and after
> > > the brokers shut down and new leaders were assigned, all of my server
> > > logs started filling up with the above exceptions and most of my topics
> > > became unusable. I pulled and built the 0.8.1 Kafka code from git last
> > > Thursday, so I should be pretty much up to date. So I am not sure
> > > whether I am doing something wrong or whether migrating from 0.8.0 to
> > > 0.8.1 on a live cluster one server at a time is not supported. Is there
> > > a recommended migration approach one should take when migrating a live
> > > 0.8.0 cluster to 0.8.1?
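
The controlled shutdown feature mentioned above is driven by broker configuration. A hypothetical server.properties fragment (key names as in the 0.8.1 broker config; the values are examples only, and 0.8.0 triggered controlled shutdown through an admin tool rather than these keys):

```properties
# Hypothetical server.properties fragment: controlled shutdown settings
# (0.8.1-style key names; values are illustrative, not recommendations)
controlled.shutdown.enable=true
controlled.shutdown.max.retries=3
controlled.shutdown.retry.backoff.ms=5000
```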
> > >
> > > The leader of one of the topics that became unusable is the broker
> > > that was successfully upgraded to 0.8.1:
> > > Topic:message   PartitionCount:1        ReplicationFactor:4     Configs:
> > >         Topic: message  Partition: 0    *Leader: 1007*    Replicas: 1007,8,9,1001 Isr: 1001,1007,8
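
Note that in the describe output above, replica 9 is missing from the ISR, i.e. the partition is under-replicated. A small sketch that flags this from a `--describe` line (the line format here is an assumption based on the snippet above):

```python
import re

# Sketch: given one partition line from `kafka-topics.sh --describe` output,
# return the replica ids that are missing from the ISR. The field layout
# ("Replicas: a,b,c Isr: a,b") is assumed from the snippet above.
def under_replicated(describe_line: str):
    replicas = re.search(r"Replicas:\s*([\d,]+)", describe_line).group(1).split(",")
    isr = re.search(r"Isr:\s*([\d,]+)", describe_line).group(1).split(",")
    return [r for r in replicas if r not in isr]

line = "Topic: message  Partition: 0    Leader: 1007    Replicas: 1007,8,9,1001 Isr: 1001,1007,8"
print(under_replicated(line))  # ['9']
```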
> > >
> > > Brokers 9 and 1009 were shut down on one physical server that had
> > > Kafka 0.8.0 installed when these problems started occurring (I was
> > > planning to upgrade them to 0.8.1). The only way I can recover from
> > > this state is to shut down all brokers, delete all of the Kafka topic
> > > logs plus the ZooKeeper kafka directory, and start with a new cluster.
> > >
> > >
> > > Your help in this matter is greatly appreciated.
> > >
> > > Thanks,
> > > Martin
> > >
> >
>
