kafka-users mailing list archives

From Guozhang Wang <wangg...@gmail.com>
Subject Re: Migrating a cluster from 0.8.0 to 0.8.1
Date Mon, 23 Dec 2013 19:24:32 GMT
Hi Drew,

I tried the kafka-server-stop script and it worked for me. Which OS are you
using?

Guozhang


On Mon, Dec 23, 2013 at 10:57 AM, Drew Goya <drew@gradientx.com> wrote:

> Occasionally I do have to hard kill brokers; the kafka-server-stop.sh
> script stopped working for me a few months ago.  I saw another thread in
> the mailing list mentioning the issue too.  I'll change the signal back to
> SIGTERM and run that way for a while; hopefully the problem goes away.
>
> This is the commit where it changed:
>
>
> https://github.com/apache/kafka/commit/51de7c55d2b3107b79953f401fc8c9530bd0eea0
>
>
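The commit linked above changed the signal that bin/kafka-server-stop.sh sends
to the broker. A minimal sketch of what "changing the signal back to SIGTERM"
could look like, assuming the script is the usual ps/grep/awk/kill pipeline
(check the actual script in your install before editing it):

    # locate the broker JVM and send it SIGTERM instead of the script's current signal
    ps ax | grep -i 'kafka\.Kafka' | grep java | grep -v grep \
      | awk '{print $1}' | xargs kill -SIGTERM
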
> On Mon, Dec 23, 2013 at 10:09 AM, Neha Narkhede <neha.narkhede@gmail.com>
> wrote:
>
> > Are you hard killing the brokers? And is this issue reproducible?
> >
> >
> > On Sat, Dec 21, 2013 at 11:39 AM, Drew Goya <drew@gradientx.com> wrote:
> >
> > > Hey guys, another small issue to report for 0.8.1.  After a couple of
> > > days, 3 of my brokers had fallen off the ISR list for 2-3 of their
> > > partitions.
> > >
> > > I didn't see anything unusual in the log, so I just restarted one.  It
> > > came up fine, but as it loaded its logs these messages showed up:
> > >
> > > [2013-12-21 19:25:19,968] WARN [ReplicaFetcherThread-0-2], Replica 1 for partition [Events2,58] reset its fetch offset to current leader 2's start offset 1042738519 (kafka.server.ReplicaFetcherThread)
> > > [2013-12-21 19:25:19,969] WARN [ReplicaFetcherThread-0-14], Replica 1 for partition [Events2,28] reset its fetch offset to current leader 14's start offset 1043415514 (kafka.server.ReplicaFetcherThread)
> > > [2013-12-21 19:25:20,012] WARN [ReplicaFetcherThread-0-2], Current offset 1011209589 for partition [Events2,58] out of range; reset offset to 1042738519 (kafka.server.ReplicaFetcherThread)
> > > [2013-12-21 19:25:20,013] WARN [ReplicaFetcherThread-0-14], Current offset 1010086751 for partition [Events2,28] out of range; reset offset to 1043415514 (kafka.server.ReplicaFetcherThread)
> > > [2013-12-21 19:25:20,036] WARN [ReplicaFetcherThread-0-14], Replica 1 for partition [Events2,71] reset its fetch offset to current leader 14's start offset 1026871415 (kafka.server.ReplicaFetcherThread)
> > > [2013-12-21 19:25:20,036] WARN [ReplicaFetcherThread-0-2], Replica 1 for partition [Events2,44] reset its fetch offset to current leader 2's start offset 1052372907 (kafka.server.ReplicaFetcherThread)
> > > [2013-12-21 19:25:20,036] WARN [ReplicaFetcherThread-0-14], Current offset 993879706 for partition [Events2,71] out of range; reset offset to 1026871415 (kafka.server.ReplicaFetcherThread)
> > > [2013-12-21 19:25:20,036] WARN [ReplicaFetcherThread-0-2], Current offset 1020715056 for partition [Events2,44] out of range; reset offset to 1052372907 (kafka.server.ReplicaFetcherThread)
> > >
> > > Judging by the network traffic and disk usage changes after the reboot
> > > (both jumped up), a couple of the partition replicas had fallen behind
> > > and are now catching up.
> > >
> > >
> > > On Thu, Dec 19, 2013 at 4:37 PM, Neha Narkhede <neha.narkhede@gmail.com>
> > > wrote:
> > >
> > > > Hi Drew,
> > > >
> > > > That problem will be fixed by
> > > > https://issues.apache.org/jira/browse/KAFKA-1074. I think we are close
> > > > to checking that in to trunk.
> > > >
> > > > Thanks,
> > > > Neha
> > > >
> > > >
> > > > On Wed, Dec 18, 2013 at 9:02 AM, Drew Goya <drew@gradientx.com> wrote:
> > > >
> > > > > Thanks Neha, I rolled upgrades and completed a rebalance!
> > > > >
> > > > > I ran into a few small issues I figured I would share.
> > > > >
> > > > > On a few brokers, there were some log directories left over from
> > > > > some failed rebalances, which prevented the 0.8.1 brokers from
> > > > > starting once I completed the upgrade.  These directories contained
> > > > > an index file and a zero-size log file; once I cleaned those out,
> > > > > the brokers were able to start up fine.  If anyone else runs into
> > > > > the same problem and is running RHEL, this is the bash script I
> > > > > used to clean them out:
> > > > >
> > > > > du --max-depth=1 -h /data/kafka/logs | grep K | sed s/.*K.// | xargs sudo rm -r
> > > > >
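Since the leftover directories Drew describes hold only an index file and a
zero-size log file, a more cautious variant is to list candidates before
deleting anything. A sketch under that assumption; the /data/kafka/logs path
comes from the script above, the rest is illustrative:

    # print topic-partition directories whose .log segments are all empty;
    # swap the echo for 'sudo rm -r "$d"' only after reviewing the output
    for d in /data/kafka/logs/*/; do
      if [ -z "$(find "$d" -name '*.log' -size +0c -print -quit)" ]; then
        echo "candidate for removal: $d"
      fi
    done
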
> > > > >
> > > > > On Tue, Dec 17, 2013 at 10:42 AM, Neha Narkhede <neha.narkhede@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > There are no compatibility issues. You can roll upgrades through
> > > > > > the cluster one node at a time.
> > > > > >
> > > > > > Thanks
> > > > > > Neha
> > > > > >
> > > > > >
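For reference, a rolling upgrade like the one Neha describes can be scripted
one broker at a time. This is only a sketch; the hostnames, install paths, and
restart commands below are assumptions, not anything from this thread:

    # upgrade brokers one at a time, waiting for each restarted broker to
    # rejoin the ISR (e.g. watch the UnderReplicatedPartitions JMX metric)
    for host in kafka01 kafka02 kafka03; do
      ssh "$host" '/opt/kafka/bin/kafka-server-stop.sh'
      ssh "$host" 'tar xzf /tmp/kafka-0.8.1.tgz -C /opt'   # install the new build (illustrative)
      ssh "$host" 'nohup /opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties > /tmp/kafka.out 2>&1 &'
      read -p "Broker on $host caught up? Press enter to continue. "
    done
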
> > > > > > On Tue, Dec 17, 2013 at 9:15 AM, Drew Goya <drew@gradientx.com>
> > > > > > wrote:
> > > > > >
> > > > > > > So I'm going to be going through the process of upgrading a
> > > > > > > cluster from 0.8.0 to the trunk (0.8.1).
> > > > > > >
> > > > > > > I'm going to be expanding this cluster several times, and the
> > > > > > > problems with reassigning partitions in 0.8.0 mean I have to
> > > > > > > move to trunk (0.8.1) asap.
> > > > > > >
> > > > > > > Will it be safe to roll upgrades through the cluster one by one?
> > > > > > >
> > > > > > > Also, are there any client compatibility issues I need to worry
> > > > > > > about?  Am I going to need to pause/upgrade all my
> > > > > > > consumers/producers at once, or can I roll upgrades through the
> > > > > > > cluster and then upgrade my clients one by one?
> > > > > > >
> > > > > > > Thanks in advance!
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>



-- 
-- Guozhang
