kafka-users mailing list archives

From Charity Majors <char...@hound.sh>
Subject Re: kafka + autoscaling groups fuckery
Date Sun, 03 Jul 2016 18:29:53 GMT
Great talks, but not relevant to either of my problems -- the golang client
not rebalancing the consumer offset topic, or autoscaling group behavior
(which I think is probably just a consequence of the first).

Thanks though, there's good stuff in here.

On Sun, Jul 3, 2016 at 10:23 AM, James Cheng <wushujames@gmail.com> wrote:

> Charity,
>
> I'm not sure about the specific problem you are having, but about Kafka on
> AWS, Netflix did a talk at a meetup about their Kafka installation on AWS.
> There might be some useful information in there. There is a video stream as
> well as slides, and maybe you can get in touch with the speakers. Look in
> the comment section for links to the slides and video.
>
> Kafka at Netflix
>
> http://www.meetup.com//http-kafka-apache-org/events/220355031/?showDescription=true
>
> There's also a talk about running Kafka on Mesos, which might be relevant.
>
> Kafka on Mesos
>
> http://www.meetup.com//http-kafka-apache-org/events/222537743/?showDescription=true
>
> -James
>
> Sent from my iPhone
>
> > On Jul 2, 2016, at 5:15 PM, Charity Majors <charity@hound.sh> wrote:
> >
> > Gwen, thanks for the response.
> >
> >> 1.1 Your life may be a bit simpler if you have a way of starting a new
> >> broker with the same ID as the old one - this means it will
> >> automatically pick up the old replicas and you won't need to
> >> rebalance. Makes life slightly easier in some cases.
> >
> > Yeah, this is definitely doable, I just don't *want* to do it.  I really
> > want all of these to share the same code path: 1) rolling all nodes in an
> > ASG to pick up a new AMI, 2) hardware failure / unintentional node
> > termination, 3) resizing the ASG and rebalancing the data across nodes.
> >
> > Everything but the first one means generating new node IDs, so I would
> > rather just do that across the board.  It's the solution that really fits
> > the ASG model best, so I'm reluctant to give up on it.
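
A minimal sketch of what the "new node ID" path implies: every partition that listed the terminated broker as a replica needs a new replica list pointing at the replacement. The `Assignment` type, the `ReplaceBroker` helper, the `events` topic, and broker ID 1004 are hypothetical names for illustration; only broker 1002 and `__consumer_offsets` come from the thread. It assumes the replacement ID is not already a replica of any affected partition.

```go
package main

import "fmt"

// Assignment describes the replica list of one partition (hypothetical type).
type Assignment struct {
	Topic     string
	Partition int
	Replicas  []int
}

// ReplaceBroker returns a new assignment for every partition that lists
// oldID as a replica, with oldID swapped for newID. Partitions that do not
// reference oldID are omitted because nothing has to move for them.
// The input slice is not modified.
func ReplaceBroker(current []Assignment, oldID, newID int) []Assignment {
	var moved []Assignment
	for _, a := range current {
		replicas := make([]int, len(a.Replicas))
		changed := false
		for i, id := range a.Replicas {
			if id == oldID {
				replicas[i] = newID
				changed = true
			} else {
				replicas[i] = id
			}
		}
		if changed {
			moved = append(moved, Assignment{a.Topic, a.Partition, replicas})
		}
	}
	return moved
}

func main() {
	// Example data only: internal topics such as __consumer_offsets are
	// included so they are not left behind on the dead broker.
	current := []Assignment{
		{"events", 0, []int{1001, 1002}},
		{"events", 1, []int{1001, 1003}},
		{"__consumer_offsets", 7, []int{1002, 1003}},
	}
	for _, a := range ReplaceBroker(current, 1002, 1004) {
		fmt.Printf("%s-%d -> %v\n", a.Topic, a.Partition, a.Replicas)
	}
}
```

The point of returning only the changed partitions is that the same function could serve all three cases listed above (rolling, failure, resize), as long as the list of current assignments covers every topic.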
> >
> >
> >> 1.2 Careful not to rebalance too many partitions at once - you only
> >> have so much bandwidth and currently Kafka will not throttle
> >> rebalancing traffic.
> >
> > Nod, got it.  This is def something I plan to work on hardening once I have
> > the basic nut of things working (or if I've had to give up on it and accept
> > a lesser solution).
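
One way to follow the advice about not moving too many partitions at once is to split a full reassignment into small batches and apply them one after another. The sketch below assumes the JSON layout accepted by `kafka-reassign-partitions.sh --reassignment-json-file` (`version` plus a `partitions` list); the `Batch` helper, the batch size, and the `events` topic are hypothetical, and nothing here waits for a batch to finish before starting the next.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Assignment is one entry in a reassignment JSON file.
type Assignment struct {
	Topic     string `json:"topic"`
	Partition int    `json:"partition"`
	Replicas  []int  `json:"replicas"`
}

// Plan is the top-level document of a reassignment JSON file.
type Plan struct {
	Version    int          `json:"version"`
	Partitions []Assignment `json:"partitions"`
}

// Batch splits a full list of assignments into plans that hold at most
// size partitions each, so only a few partitions are in flight at a time.
// A size below 1 is treated as 1.
func Batch(all []Assignment, size int) []Plan {
	if size < 1 {
		size = 1
	}
	var plans []Plan
	for start := 0; start < len(all); start += size {
		end := start + size
		if end > len(all) {
			end = len(all)
		}
		plans = append(plans, Plan{Version: 1, Partitions: all[start:end]})
	}
	return plans
}

func main() {
	// Example data only.
	all := []Assignment{
		{"events", 0, []int{1001, 1004}},
		{"events", 1, []int{1003, 1004}},
		{"__consumer_offsets", 7, []int{1004, 1003}},
	}
	for i, p := range Batch(all, 2) {
		out, err := json.Marshal(p)
		if err != nil {
			panic(err)
		}
		// Each line could be written to its own file and applied in turn,
		// verifying completion before moving on to the next batch.
		fmt.Printf("batch %d: %s\n", i, out)
	}
}
```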
> >
> >
> >> 2. I think your rebalance script is not rebalancing the offsets topic?
> >> It still has a replica on broker 1002. You have two good replicas, so
> >> you are nowhere near disaster, but make sure you get this working
> >> too.
> >
> > Yes, this is another problem I am working on in parallel.  The Shopify
> > sarama library <https://godoc.org/github.com/Shopify/sarama> uses the
> > __consumer_offsets topic, but it does *not* let you rebalance or resize the
> > topic when consumers connect, disconnect, or restart.
> >
> > "Note that Sarama's Consumer implementation does not currently support
> > automatic consumer-group rebalancing and offset tracking"
> >
> > I'm working on trying to get sarama-cluster to do something here.  I
> > think these problems are likely related; I'm not sure wtf you are
> > *supposed* to do to rebalance this god damn topic.  It also seems like we
> > aren't using a consumer group, which sarama-cluster depends on to rebalance
> > a topic.  I'm still pretty confused by the 0.9 "consumer group" stuff.
> >
> > Seriously considering downgrading to the latest 0.8 release, because
> > there's a massive gap in documentation for the new stuff in 0.9 (like
> > consumer groups) and we don't really need any of the new features.
> >
> >> A common work-around is to configure the consumer to handle "offset
> >> out of range" exception by jumping to the last offset available in the
> >> log. This is the behavior of the Java client, and it would have saved
> >> your consumer here. Go client looks very low level, so I don't know
> >> how easy it is to do that.
> >
> > Erf, this seems like it would almost guarantee data loss.  :(  Will check
> > it out tho.
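
The trade-off behind the data-loss worry can be made concrete with a small standalone sketch. It does not call sarama; `ResolveOffset` and `ResetPolicy` are hypothetical names, and the offsets are made-up numbers. It assumes the valid range is from the oldest retained offset up to and including the log end offset. Jumping to the newest offset skips whatever is still unread in the log, while jumping to the oldest offset reprocesses retained messages instead. The Java client exposes a similar choice through its `auto.offset.reset` setting.

```go
package main

import "fmt"

// ResetPolicy says where to resume when the requested offset is invalid.
type ResetPolicy int

const (
	// ResetToNewest resumes at the end of the log; unread messages that
	// are still in the log are skipped.
	ResetToNewest ResetPolicy = iota
	// ResetToOldest resumes at the start of the retained log; messages
	// may be processed more than once.
	ResetToOldest
)

// ResolveOffset returns the offset to fetch from and whether a reset
// happened. An offset is treated as in range when oldest <= requested <= newest.
func ResolveOffset(requested, oldest, newest int64, policy ResetPolicy) (int64, bool) {
	if requested >= oldest && requested <= newest {
		return requested, false
	}
	if policy == ResetToOldest {
		return oldest, true
	}
	return newest, true
}

func main() {
	// The committed offset (5000) is past the end of the log (900), for
	// example after the partition leader changed to a replica that was behind.
	off, reset := ResolveOffset(5000, 100, 900, ResetToNewest)
	fmt.Println(off, reset)

	// The committed offset (50) has already been deleted by retention.
	off, reset = ResolveOffset(50, 100, 900, ResetToOldest)
	fmt.Println(off, reset)
}
```

Whichever policy is chosen, logging every reset makes it possible to tell afterwards how much was skipped or replayed.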
> >
> >> If I were you, I'd retest your ASG scripts without the auto leader
> >> election - since your own scripts can / should handle that.
> >
> > Okay, this is straightforward enough.  Will try it.  And will keep trying
> > to figure out how to balance the __consumer_offsets topic, since I
> > increasingly think that's the key to this giant mess.
> >
> > If anyone has any advice there, massively appreciated.
> >
> > Thanks,
> >
> > charity.
>
