kafka-users mailing list archives

From Jun Rao <jun...@gmail.com>
Subject Re: HA / failover
Date Tue, 30 Aug 2011 18:46:41 GMT
See my inline replies below.

Thanks,

Jun


On Tue, Aug 30, 2011 at 8:36 AM, Roman Garcia <rgarcia@dridco.com> wrote:

> >> Roman,
> Without replication, Kafka can lose messages permanently if the
> underlying storage system is damaged. Setting that aside, there are 2
> ways that you can achieve HA now. In either case, you need to set up a
> Kafka cluster with at least 2 brokers.
>
> Thanks for the clarification Jun. But even then, with replication, you
> could still lose messages, right?
>
>
If you do synchronous replication with replication factor >1 and there is
only 1 failure, you won't lose any messages.
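
To make that concrete, here is a toy sketch in Python. It is not Kafka code: the Replica class and both helper functions are hypothetical names made up for illustration. It assumes "synchronous" means a write is acknowledged only after every replica has stored it, so with a replication factor of 2 a single failure still leaves one full copy of every acknowledged message.

```python
# Toy model of synchronous replication; hypothetical names, not Kafka code.
class Replica:
    def __init__(self, name):
        self.name = name
        self.log = []
        self.alive = True


def replicated_append(replicas, message):
    """Acknowledge a write only after every replica has stored it."""
    for r in replicas:
        if not r.alive:
            raise RuntimeError(f"replica {r.name} is down; write not acknowledged")
    for r in replicas:
        r.log.append(message)
    return True


def surviving_messages(replicas):
    """Messages still readable from at least one live replica, in log order."""
    seen = []
    for r in replicas:
        if r.alive:
            for m in r.log:
                if m not in seen:
                    seen.append(m)
    return seen


replicas = [Replica("broker-0"), Replica("broker-1")]  # replication factor 2
acked = [m for m in ("m1", "m2", "m3") if replicated_append(replicas, m)]

replicas[0].alive = False  # a single failure
remaining = surviving_messages(replicas)
```

After the single failure, remaining holds the same messages as acked. If both replicas were marked down, nothing would be readable, which is why the guarantee is stated for only 1 failure.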


> >> [...] Unconsumed messages on that broker will not be available for
> consumption until the broker comes up again.
>
> How does a Consumer fetch those "old" messages, given that it did
> already fetch "new" messages at a higher offset? What am I missing?
>

There is one offset per topic/partition. If a partition is not available
because a broker is down, its offset in the consumer simply stops advancing.
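
A toy sketch of that bookkeeping, in Python. This is not the Kafka consumer API: the poll function, the partition keys and the use of list indices as offsets are all assumptions for illustration. It only shows that each partition's offset moves independently, so the "old" messages on a broker that was down are picked up once it returns.

```python
# Toy model of per-partition consumer offsets; hypothetical, not the Kafka API.
def poll(partitions, offsets, available):
    """Fetch unread messages from each available partition.

    Only the offset of a partition that was actually read is advanced.
    """
    fetched = []
    for key, log in partitions.items():
        if key not in available:
            continue  # broker down: this partition's offset stays where it was
        start = offsets.get(key, 0)
        fetched.extend(log[start:])
        offsets[key] = len(log)
    return fetched


partitions = {
    ("topic", "0-0"): ["a1", "a2"],  # partition hosted on broker 0
    ("topic", "1-0"): ["b1", "b2"],  # partition hosted on broker 1
}
offsets = {}

# Broker 0 is down: only broker 1's partition is read.
first = poll(partitions, offsets, available={("topic", "1-0")})

# Broker 1 receives a new message, then broker 0 comes back.
partitions[("topic", "1-0")].append("b3")
second = poll(partitions, offsets, available=set(partitions))
```

The second poll returns the backlog from broker 0's partition together with the one new message from broker 1's partition. Nothing is skipped, because having read further on one partition says nothing about the position on another.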


>
> >> The second approach is to use the built-in ZK-based software load
> balancer in Kafka (by setting zk.connect in the producer config). In
> this case, we rely on ZK to detect broker failures.
>
> This is the approach I've tried. I did use zk.connect.
> I started all locally:
> - 2 Kafka brokers (broker id=0 & 1, single partition)
> - 3 zookeeper nodes (all of these on a single box) with different
> election ports and different fs paths/ids.
> - 5 producer threads sending <1k msgs
>
> Then I killed one of the Kafka brokers, and all my producer threads
> died.
>
>
That could be a bug. Are you using trunk? Any errors/exceptions in the log?
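
On the earlier question of whether to catch the exception and retry: independent of whether this turns out to be a bug, a producer thread does not have to die on a failed send. Below is a minimal sketch in Python of catching the error and trying the next broker. The send callable, the broker names and the ConnectionError type are hypothetical stand-ins, not the Kafka producer API.

```python
# Hypothetical retry-with-failover wrapper; not the Kafka producer API.
def send_with_failover(message, brokers, send, max_rounds=3):
    """Try each broker in turn and return the one that accepted the message."""
    last_error = None
    for _ in range(max_rounds):
        for broker in brokers:
            try:
                send(broker, message)
                return broker
            except ConnectionError as exc:
                last_error = exc  # remember the failure, move to the next broker
    raise RuntimeError("no broker accepted the message") from last_error


delivered = []


def fake_send(broker, message):
    """Stand-in transport: broker-0 is 'killed', broker-1 accepts everything."""
    if broker == "broker-0":
        raise ConnectionError("broker-0 is down")
    delivered.append((broker, message))


chosen = send_with_failover("hello", ["broker-0", "broker-1"], fake_send)
```

Here the send to broker-0 fails, the wrapper moves on, and broker-1 accepts the message. If every broker keeps failing, the caller gets a single RuntimeError after max_rounds passes and can decide what to do with the message.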


> What am I doing wrong?
>
>
> Thanks!
> Roman
>
>
> -----Original Message-----
> From: Jun Rao [mailto:junrao@gmail.com]
> Sent: Tuesday, August 30, 2011 11:44 AM
> To: kafka-users@incubator.apache.org
> Subject: Re: HA / failover
>
> Roman,
>
> Without replication, Kafka can lose messages permanently if the
> underlying storage system is damaged. Setting that aside, there are 2
> ways that you can achieve HA now. In either case, you need to set up a
> Kafka cluster with at least 2 brokers.
>
> The first approach is to put the hosts of all Kafka brokers in a VIP and
> rely on a hardware load balancer to do health checks and routing. In that
> case, all producers send data through the VIP. If one of the brokers is
> down temporarily, the load balancer will direct the produce requests to
> the rest of the brokers. Unconsumed messages on that broker will not be
> available for consumption until the broker comes up again.
>
>  The second approach is to use the built-in ZK-based software load
> balancer in Kafka (by setting zk.connect in the producer config). In
> this case, we rely on ZK to detect broker failures.
>
> Thanks,
>
> Jun
>
> On Tue, Aug 30, 2011 at 7:18 AM, Roman Garcia <rgarcia@dridco.com>
> wrote:
>
> > Hi, I'm trying to figure out what my prod environment should look like,
> > and still I don't seem to understand how to achieve HA / FO conditions.
> >
> > I realize this is going to be fully supported once there is
> > replication, right?
> >
> > But what about right now? How do you guys achieve this?
> >
> > I understand at least LinkedIn has a Kafka cluster deployed.
> >
> > - How do you guys ensure no messages get lost before flush to disk
> > happens?
> >
> > - How did you manage to always have a broker available and redirect
> > producers to those during failure?
> > I've tried using the Producer class with "sync" type and ZooKeeper, and
> > killing one of two brokers, but I got an exception. Should I handle it
> > and retry then?
> >
> > So, to sum up, any pointers on how I should set up a prod env would be
> > appreciated! Any doc I might have missed or a simple short example
> > would help.
> > Thanks!
> > Roman
> >
>
