kafka-users mailing list archives

From Joe Stein <joe.st...@stealth.ly>
Subject Re: Data replication and zero data loss
Date Fri, 01 May 2015 13:28:36 GMT
If you want 0 data loss you should also look into the min.insync.replicas
setting in 0.8.2.1, as it guarantees that acknowledged data exists on
multiple replicas.

If you don't have that set, then the following scenario is possible.

Let's say 1 topic, 1 partition, replication factor 3. You are producing
with ACK=-1.

b1, b2, b3 (where b=broker and b1 is leader, b2, b3 replicas).

b1 and b2 die, b3 is leader. So far all is well.

10 minutes go by and b3 dies.

1 minute later b1 comes back online and becomes leader. It never received
what b3 accepted, so essentially the 10 minutes of data that upstream
thought was saved is truncated.
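
To make the scenario above concrete, here is a toy Python model of it. It is only a sketch, not Kafka code: the names Partition, produce, fail, recover and run_scenario are made up for illustration, and it assumes unclean leader election is allowed, so an out-of-sync replica can become leader when no in-sync replica is left.

```python
# Toy model of the failure scenario above. Not Kafka code: Partition,
# produce, fail, recover and run_scenario are hypothetical names.

class Partition:
    def __init__(self, brokers):
        self.logs = {b: [] for b in brokers}  # each broker's copy of the log
        self.isr = list(brokers)              # in-sync replicas, isr[0] is the leader

    def produce(self, msg):
        """ACK=-1: the write is acknowledged once every ISR member has it."""
        if not self.isr:
            return False                      # no leader, partition is offline
        for b in self.isr:
            self.logs[b].append(msg)
        return True

    def fail(self, broker):
        self.isr = [b for b in self.isr if b != broker]

    def recover(self, broker):
        if self.isr:
            # Normal case: catch up from the current leader, then rejoin the ISR.
            self.logs[broker] = list(self.logs[self.isr[0]])
            self.isr.append(broker)
        else:
            # Assumed unclean election: the stale replica becomes leader and
            # its shorter log becomes the source of truth.
            self.isr = [broker]


def run_scenario():
    p = Partition(["b1", "b2", "b3"])
    acked = [m for m in ["m1"] if p.produce(m)]   # written while all three are up
    p.fail("b1")
    p.fail("b2")                                  # ISR shrinks to just b3
    acked += [m for m in ["m2"] if p.produce(m)]  # the "10 minutes" of traffic
    p.fail("b3")
    p.recover("b1")                               # b1 returns and takes over
    leader_log = p.logs[p.isr[0]]
    lost = [m for m in acked if m not in leader_log]
    return acked, lost
```

Calling run_scenario() returns the acknowledged messages and the subset missing from the new leader's log. In this model m2 was acknowledged while b3 was the only in-sync replica, so it is the message that goes missing.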

But now, with min.insync.replicas set, a produce with ACK=-1 gets a failure
if there are not enough in-sync replicas to meet your data loss guarantees,
e.g. min.insync.replicas=2 or min.insync.replicas=3, depending on the data.
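
A rough way to picture the effect of the setting is as a check made before an ACK=-1 write is accepted. The Python below is a simplified sketch under that assumption; accept_write and summarize are hypothetical helpers, not Kafka APIs, and the broker names follow the b1, b2, b3 example.

```python
# Simplified model of the min.insync.replicas check for ACK=-1 writes.
# accept_write and summarize are hypothetical helpers, not Kafka APIs.

def accept_write(isr, min_insync_replicas=1):
    """Return True if an ACK=-1 produce request would be accepted."""
    return len(isr) >= min_insync_replicas


def summarize(min_insync_replicas):
    """Show which cluster states still accept writes for a given setting."""
    states = {
        "all up": ["b1", "b2", "b3"],
        "b1 down": ["b2", "b3"],
        "b1,b2 down": ["b3"],
    }
    return {name: accept_write(isr, min_insync_replicas)
            for name, isr in states.items()}
```

With a minimum of 2, writes are refused once b3 is the only in-sync replica, so the producer sees a failure instead of an acknowledgement that a single broker cannot back up. With a minimum of 3 and replication factor 3, any single broker failure blocks writes, which is why the right value depends on the data.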

Also take a look at
https://github.com/stealthly/go_kafka_client/tree/master/mirrormaker; it
might be helpful for what you are looking for.

~ Joe Stein
- - - - - - - - - - - - - - - - -

  http://www.stealth.ly
- - - - - - - - - - - - - - - - -

On Fri, May 1, 2015 at 7:43 AM, Joong Lee <joong@me.com> wrote:

> It is based on our understanding from reading the documents.
>
> We aren't concerned about data duplication, as that is going to be
> handled by Elasticsearch.
>
> > On May 1, 2015, at 12:15 AM, Daniel Compton
> > <daniel.compton.lists@gmail.com> wrote:
> >
> > When we evaluated MirrorMaker last year we didn't find any risk of data
> > loss, only duplicate messages in the case of a network partition.
> >
> > Did you discover data loss in your tests, or were you just looking at the
> > docs?
> > On Fri, 1 May 2015 at 4:31 pm Jiangjie Qin <jqin@linkedin.com.invalid>
> > wrote:
> >
> >> Which mirror maker version did you look at? The MirrorMaker in trunk
> >> should not have data loss if you just use the default setting.
> >>
> >>> On 4/30/15, 7:53 PM, "Joong Lee" <joong@me.com> wrote:
> >>>
> >>> Hi,
> >>> We are exploring Kafka to keep two data centers (primary and DR)
> >>> running hosts of Elasticsearch nodes in sync. One key requirement is
> >>> that we can't lose any data. We POC'd use of MirrorMaker and felt it
> >>> may not meet our data loss requirement.
> >>>
> >>> I would like to ask the community whether we should look for another
> >>> solution, or whether Kafka would be the right solution considering the
> >>> zero data loss requirement.
> >>>
> >>> Thanks
> >>
> >>
>
