kafka-users mailing list archives

From Martin Kleppmann <mkleppm...@linkedin.com>
Subject Re: fidelity of offsets when mirroring
Date Thu, 06 Mar 2014 12:28:47 GMT
If you really don't mind some messages being lost during failover, your simplest option would
be to just restart consumers at the latest offset in the new AZ. Or, if you don't mind messages
being duplicated, rewind to an earlier time t as explained by Jun and Neha.
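The rewind-to-time semantics Jun and Neha describe can be sketched as follows. This is a model of the behaviour, not the real client API: given (offset, timestamp) pairs for messages in the target cluster, we find the offset of the last message produced before time t, which is why replaying from it can yield duplicates but not losses (up to mirroring lag).

```python
# Sketch of the "rewind to an earlier time t" semantics, assuming we have
# (offset, timestamp) pairs for the messages in the target cluster. The
# guarantee modelled here: the returned offset belongs to a message produced
# *before* t, so replaying from it may re-deliver some messages.
import bisect

def offset_before(timestamps_by_offset, t):
    """timestamps_by_offset: list of (offset, timestamp), sorted by offset.
    Returns the offset of the last message with timestamp < t, or the
    first offset if no message precedes t."""
    times = [ts for _, ts in timestamps_by_offset]
    i = bisect.bisect_left(times, t)  # first index with timestamp >= t
    if i == 0:
        return timestamps_by_offset[0][0]  # everything is at or after t
    return timestamps_by_offset[i - 1][0]

log = [(0, 100), (1, 105), (2, 110), (3, 120)]
print(offset_before(log, 112))  # -> 2: last message produced before t=112
```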

Another thought: you might be able to provide stronger guarantees at an application level.
For example, you could include a unique identifier within every message, and use that to detect
and discard duplicate messages after failover. However, keeping track of all those message
IDs might require too much state (and that state would also have to be replicated across AZs).
If you're doing offline processing of the data, e.g. importing it into Hadoop, then de-duplicating
by message ID might be feasible. Just an idea.
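A minimal sketch of that de-duplication idea (names are illustrative, not an existing API). The seen-ID set is kept in memory and capped for illustration; as noted above, in a real deployment this state would have to be bounded and replicated across AZs, which is exactly the cost in question:

```python
# Sketch of application-level de-duplication by unique message ID after a
# failover. The seen-ID state is in-memory and capped here; in practice it
# would need to be replicated across AZs, as discussed above.
from collections import OrderedDict

class Deduplicator:
    def __init__(self, max_ids=100_000):
        self.seen = OrderedDict()  # message_id -> None, oldest first
        self.max_ids = max_ids     # cap on how much state we keep

    def accept(self, message_id):
        """Return True if the message is new, False if it is a duplicate."""
        if message_id in self.seen:
            return False
        self.seen[message_id] = None
        if len(self.seen) > self.max_ids:
            self.seen.popitem(last=False)  # evict the oldest remembered ID
        return True

dedup = Deduplicator(max_ids=3)
print([dedup.accept(m) for m in ["a", "b", "a", "c"]])  # -> [True, True, False, True]
```

Note the trade-off the cap introduces: once an ID is evicted, a very late duplicate of it would be accepted again.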


On 5 Mar 2014, at 17:30, Neha Narkhede <neha.narkhede@gmail.com> wrote:
> Jun's suggested design is the closest you can get to surviving an AZ
> failure with mirroring. However, one thing I'd like to point out about the
> getOffsetsBefore API is the fact that it gives you the approximate offset
> for a particular time t. For example, if you ask for an offset of a message
> produced at time t, it may give you the offset for a message that was
> produced at time (t - t'). The only guarantee it provides is that the
> offset returned will be for a message that was produced *before* time t.
> What this means for mirroring is that during a failover you can get
> duplicate messages.
> Thanks,
> Neha
> On Tue, Mar 4, 2014 at 8:42 PM, Jun Rao <junrao@gmail.com> wrote:
>> Currently, message offsets are not preserved by mirror maker.
>> You can potentially do the failover based on the failover time. Suppose
>> that the consumption in A failed at time t. You find the offset before time
>> t using our getOffsetsBefore API to get the starting offset in B. Then, you
>> have to manually import these offsets into ZK and then start the consumer.
>> Thanks,
>> Jun
>> On Tue, Mar 4, 2014 at 3:23 PM, Seth White <seth.white@salesforce.com> wrote:
>>> Hi,
>>> I have a question about mirroring. I would like to create a highly
>>> available Kafka service that runs on AWS and can survive an AZ failure.
>>> Based on what I've read, I plan to create a Kafka cluster in each AZ and
>>> use mirror maker to replicate one cluster to the other. I'll call the two
>>> clusters in their respective availability zones A and B. A is the primary,
>>> which is replicated to B. Normally, all consumers consume from A and
>>> record their current offset in a persistent store that is replicated
>>> across A and B (like Dynamo). If I detect that A has failed, producers
>>> and consumers will fail over to B. That's the basic idea.
>>> Now, the question: can I rely on the offset that is being stored in the
>>> persistent store to refer to the same event in each cluster? Or is it
>>> possible for the two to get out of sync over time - I don't know why,
>>> failures of some kind maybe - in which case the offset from A might not
>>> really be valid with respect to the replica B. If that is possible, then
>>> I'm wondering what I can/should do about it to achieve a clean failover.
>>> I realize that the replication may lag behind, so some events from A may
>>> be lost when there is a failover. That is okay.
>>> I've been told that creating a single cluster that spans AZs and relying
>>> on the new replication functionality in 0.8 is a bad idea, as zookeeper
>>> isn't well behaved in that case. Hence my alternative design.
>>> Thanks in advance.
>>> Seth
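Putting Jun's three-step failover procedure together as a sketch. The callables `lookup_offset_before`, `import_offsets`, and `start_consumer` are hypothetical stand-ins for the real offset lookup, the manual ZK import, and consumer startup; they are injected so the flow can be exercised without a live cluster:

```python
# Sketch of the failover procedure Jun describes: on detecting that cluster A
# failed at time t, look up offsets before t in cluster B, import them as the
# consumer group's starting positions, then restart consumption there.
# All three callables are hypothetical stand-ins, not actual Kafka APIs.

def fail_over(t_failure, partitions, lookup_offset_before, import_offsets,
              start_consumer):
    # Step 1: for each partition, find an offset produced before the failure.
    offsets = {p: lookup_offset_before(p, t_failure) for p in partitions}
    # Step 2: record these offsets as the group's committed positions
    # (Jun's "manually import these offsets into ZK").
    import_offsets(offsets)
    # Step 3: start the consumer against cluster B; it resumes from the
    # imported offsets, possibly re-reading messages (duplicates, not loss).
    start_consumer()
    return offsets
```

For example, driving it with fakes shows the ordering: offsets are looked up and imported before the consumer is started.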
