kafka-users mailing list archives

From Ewen Cheslack-Postava <e...@confluent.io>
Subject Re: Query on MirrorMaker Replication - Bi-directional/Failover replication
Date Fri, 06 Jan 2017 03:31:45 GMT
On Thu, Jan 5, 2017 at 3:07 AM, Greenhorn Techie <greenhorntechie@gmail.com>
wrote:

> Hi,
>
> We are planning to set up MirrorMaker-based Kafka replication for DR
> purposes. The base requirement is to have DR replication from the primary
> site (site1) to the DR site (site2) using MirrorMaker.
>
> However, we need the solution to work in case of failover as well, i.e.
> in the event of the site1 Kafka cluster failing, the site2 Kafka cluster
> would be made primary. Later, when the site1 cluster eventually comes
> back online, the direction of replication would be from site2 to site1.
>
> But as I understand it, the offsets on each of the clusters are
> different, so I am wondering how to design the solution given this
> constraint and these requirements.
>

It turns out this is tricky. And once you start digging in you'll find it's
way more complicated than you might originally think.

Before going down the rabbit hole, I'd suggest taking a look at this great
talk by Jun Rao (one of the original authors of Kafka) about multi-DC Kafka
setups: https://www.youtube.com/watch?v=Dvk0cwqGgws

Additionally, I want to mention that it is tempting to treat multi-DC DR
cases in a way that gives really convenient, strongly consistent, highly
available behavior, because that makes them easier to reason about and
avoids pushing much of the burden down to applications. But that's not
realistic or practical, and honestly, it's rarely even necessary. DR cases
really are DR. Usually it is possible to make some tradeoffs you might not
make under normal circumstances (the most important one being the tradeoff
between possibly seeing duplicates vs exactly once). The tension here is
often that one team is responsible for maintaining the infrastructure and
handling this DR failover scenario, while other teams are responsible for
the behavior of the applications. The infrastructure team has to figure out
the DR failover story, but if they don't solve it at the infrastructure
layer, they get stuck having to understand all the current (and future)
applications built on that infrastructure.

That said, here are the details I think you're looking for:

The short answer right now is that doing DR failover like that is not going
to be easy with MM. Confluent is building additional tools to deal with
multi-DC setups because of a bunch of these challenges:
https://www.confluent.io/product/multi-datacenter/

For your specific concern about reversing the direction of replication,
you'd need to build additional tooling to support this. The basic list of
steps would be something like this (assuming non-compacted topics):

1. Use MM normally to replicate your data. Be *very* sure you construct
your setup to ensure *everything* is mirrored (proper # of partitions,
replication factor, topic level configs, etc). (Note that this is something
the Confluent replication solution is addressing that's a significant gap
in MM.)
2. During replication, be sure to record offset deltas for every topic
partition. These are needed to reverse the direction of replication
correctly. Make sure to store them in the backup DC and somewhere very
reliable. (One way to record these is sketched after this list.)
3. Observe DC failure.
4. Decide to do failover. Ensure replication has actually stopped (via your
own tooling, or probably better, by using ACLs to ensure no new data can be
produced from the original DC to the backup DC).
5. Record all the high watermarks for every topic partition so you know
which data was replicated from the original DC (vs which is new after
failover). (Also sketched after this list.)
6. Allow failover to proceed. Make the backup DC primary.
7. Once the original DC is back alive, you want to reverse replication and
make it the backup. Look up the offset deltas and use them to initialize
offsets for the consumer group you'll use to do replication (see the third
sketch after this list).
8. Go back to the original DC and make sure there isn't any "extra" data,
i.e. stuff that didn't get replicated but was successfully written to the
original DC's cluster. For topic partitions where there is data beyond the
expected offsets, you currently would need to delete the entire set of
data, or at least truncate it back to the offset where reverse replication
is expected to start; the third sketch below includes this check. (A
truncate operation might be a nice way to avoid having to dump *all* the
data, but doesn't currently exist.)
9. Once you've got the two clusters back in a reasonably synced state with
appropriate starting offsets committed, start up MM again in the reverse
direction.
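
To make step 2 a bit more concrete, here's a rough sketch (in Java, using
the plain consumer API) of how the offset-delta bookkeeping could look. The
broker addresses, topic names, and the "mirrormaker-group" group id are
placeholders for your own setup, and it assumes a 1:1 topic/partition
mapping and that MM is fully caught up when the deltas are sampled. It's an
illustration of the bookkeeping, not something MM does for you.

    import java.util.*;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.consumer.OffsetAndMetadata;
    import org.apache.kafka.common.PartitionInfo;
    import org.apache.kafka.common.TopicPartition;
    import org.apache.kafka.common.serialization.ByteArrayDeserializer;

    public class OffsetDeltaRecorder {

        static Properties props(String bootstrap, String group) {
            Properties p = new Properties();
            p.put("bootstrap.servers", bootstrap);
            p.put("group.id", group);
            p.put("key.deserializer", ByteArrayDeserializer.class.getName());
            p.put("value.deserializer", ByteArrayDeserializer.class.getName());
            return p;
        }

        public static void main(String[] args) {
            // One consumer against the primary (site1) sharing MirrorMaker's group id,
            // so we can read the offsets MM has committed there, and one against the
            // backup (site2) just to read end offsets.
            try (KafkaConsumer<byte[], byte[]> src =
                         new KafkaConsumer<>(props("site1-broker:9092", "mirrormaker-group"));
                 KafkaConsumer<byte[], byte[]> dst =
                         new KafkaConsumer<>(props("site2-broker:9092", "offset-delta-recorder"))) {
                for (String topic : Arrays.asList("orders", "payments")) {  // your mirrored topics
                    List<TopicPartition> tps = new ArrayList<>();
                    for (PartitionInfo pi : src.partitionsFor(topic))
                        tps.add(new TopicPartition(pi.topic(), pi.partition()));
                    Map<TopicPartition, Long> dstEnd = dst.endOffsets(tps);
                    for (TopicPartition tp : tps) {
                        OffsetAndMetadata committed = src.committed(tp);
                        if (committed == null) continue;  // nothing mirrored yet for this partition
                        // Only meaningful while MM is fully caught up: the committed source
                        // offset should line up with the next offset to be written on the target.
                        long delta = committed.offset() - dstEnd.get(tp);
                        System.out.printf("%s delta=%d%n", tp, delta);  // persist somewhere durable
                    }
                }
            }
        }
    }

You'd run something like this periodically and write the results to durable
storage in the backup DC, per step 2.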
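
For step 5, a snapshot of the backup cluster's high watermarks at the
moment of failover might look like the following. Again, the broker address
and the output file are just placeholders; store the result wherever you
keep the rest of your failover bookkeeping.

    import java.io.FileWriter;
    import java.io.IOException;
    import java.io.PrintWriter;
    import java.util.*;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.PartitionInfo;
    import org.apache.kafka.common.TopicPartition;
    import org.apache.kafka.common.serialization.ByteArrayDeserializer;

    public class FailoverWatermarkSnapshot {
        public static void main(String[] args) throws IOException {
            Properties p = new Properties();
            p.put("bootstrap.servers", "site2-broker:9092");  // the DC you are failing over to
            p.put("group.id", "failover-snapshot");
            p.put("key.deserializer", ByteArrayDeserializer.class.getName());
            p.put("value.deserializer", ByteArrayDeserializer.class.getName());

            try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(p);
                 PrintWriter out = new PrintWriter(new FileWriter("failover-watermarks.txt"))) {
                List<TopicPartition> tps = new ArrayList<>();
                for (Map.Entry<String, List<PartitionInfo>> e : consumer.listTopics().entrySet()) {
                    if (e.getKey().startsWith("__")) continue;  // skip internal topics
                    for (PartitionInfo pi : e.getValue())
                        tps.add(new TopicPartition(pi.topic(), pi.partition()));
                }
                // End offsets at the moment of failover: everything below these offsets came
                // from the original DC, everything written above them is new post-failover data.
                for (Map.Entry<TopicPartition, Long> e : consumer.endOffsets(tps).entrySet())
                    out.printf("%s %d %d%n", e.getKey().topic(), e.getKey().partition(), e.getValue());
            }
        }
    }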
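
And for steps 7 and 8, here's a sketch of seeding the reverse-replication
consumer group and flagging un-replicated data in the original DC. The
"mm-reverse" and "failback-check" group ids, the broker addresses, and the
loadFailoverWatermarks()/loadDeltas() helpers are all hypothetical (loading
the bookkeeping from steps 2 and 5 is up to you), and the arithmetic
assumes the delta convention from the first sketch (source offset minus
target offset).

    import java.util.*;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.consumer.OffsetAndMetadata;
    import org.apache.kafka.common.TopicPartition;
    import org.apache.kafka.common.serialization.ByteArrayDeserializer;

    public class SeedReverseReplication {

        static Properties props(String bootstrap, String group) {
            Properties p = new Properties();
            p.put("bootstrap.servers", bootstrap);
            p.put("group.id", group);
            p.put("key.deserializer", ByteArrayDeserializer.class.getName());
            p.put("value.deserializer", ByteArrayDeserializer.class.getName());
            return p;
        }

        // Placeholders for whatever durable store holds the bookkeeping from steps 2 and 5.
        static Map<TopicPartition, Long> loadFailoverWatermarks() { return new HashMap<>(); }
        static Map<TopicPartition, Long> loadDeltas() { return new HashMap<>(); }

        public static void main(String[] args) {
            Map<TopicPartition, Long> watermarks = loadFailoverWatermarks();
            Map<TopicPartition, Long> deltas = loadDeltas();

            // Step 7: commit the failover watermarks for the group the reverse MirrorMaker
            // will run with, so it only mirrors data produced in site2 after the failover.
            try (KafkaConsumer<byte[], byte[]> c =
                         new KafkaConsumer<>(props("site2-broker:9092", "mm-reverse"))) {
                c.assign(watermarks.keySet());
                Map<TopicPartition, OffsetAndMetadata> commits = new HashMap<>();
                watermarks.forEach((tp, off) -> commits.put(tp, new OffsetAndMetadata(off)));
                c.commitSync(commits);
            }

            // Step 8: on the recovered original DC, flag partitions holding data beyond what
            // was replicated before the failure; that data has to be removed before reverse
            // replication starts.
            try (KafkaConsumer<byte[], byte[]> c =
                         new KafkaConsumer<>(props("site1-broker:9092", "failback-check"))) {
                Map<TopicPartition, Long> site1End = c.endOffsets(watermarks.keySet());
                for (TopicPartition tp : watermarks.keySet()) {
                    long expected = watermarks.get(tp) + deltas.getOrDefault(tp, 0L);
                    if (site1End.get(tp) > expected)
                        System.out.printf("%s has %d un-replicated record(s)%n",
                                          tp, site1End.get(tp) - expected);
                }
            }
        }
    }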

If this sounds tricky, it turns out that when you add compacted topics,
things get quite a bit messier....

-Ewen


>
> Thanks
>
