kafka-users mailing list archives

From Péter Sinóros-Szabó <peter.sinoros-sz...@transferwise.com.INVALID>
Subject Re: MM2 for DR
Date Mon, 02 Mar 2020 13:23:42 GMT
Hi Ryanne,

> I frequently demo this stuff, where I pull the plug on entire DCs and
> apps keep running like nothing happened.
Is there any public recording or documentation of these demos?
It would be very useful to see how it works.

Thanks,
Peter

On Thu, 13 Feb 2020 at 00:42, Ryanne Dolan <ryannedolan@gmail.com> wrote:

> > elaborate a bit more about the active-active
>
> Active/active in this context just means that both (or multiple)
> clusters are used under normal operation, not just during an outage.
> For this to work, you basically have isolated instances of your application
> stack running in each DC, with MM2 keeping each DC in sync. If one DC is
> unavailable, traffic is shifted to another DC. It's possible to set this up
> s.t. failover/failback between DCs happens automatically and seamlessly,
> e.g. with load balancers and health checks. It's more complicated to set up
> than the active/standby approach, but DR sorta takes care of itself from
> then on. I frequently demo this stuff, where I pull the plug on entire DCs
> and apps keep running like nothing happened.
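
A bidirectional MM2 setup of this kind can be sketched in an mm2.properties
file run with connect-mirror-maker.sh; the cluster aliases and bootstrap
addresses below are placeholders:

```properties
# Hypothetical bidirectional (active/active) MM2 configuration.
# "dc1"/"dc2" and the broker addresses are placeholder values.
clusters = dc1, dc2
dc1.bootstrap.servers = dc1-broker:9092
dc2.bootstrap.servers = dc2-broker:9092

# Replicate in both directions so each DC carries a copy of the other's topics.
dc1->dc2.enabled = true
dc1->dc2.topics = .*
dc2->dc1.enabled = true
dc2->dc1.topics = .*
```

With health-checked load balancers in front of each DC's application stack,
failover then amounts to shifting traffic, since both clusters already hold
the data.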
>
> On Wed, Feb 12, 2020 at 12:05 AM benitocm <benitocm@gmail.com> wrote:
>
> > Hi Ryanne,
> >
> > Please could you elaborate a bit more about the active-active
> > recommendation?
> >
> > Thanks in advance
> >
> > On Mon, Feb 10, 2020 at 10:21 PM benitocm <benitocm@gmail.com> wrote:
> >
> > > Thanks very much for the response.
> > >
> > > Please could you elaborate a bit more about "I'd arc in that direction.
> > > Instead of migrating A->B->C->D..., active/active is more like having
> > > one big cluster".
> > >
> > > Another thing that I would like to share is that currently my consumers
> > > only consume from one topic, so introducing MM2 will impact them.
> > > Any suggestion in this regard would be greatly appreciated
> > >
> > > Thanks in advance again!
> > >
> > >
> > > On Mon, Feb 10, 2020 at 9:40 PM Ryanne Dolan <ryannedolan@gmail.com>
> > > wrote:
> > >
> > >> Hello, sounds like you have this all figured out actually. A couple of
> > >> notes:
> > >>
> > >> > For now, we just need to handle DR requirements, i.e., we would not
> > >> > need active-active
> > >>
> > >> If your infrastructure is sufficiently advanced, active/active can be a
> > >> lot easier to manage than active/standby. If you are starting from
> > >> scratch I'd arc in that direction. Instead of migrating A->B->C->D...,
> > >> active/active is more like having one big cluster.
> > >>
> > >> > secondary.primary.topic1
> > >>
> > >> I'd recommend using regex subscriptions where possible, so that apps
> > >> don't need to worry about these potentially complex topic names.
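
As a sketch, assuming the default ReplicationPolicy naming convention
("sourceCluster.topic"), a pattern covering a topic and its replicated
copies could look like the following; with the Java consumer, such a
pattern can be passed to KafkaConsumer.subscribe(Pattern):

```java
import java.util.regex.Pattern;

public class TopicPattern {
    // With the default ReplicationPolicy, remote topics carry source-cluster
    // prefixes, e.g. "primary.topic1" or "secondary.primary.topic1". This
    // pattern matches the local topic plus any chain of prefixed copies, so
    // the subscription survives failover. Adjust the allowed alias characters
    // to your own naming rules; this is an illustration, not a hard rule.
    static final Pattern TOPIC1 = Pattern.compile("([a-zA-Z0-9_-]+\\.)*topic1");

    public static boolean matches(String topic) {
        return TOPIC1.matcher(topic).matches();
    }
}
```

The consumer then picks up new remote topics automatically as MM2 creates
them, instead of apps hard-coding names like "secondary.primary.topic1".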
> > >>
> > >> > An additional question. If the topic is compacted, i.e., messages
> > >> > are kept forever, would switchover operations imply adding an
> > >> > additional prefix to the topic name?
> > >>
> > >> I think that's right. You could always clean things up manually, but
> > >> migrating between clusters a bunch of times would leave a trail of
> > >> replication hops.
> > >>
> > >> Also, you might look into implementing a custom ReplicationPolicy. For
> > >> example, you could squash "secondary.primary.topic1" into something
> > >> shorter
> > >> if you like.
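
As a rough sketch of such a policy's naming logic (standalone here for
illustration; a real implementation would implement
org.apache.kafka.connect.mirror.ReplicationPolicy, and this simple version
breaks for topic names that themselves contain dots):

```java
// Sketch: keep only the ORIGINAL source alias when re-replicating a topic,
// so repeated hops don't accumulate prefixes. Hypothetical logic, not the
// behavior of MM2's default policy.
public class SquashingPolicy {
    private static final String SEPARATOR = ".";

    public static String formatRemoteTopic(String sourceClusterAlias, String topic) {
        int last = topic.lastIndexOf(SEPARATOR);
        // The unprefixed original topic name is everything after the last dot.
        String original = (last < 0) ? topic : topic.substring(last + 1);
        // Reuse the first (original) alias if the topic is already prefixed.
        String alias = (last < 0) ? sourceClusterAlias
                                  : topic.substring(0, topic.indexOf(SEPARATOR));
        return alias + SEPARATOR + original;
    }
}
```

With this, replicating "primary.topic1" from the secondary back to a new
primary yields "primary.topic1" again rather than "secondary.primary.topic1".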
> > >>
> > >> Ryanne
> > >>
> > >> On Mon, Feb 10, 2020 at 1:24 PM benitocm <benitocm@gmail.com> wrote:
> > >>
> > >> > Hi,
> > >> >
> > >> > After having a look at the talk
> > >> > https://www.confluent.io/kafka-summit-lon19/disaster-recovery-with-mirrormaker-2-0
> > >> > and the KIP
> > >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0#KIP-382
> > >> > I am trying to understand how I would use it in the setup that I
> > >> > have. For now, we just need to handle DR requirements, i.e., we
> > >> > would not need active-active.
> > >> >
> > >> > My requirements, more or less, are the following:
> > >> >
> > >> > 1) Currently, we have just one Kafka cluster, "primary", where all
> > >> > the producers are producing to and all the consumers are consuming
> > >> > from.
> > >> > 2) In case "primary" crashes, we would need another Kafka cluster,
> > >> > "secondary", to which we will move all the producers and consumers
> > >> > and keep working.
> > >> > 3) Once "primary" is recovered, we would need to move back to it
> > >> > (as we were in #1).
> > >> >
> > >> > To fulfill #2, I plan to set up a new Kafka cluster "secondary" and
> > >> > a replication procedure using MM2. However, it is not clear to me
> > >> > how to proceed.
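
The one-way (active/standby) replication described in #2 can be sketched
with a minimal mm2.properties for connect-mirror-maker.sh; the aliases and
broker addresses below are placeholders:

```properties
# Hypothetical one-way DR replication: primary -> secondary only.
clusters = primary, secondary
primary.bootstrap.servers = primary-broker:9092
secondary.bootstrap.servers = secondary-broker:9092

# Replicate all topics from primary into "primary."-prefixed remote topics
# on the secondary cluster.
primary->secondary.enabled = true
primary->secondary.topics = .*

# Checkpoints let consumers translate committed offsets after a failover.
primary->secondary.emit.checkpoints.enabled = true
```

Under this setup the secondary stays a passive copy until the switchover
described below.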
> > >> >
> > >> > I will describe the high-level details so you guys can point out my
> > >> > misconceptions:
> > >> >
> > >> > A) Initial situation. As in the example in KIP-382, in the primary
> > >> > cluster we will have a local topic "topic1" that the producers will
> > >> > produce to and the consumers will consume from. MM2 will create in
> > >> > the secondary cluster the remote topic "primary.topic1", to which
> > >> > the local topic in the primary will be replicated. In addition, the
> > >> > consumer group information of the primary will also be replicated.
> > >> >
> > >> > B) Kafka primary cluster is not available. Producers are moved to
> > >> > produce into the "topic1" that was manually created in the
> > >> > secondary. In addition, consumers need to connect to the secondary
> > >> > to consume from the local topic "topic1", where the producers are
> > >> > now producing, and from the remote topic "primary.topic1", where
> > >> > the producers were producing before, i.e., consumers will need to
> > >> > aggregate. This is so because some consumers could have lag, so they
> > >> > will need to consume from both. In this situation, the local topic
> > >> > "topic1" in the secondary will receive new messages and will be
> > >> > consumed (its consumption information will also change), while the
> > >> > remote topic "primary.topic1" will not receive new messages but
> > >> > will still be consumed (its consumption information will change).
> > >> >
> > >> > At this point, my conclusion is that consumers need to consume from
> > >> > both topics (the new messages produced in the local topic and the
> > >> > old messages, for consumers that had lag).
> > >> >
> > >> > C) Primary cluster is recovered (here is where things get
> > >> > complicated for me). In the talk, the new primary is renamed to
> > >> > primary-2 and MM2 is configured for active-active replication.
> > >> > The result is the following. The secondary cluster will end up with
> > >> > a new remote topic (primary-2.topic1) that will contain a replica
> > >> > of the new topic1 created in the primary-2 cluster. The primary-2
> > >> > cluster will have 3 topics: "topic1", a new topic where in the near
> > >> > future producers will produce; "secondary.topic1", which contains
> > >> > the replica of the local topic "topic1" in the secondary; and
> > >> > "secondary.primary.topic1", which is "topic1" of the old primary
> > >> > (obtained through the secondary).
> > >> >
> > >> > D) Once all the replicas are in sync, producers and consumers will
> > >> > be moved to primary-2. Producers will produce to the local topic
> > >> > "topic1" of the primary-2 cluster. The consumers will connect to
> > >> > primary-2 to consume from "topic1" (new messages that come in),
> > >> > "secondary.topic1" (messages produced during the outage), and
> > >> > "secondary.primary.topic1" (old messages).
> > >> >
> > >> > If topics have a retention time, e.g. 7 days, we could remove
> > >> > "secondary.primary.topic1" after a few days, leaving the situation
> > >> > as at the beginning. However, if another problem happens in the
> > >> > middle, the number of topics could be a little difficult to handle.
> > >> >
> > >> > An additional question. If the topic is compacted, i.e., messages
> > >> > are kept forever, would switchover operations imply adding an
> > >> > additional prefix to the topic name?
> > >> >
> > >> > I would appreciate some guidance with this.
> > >> >
> > >> > Regards
> > >> >
> > >>
> > >
> >
>
